This patch series contains various fixes for pci-mvebu.c driver. Only
bugfixes, no new features.
For pci-mvebu.c I have prepared another 30+ patches with cleanups and
new features, they are currently available in my git branch:
https://git.kernel.org/pub/scm/linux/kernel/git/pali/linux.git/log/?h=pci-mvebu
Pali Rohár (15):
PCI: mvebu: Check for valid ports
PCI: mvebu: Check for errors from pci_bridge_emul_init() call
PCI: mvebu: Check that PCI bridge specified in DT has function number
zero
PCI: mvebu: Handle invalid size of read config request
PCI: mvebu: Disallow mapping interrupts on emulated bridges
PCI: mvebu: Fix support for bus mastering and PCI_COMMAND on emulated
bridge
PCI: mvebu: Do not modify PCI IO type bits in conf_write
PCI: mvebu: Propagate errors when updating PCI_IO_BASE and
PCI_MEM_BASE registers
PCI: mvebu: Setup PCIe controller to Root Complex mode
PCI: mvebu: Set PCI Bridge Class Code to PCI Bridge
PCI: mvebu: Fix configuring secondary bus of PCIe Root Port via
emulated bridge
PCI: mvebu: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated
bridge
PCI: mvebu: Fix support for PCI_EXP_DEVCTL on emulated bridge
PCI: mvebu: Fix support for PCI_EXP_RTSTA on emulated bridge
PCI: mvebu: Fix support for DEVCAP2, DEVCTL2 and LNKCTL2 registers on
emulated bridge
drivers/pci/controller/pci-mvebu.c | 380 ++++++++++++++++++++++++-----
1 file changed, 313 insertions(+), 67 deletions(-)
--
2.20.1
Some mvebu ports do not have to be initialized. So skip these uninitialized
mvebu ports in every port iteration function to prevent access to unmapped
memory or dereferencing NULL pointers. Uninitialized mvebu port has base
address set to NULL.
Signed-off-by: Pali Rohár <[email protected]>
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index 06f06085beba..d655c887ba1b 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -625,6 +625,9 @@ static struct mvebu_pcie_port *mvebu_pcie_find_port(struct mvebu_pcie *pcie,
for (i = 0; i < pcie->nports; i++) {
struct mvebu_pcie_port *port = &pcie->ports[i];
+ if (!port->base)
+ continue;
+
if (bus->number == 0 && port->devfn == devfn)
return port;
if (bus->number != 0 &&
@@ -800,6 +803,8 @@ static int mvebu_pcie_suspend(struct device *dev)
pcie = dev_get_drvdata(dev);
for (i = 0; i < pcie->nports; i++) {
struct mvebu_pcie_port *port = pcie->ports + i;
+ if (!port->base)
+ continue;
port->saved_pcie_stat = mvebu_readl(port, PCIE_STAT_OFF);
}
@@ -814,6 +819,8 @@ static int mvebu_pcie_resume(struct device *dev)
pcie = dev_get_drvdata(dev);
for (i = 0; i < pcie->nports; i++) {
struct mvebu_pcie_port *port = pcie->ports + i;
+ if (!port->base)
+ continue;
mvebu_writel(port, port->saved_pcie_stat, PCIE_STAT_OFF);
mvebu_pcie_setup_hw(port);
}
--
2.20.1
Function mvebu_pcie_hw_rd_conf() does not handle invalid size. So correctly
set read value to all-ones and return appropriate error return value
PCIBIOS_BAD_REGISTER_NUMBER like in mvebu_pcie_hw_wr_conf() function.
Signed-off-by: Pali Rohár <[email protected]>
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index 08274132cdfb..19c6ee298442 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -261,6 +261,9 @@ static int mvebu_pcie_hw_rd_conf(struct mvebu_pcie_port *port,
case 4:
*val = readl_relaxed(conf_data);
break;
+ default:
+ *val = 0xffffffff;
+ return PCIBIOS_BAD_REGISTER_NUMBER;
}
return PCIBIOS_SUCCESSFUL;
--
2.20.1
Interrupt support on mvebu emulated bridges is not implemented yet.
So properly indicate return value to callers that they cannot request
interrupts from emulated bridge.
Signed-off-by: Pali Rohár <[email protected]>
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index 19c6ee298442..a3df352d440e 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -705,6 +705,15 @@ static struct pci_ops mvebu_pcie_ops = {
.write = mvebu_pcie_wr_conf,
};
+static int mvebu_pcie_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
+{
+ /* Interrupt support on mvebu emulated bridges is not implemented yet */
+ if (dev->bus->number == 0)
+ return 0; /* Proper return code 0 == NO_IRQ */
+
+ return of_irq_parse_and_map_pci(dev, slot, pin);
+}
+
static resource_size_t mvebu_pcie_align_resource(struct pci_dev *dev,
const struct resource *res,
resource_size_t start,
@@ -1119,6 +1128,7 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
bridge->sysdata = pcie;
bridge->ops = &mvebu_pcie_ops;
bridge->align_resource = mvebu_pcie_align_resource;
+ bridge->map_irq = mvebu_pcie_map_irq;
return pci_host_probe(bridge);
}
--
2.20.1
Armada XP and new hardware supports access to DEVCAP2, DEVCTL2 and LNKCTL2
configuration registers of PCIe core via PCIE_CAP_PCIEXP. So export them
via emulated software root bridge.
Pre-XP hardware does not support these registers and returns zeros.
Signed-off-by: Pali Rohár <[email protected]>
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index 798cf5cff8be..9a17bab4019f 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -571,6 +571,18 @@ mvebu_pci_bridge_emul_pcie_conf_read(struct pci_bridge_emul *bridge,
*value = mvebu_readl(port, PCIE_RC_RTSTA);
break;
+ case PCI_EXP_DEVCAP2:
+ *value = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_DEVCAP2);
+ break;
+
+ case PCI_EXP_DEVCTL2:
+ *value = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_DEVCTL2);
+ break;
+
+ case PCI_EXP_LNKCTL2:
+ *value = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
+ break;
+
default:
return PCI_BRIDGE_EMUL_NOT_HANDLED;
}
@@ -683,6 +695,17 @@ mvebu_pci_bridge_emul_pcie_conf_write(struct pci_bridge_emul *bridge,
if (new & PCI_EXP_RTSTA_PME)
mvebu_writel(port, ~PCIE_INT_PM_PME, PCIE_INT_CAUSE_OFF);
break;
+
+ case PCI_EXP_DEVCTL2:
+ mvebu_writel(port, new, PCIE_CAP_PCIEXP + PCI_EXP_DEVCTL2);
+ break;
+
+ case PCI_EXP_LNKCTL2:
+ mvebu_writel(port, new, PCIE_CAP_PCIEXP + PCI_EXP_LNKCTL2);
+ break;
+
+ default:
+ break;
}
}
--
2.20.1
Comment in Armada 370 functional specification is misleading.
PCI_EXP_DEVCTL_*RE bits are supported and configures receiving of error
interrupts.
Signed-off-by: Pali Rohár <[email protected]>
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index 3075ea98c131..c9b736344b56 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -545,9 +545,7 @@ mvebu_pci_bridge_emul_pcie_conf_read(struct pci_bridge_emul *bridge,
break;
case PCI_EXP_DEVCTL:
- *value = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_DEVCTL) &
- ~(PCI_EXP_DEVCTL_URRE | PCI_EXP_DEVCTL_FERE |
- PCI_EXP_DEVCTL_NFERE | PCI_EXP_DEVCTL_CERE);
+ *value = mvebu_readl(port, PCIE_CAP_PCIEXP + PCI_EXP_DEVCTL);
break;
case PCI_EXP_LNKCAP:
@@ -658,13 +656,6 @@ mvebu_pci_bridge_emul_pcie_conf_write(struct pci_bridge_emul *bridge,
switch (reg) {
case PCI_EXP_DEVCTL:
- /*
- * Armada370 data says these bits must always
- * be zero when in root complex mode.
- */
- new &= ~(PCI_EXP_DEVCTL_URRE | PCI_EXP_DEVCTL_FERE |
- PCI_EXP_DEVCTL_NFERE | PCI_EXP_DEVCTL_CERE);
-
mvebu_writel(port, new, PCIE_CAP_PCIEXP + PCI_EXP_DEVCTL);
break;
--
2.20.1
It looks like that mvebu PCIe controller has for each PCIe link fully
independent PCIe host bridge and so every PCIe Root Port is isolated not
only on its own bus but also isolated from each others. But in past device
tree structure was defined to put all PCIe Root Ports (as PCI Bridge
devices) into one root bus 0 and this bus is emulated by pci-mvebu.c
driver.
Probably reason for this decision was incorrect understanding of PCIe
topology of these Armada SoCs and also reason of misunderstanding how is
PCIe controller generating Type 0 and Type 1 config requests (it is fully
different compared to other drivers). Probably incorrect setup leaded to
very surprised things like having PCIe Root Port (PCI Bridge device, with
even incorrect Device Class set to Memory Controller) and the PCIe device
behind the Root Port on the same PCI bus, which obviously was needed to
somehow hack (as these two devices cannot be in reality on the same bus).
Properly set mvebu local bus number and mvebu local device number based on
PCI Bridge secondary bus number configuration. Also correctly report
configured secondary bus number in config space. And explain in driver
comment why this setup is correct.
Signed-off-by: Pali Rohár <[email protected]>
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 98 +++++++++++++++++++++++++++++-
1 file changed, 97 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index 4edce441901c..36fbdc4f0e06 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -127,6 +127,11 @@ static bool mvebu_pcie_link_up(struct mvebu_pcie_port *port)
return !(mvebu_readl(port, PCIE_STAT_OFF) & PCIE_STAT_LINK_DOWN);
}
+static u8 mvebu_pcie_get_local_bus_nr(struct mvebu_pcie_port *port)
+{
+ return (mvebu_readl(port, PCIE_STAT_OFF) & PCIE_STAT_BUS) >> 8;
+}
+
static void mvebu_pcie_set_local_bus_nr(struct mvebu_pcie_port *port, int nr)
{
u32 stat;
@@ -490,6 +495,20 @@ mvebu_pci_bridge_emul_base_conf_read(struct pci_bridge_emul *bridge,
*value = mvebu_readl(port, PCIE_CMD_OFF);
break;
+ case PCI_PRIMARY_BUS: {
+ /*
+ * From the whole 32bit register we support reading from HW only
+ * secondary bus number which is mvebu local bus number.
+ * Other bits are retrieved only from emulated config buffer.
+ */
+ __le32 *cfgspace = (__le32 *)&bridge->conf;
+ u32 val = le32_to_cpu(cfgspace[PCI_PRIMARY_BUS / 4]);
+ val &= ~0xff00;
+ val |= mvebu_pcie_get_local_bus_nr(port) << 8;
+ *value = val;
+ break;
+ }
+
default:
return PCI_BRIDGE_EMUL_NOT_HANDLED;
}
@@ -594,7 +613,8 @@ mvebu_pci_bridge_emul_base_conf_write(struct pci_bridge_emul *bridge,
break;
case PCI_PRIMARY_BUS:
- mvebu_pcie_set_local_bus_nr(port, conf->secondary_bus);
+ if (mask & 0xff00)
+ mvebu_pcie_set_local_bus_nr(port, conf->secondary_bus);
break;
default:
@@ -1186,8 +1206,84 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
continue;
}
+ /*
+ * PCIe topology exported by mvebu hw is quite complicated. In
+ * reality has something like N fully independent host bridges
+ * where each host bridge has one PCIe Root Port (which acts as
+ * PCI Bridge device). Each host bridge has its own independent
+ * internal registers, independent access to PCI config space,
+ * independent interrupt lines, independent window and memory
+ * access configuration. But additionally there is some kind of
+ * peer-to-peer support between PCIe devices behind different
+ * host bridges limited just to forwarding of memory and I/O
+ * transactions (forwarding of error messages and config cycles
+ * is not supported). So we could say there are N independent
+ * PCIe Root Complexes.
+ *
+ * For this kind of setup DT should have been structured into
+ * N independent PCIe controllers / host bridges. But instead
+ * structure in past was defined to put PCIe Root Ports of all
+ * host bridges into one bus zero, like in classic multi-port
+ * Root Complex setup with just one host bridge.
+ *
+ * This means that pci-mvebu.c driver provides "virtual" bus 0
+ * on which registers all PCIe Root Ports (PCI Bridge devices)
+ * specified in DT by their BDF addresses and virtually routes
+ * PCI config access of each PCI bridge device to specific PCIe
+ * host bridge.
+ *
+ * Normally PCI Bridge should choose between Type 0 and Type 1
+ * config requests based on primary and secondary bus numbers
+ * configured on the bridge itself. But because mvebu PCI Bridge
+ * does not have registers for primary and secondary bus numbers
+ * in its config space, it determinates type of config requests
+ * via its own custom way.
+ *
+ * There are two options how mvebu determinate type of config
+ * request.
+ *
+ * 1. If Secondary Bus Number Enable bit is not set or is not
+ * available (applies for pre-XP PCIe controllers) then Type 0
+ * is used if target bus number equals Local Bus Number (bits
+ * [15:8] in register 0x1a04) and target device number differs
+ * from Local Device Number (bits [20:16] in register 0x1a04).
+ * Type 1 is used if target bus number differs from Local Bus
+ * Number. And when target bus number equals Local Bus Number
+ * and target device equals Local Device Number then request is
+ * routed to Local PCI Bridge (PCIe Root Port).
+ *
+ * 2. If Secondary Bus Number Enable bit is set (bit 7 in
+ * register 0x1a2c) then mvebu hw determinate type of config
+ * request like compliant PCI Bridge based on primary bus number
+ * which is configured via Local Bus Number (bits [15:8] in
+ * register 0x1a04) and secondary bus number which is configured
+ * via Secondary Bus Number (bits [7:0] in register 0x1a2c).
+ * Local PCI Bridge (PCIe Root Port) is available on primary bus
+ * as device with Local Device Number (bits [20:16] in register
+ * 0x1a04).
+ *
+ * Secondary Bus Number Enable bit is disabled by default and
+ * option 2. is not available on pre-XP PCIe controllers. Hence
+ * this driver always use option 1.
+ *
+ * Basically it means that primary and secondary buses shares
+ * one virtual number configured via Local Bus Number bits and
+ * Local Device Number bits determinates if accessing primary
+ * or secondary bus. Set Local Device Number to 1 and redirect
+ * all writes of PCI Bridge Secondary Bus Number register to
+ * Local Bus Number (bits [15:8] in register 0x1a04).
+ *
+ * So when accessing devices on buses behind secondary bus
+ * number it would work correctly. And also when accessing
+ * device 0 at secondary bus number via config space would be
+ * correctly routed to secondary bus. Due to issues described
+ * in mvebu_pcie_setup_hw(), PCI Bridges at primary bus (zero)
+ * are not accessed directly via PCI config space but rarher
+ * indirectly via kernel emulated PCI bridge driver.
+ */
mvebu_pcie_setup_hw(port);
mvebu_pcie_set_local_dev_nr(port, 1);
+ mvebu_pcie_set_local_bus_nr(port, 0);
}
bridge->sysdata = pcie;
--
2.20.1
Hardware supports PCIe Hot Reset via PCIE_CTRL_OFF register. Use it for
implementing PCI_BRIDGE_CTL_BUS_RESET bit of PCI_BRIDGE_CONTROL register on
emulated bridge.
With this change the function pci_reset_secondary_bus() starts working and
can reset connected PCIe card.
Signed-off-by: Pali Rohár <[email protected]>
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index 36fbdc4f0e06..3075ea98c131 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -57,6 +57,7 @@
#define PCIE_CTRL_OFF 0x1a00
#define PCIE_CTRL_X1_MODE 0x0001
#define PCIE_CTRL_RC_MODE BIT(1)
+#define PCIE_CTRL_MASTER_HOT_RESET BIT(24)
#define PCIE_STAT_OFF 0x1a04
#define PCIE_STAT_BUS 0xff00
#define PCIE_STAT_DEV 0x1f0000
@@ -509,6 +510,22 @@ mvebu_pci_bridge_emul_base_conf_read(struct pci_bridge_emul *bridge,
break;
}
+ case PCI_INTERRUPT_LINE: {
+ /*
+ * From the whole 32bit register we support reading from HW only
+ * one bit: PCI_BRIDGE_CTL_BUS_RESET.
+ * Other bits are retrieved only from emulated config buffer.
+ */
+ __le32 *cfgspace = (__le32 *)&bridge->conf;
+ u32 val = le32_to_cpu(cfgspace[PCI_INTERRUPT_LINE / 4]);
+ if (mvebu_readl(port, PCIE_CTRL_OFF) & PCIE_CTRL_MASTER_HOT_RESET)
+ val |= PCI_BRIDGE_CTL_BUS_RESET << 16;
+ else
+ val &= ~(PCI_BRIDGE_CTL_BUS_RESET << 16);
+ *value = val;
+ break;
+ }
+
default:
return PCI_BRIDGE_EMUL_NOT_HANDLED;
}
@@ -617,6 +634,17 @@ mvebu_pci_bridge_emul_base_conf_write(struct pci_bridge_emul *bridge,
mvebu_pcie_set_local_bus_nr(port, conf->secondary_bus);
break;
+ case PCI_INTERRUPT_LINE:
+ if (mask & (PCI_BRIDGE_CTL_BUS_RESET << 16)) {
+ u32 ctrl = mvebu_readl(port, PCIE_CTRL_OFF);
+ if (new & (PCI_BRIDGE_CTL_BUS_RESET << 16))
+ ctrl |= PCIE_CTRL_MASTER_HOT_RESET;
+ else
+ ctrl &= ~PCIE_CTRL_MASTER_HOT_RESET;
+ mvebu_writel(port, ctrl, PCIE_CTRL_OFF);
+ }
+ break;
+
default:
break;
}
--
2.20.1
Function pci_bridge_emul_init() may fail so correctly check for errors.
Signed-off-by: Pali Rohár <[email protected]>
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index d655c887ba1b..6197f7e7c317 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -581,7 +581,7 @@ static struct pci_bridge_emul_ops mvebu_pci_bridge_emul_ops = {
* Initialize the configuration space of the PCI-to-PCI bridge
* associated with the given PCIe interface.
*/
-static void mvebu_pci_bridge_emul_init(struct mvebu_pcie_port *port)
+static int mvebu_pci_bridge_emul_init(struct mvebu_pcie_port *port)
{
struct pci_bridge_emul *bridge = &port->bridge;
u32 pcie_cap = mvebu_readl(port, PCIE_CAP_PCIEXP);
@@ -608,7 +608,7 @@ static void mvebu_pci_bridge_emul_init(struct mvebu_pcie_port *port)
bridge->data = port;
bridge->ops = &mvebu_pci_bridge_emul_ops;
- pci_bridge_emul_init(bridge, PCI_BRIDGE_EMUL_NO_PREFETCHABLE_BAR);
+ return pci_bridge_emul_init(bridge, PCI_BRIDGE_EMUL_NO_PREFETCHABLE_BAR);
}
static inline struct mvebu_pcie *sys_to_pcie(struct pci_sys_data *sys)
@@ -1094,9 +1094,18 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
continue;
}
+ ret = mvebu_pci_bridge_emul_init(port);
+ if (ret < 0) {
+ dev_err(dev, "%s: cannot init emulated bridge\n",
+ port->name);
+ devm_iounmap(dev, port->base);
+ port->base = NULL;
+ mvebu_pcie_powerdown(port);
+ continue;
+ }
+
mvebu_pcie_setup_hw(port);
mvebu_pcie_set_local_dev_nr(port, 1);
- mvebu_pci_bridge_emul_init(port);
}
bridge->sysdata = pcie;
--
2.20.1
The default value of Class Code of this bridge corresponds to a Memory
controller, though. This is probably relict from the past when old
Marvell/Galileo PCI-based controllers were used as standalone PCI device
for connecting SDRAM or workaround for PCs with broken BIOS. Details are
in commit 36de23a4c5f0 ("MIPS: Cobalt: Explain GT64111 early PCI fixup").
Change the Class Code to correspond to a PCI Bridge.
Add comment explaining this change.
Signed-off-by: Pali Rohár <[email protected]>
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index 017ae9f869ac..4edce441901c 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -225,7 +225,7 @@ static void mvebu_pcie_setup_wins(struct mvebu_pcie_port *port)
static void mvebu_pcie_setup_hw(struct mvebu_pcie_port *port)
{
- u32 ctrl, cmd, mask;
+ u32 ctrl, cmd, dev_rev, mask;
/* Setup PCIe controller to Root Complex mode. */
ctrl = mvebu_readl(port, PCIE_CTRL_OFF);
@@ -237,6 +237,32 @@ static void mvebu_pcie_setup_hw(struct mvebu_pcie_port *port)
cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
mvebu_writel(port, cmd, PCIE_CMD_OFF);
+ /*
+ * Change Class Code of PCI Bridge device to PCI Bridge (0x6004)
+ * because default value is Memory controller (0x5080).
+ *
+ * Note that this mvebu PCI Bridge does not have compliant Type 1
+ * Configuration Space. Header Type is reported as Type 0 and it
+ * has format of Type 0 config space.
+ *
+ * Moreover Type 0 BAR registers (ranges 0x10 - 0x28 and 0x30 - 0x34)
+ * have the same format in Marvell's specification as in PCIe
+ * specification, but their meaning is totally different and they do
+ * different things: they are aliased into internal mvebu registers
+ * (e.g. PCIE_BAR_LO_OFF) and these should not be changed or
+ * reconfigured by pci device drivers.
+ *
+ * Therefore driver uses emulation of PCI Bridge which emulates
+ * access to configuration space via internal mvebu registers or
+ * emulated configuration buffer. Driver access these PCI Bridge
+ * directly for simplification, but these registers can be accessed
+ * also via standard mvebu way for accessing PCI config space.
+ */
+ dev_rev = mvebu_readl(port, PCIE_DEV_REV_OFF);
+ dev_rev &= ~0xffffff00;
+ dev_rev |= (PCI_CLASS_BRIDGE_PCI << 8) << 8;
+ mvebu_writel(port, dev_rev, PCIE_DEV_REV_OFF);
+
/* Point PCIe unit MBUS decode windows to DRAM space. */
mvebu_pcie_setup_wins(port);
--
2.20.1
This driver operates only in Root Complex mode, so ensure that hardware is
properly configured in Root Complex mode.
Signed-off-by: Pali Rohár <[email protected]>
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index 12afa565bfcf..017ae9f869ac 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -56,6 +56,7 @@
#define PCIE_MASK_ENABLE_INTS 0x0f000000
#define PCIE_CTRL_OFF 0x1a00
#define PCIE_CTRL_X1_MODE 0x0001
+#define PCIE_CTRL_RC_MODE BIT(1)
#define PCIE_STAT_OFF 0x1a04
#define PCIE_STAT_BUS 0xff00
#define PCIE_STAT_DEV 0x1f0000
@@ -224,7 +225,12 @@ static void mvebu_pcie_setup_wins(struct mvebu_pcie_port *port)
static void mvebu_pcie_setup_hw(struct mvebu_pcie_port *port)
{
- u32 cmd, mask;
+ u32 ctrl, cmd, mask;
+
+ /* Setup PCIe controller to Root Complex mode. */
+ ctrl = mvebu_readl(port, PCIE_CTRL_OFF);
+ ctrl |= PCIE_CTRL_RC_MODE;
+ mvebu_writel(port, ctrl, PCIE_CTRL_OFF);
/* Disable Root Bridge I/O space, memory space and bus mastering. */
cmd = mvebu_readl(port, PCIE_CMD_OFF);
--
2.20.1
Properly propagate failure from mvebu_pcie_add_windows() function back to
the caller mvebu_pci_bridge_emul_base_conf_write() and correctly updates
PCI_IO_BASE, PCI_MEM_BASE and PCI_IO_BASE_UPPER16 registers on error.
On error set base value higher than limit value which indicates that
address range is disabled. When IO is unsupported then let IO registers
zeroed as required by PCIe base specification.
Signed-off-by: Pali Rohár <[email protected]>
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 82 ++++++++++++++++++++----------
1 file changed, 55 insertions(+), 27 deletions(-)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index a0b661972436..12afa565bfcf 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -315,7 +315,7 @@ static void mvebu_pcie_del_windows(struct mvebu_pcie_port *port,
* areas each having a power of two size. We start from the largest
* one (i.e highest order bit set in the size).
*/
-static void mvebu_pcie_add_windows(struct mvebu_pcie_port *port,
+static int mvebu_pcie_add_windows(struct mvebu_pcie_port *port,
unsigned int target, unsigned int attribute,
phys_addr_t base, size_t size,
phys_addr_t remap)
@@ -336,7 +336,7 @@ static void mvebu_pcie_add_windows(struct mvebu_pcie_port *port,
&base, &end, ret);
mvebu_pcie_del_windows(port, base - size_mapped,
size_mapped);
- return;
+ return ret;
}
size -= sz;
@@ -345,16 +345,20 @@ static void mvebu_pcie_add_windows(struct mvebu_pcie_port *port,
if (remap != MVEBU_MBUS_NO_REMAP)
remap += sz;
}
+
+ return 0;
}
-static void mvebu_pcie_set_window(struct mvebu_pcie_port *port,
+static int mvebu_pcie_set_window(struct mvebu_pcie_port *port,
unsigned int target, unsigned int attribute,
const struct mvebu_pcie_window *desired,
struct mvebu_pcie_window *cur)
{
+ int ret;
+
if (desired->base == cur->base && desired->remap == cur->remap &&
desired->size == cur->size)
- return;
+ return 0;
if (cur->size != 0) {
mvebu_pcie_del_windows(port, cur->base, cur->size);
@@ -369,30 +373,35 @@ static void mvebu_pcie_set_window(struct mvebu_pcie_port *port,
}
if (desired->size == 0)
- return;
+ return 0;
+
+ ret = mvebu_pcie_add_windows(port, target, attribute, desired->base,
+ desired->size, desired->remap);
+ if (ret) {
+ cur->size = 0;
+ cur->base = 0;
+ return ret;
+ }
- mvebu_pcie_add_windows(port, target, attribute, desired->base,
- desired->size, desired->remap);
*cur = *desired;
+ return 0;
}
-static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
+static int mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
{
struct mvebu_pcie_window desired = {};
struct pci_bridge_emul_conf *conf = &port->bridge.conf;
/* Are the new iobase/iolimit values invalid? */
if (conf->iolimit < conf->iobase ||
- conf->iolimitupper < conf->iobaseupper) {
- mvebu_pcie_set_window(port, port->io_target, port->io_attr,
- &desired, &port->iowin);
- return;
- }
+ conf->iolimitupper < conf->iobaseupper)
+ return mvebu_pcie_set_window(port, port->io_target, port->io_attr,
+ &desired, &port->iowin);
if (!mvebu_has_ioport(port)) {
dev_WARN(&port->pcie->pdev->dev,
"Attempt to set IO when IO is disabled\n");
- return;
+ return -EOPNOTSUPP;
}
/*
@@ -410,21 +419,19 @@ static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
desired.remap) +
1;
- mvebu_pcie_set_window(port, port->io_target, port->io_attr, &desired,
- &port->iowin);
+ return mvebu_pcie_set_window(port, port->io_target, port->io_attr, &desired,
+ &port->iowin);
}
-static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
+static int mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
{
struct mvebu_pcie_window desired = {.remap = MVEBU_MBUS_NO_REMAP};
struct pci_bridge_emul_conf *conf = &port->bridge.conf;
/* Are the new membase/memlimit values invalid? */
- if (conf->memlimit < conf->membase) {
- mvebu_pcie_set_window(port, port->mem_target, port->mem_attr,
- &desired, &port->memwin);
- return;
- }
+ if (conf->memlimit < conf->membase)
+ return mvebu_pcie_set_window(port, port->mem_target, port->mem_attr,
+ &desired, &port->memwin);
/*
* We read the PCI-to-PCI bridge emulated registers, and
@@ -436,8 +443,8 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
desired.size = (((conf->memlimit & 0xFFF0) << 16) | 0xFFFFF) -
desired.base + 1;
- mvebu_pcie_set_window(port, port->mem_target, port->mem_attr, &desired,
- &port->memwin);
+ return mvebu_pcie_set_window(port, port->mem_target, port->mem_attr, &desired,
+ &port->memwin);
}
static pci_bridge_emul_read_status_t
@@ -522,15 +529,36 @@ mvebu_pci_bridge_emul_base_conf_write(struct pci_bridge_emul *bridge,
break;
case PCI_IO_BASE:
- mvebu_pcie_handle_iobase_change(port);
+ if ((mask & 0xffff) && mvebu_pcie_handle_iobase_change(port)) {
+ /* On error disable IO range */
+ conf->iobase &= ~0xf0;
+ conf->iolimit &= ~0xf0;
+ conf->iobaseupper = cpu_to_le16(0x0000);
+ conf->iolimitupper = cpu_to_le16(0x0000);
+ if (mvebu_has_ioport(port))
+ conf->iobase |= 0xf0;
+ }
break;
case PCI_MEMORY_BASE:
- mvebu_pcie_handle_membase_change(port);
+ if (mvebu_pcie_handle_membase_change(port)) {
+ /* On error disable mem range */
+ conf->membase = cpu_to_le16(le16_to_cpu(conf->membase) & ~0xfff0);
+ conf->memlimit = cpu_to_le16(le16_to_cpu(conf->memlimit) & ~0xfff0);
+ conf->membase = cpu_to_le16(le16_to_cpu(conf->membase) | 0xfff0);
+ }
break;
case PCI_IO_BASE_UPPER16:
- mvebu_pcie_handle_iobase_change(port);
+ if (mvebu_pcie_handle_iobase_change(port)) {
+ /* On error disable IO range */
+ conf->iobase &= ~0xf0;
+ conf->iolimit &= ~0xf0;
+ conf->iobaseupper = cpu_to_le16(0x0000);
+ conf->iolimitupper = cpu_to_le16(0x0000);
+ if (mvebu_has_ioport(port))
+ conf->iobase |= 0xf0;
+ }
break;
case PCI_PRIMARY_BUS:
--
2.20.1
According to PCI specifications bits [0:2] of Command Register, this should
be by default disabled on reset. So explicitly disable these bits at early
beginning of driver initialization.
Also remove code which unconditionally enables all 3 bits and let kernel
code (via pci_set_master() function) to handle bus mastering of PCI Bridge
via emulated PCI_COMMAND on emulated bridge.
Adjust existing functions mvebu_pcie_handle_iobase_change() and
mvebu_pcie_handle_membase_change() to handle PCI_IO_BASE and PCI_MEM_BASE
registers correctly even when bus mastering on emulated bridge is disabled.
Signed-off-by: Pali Rohár <[email protected]>
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 52 ++++++++++++++++++------------
1 file changed, 32 insertions(+), 20 deletions(-)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index a3df352d440e..32694763e930 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -226,16 +226,14 @@ static void mvebu_pcie_setup_hw(struct mvebu_pcie_port *port)
{
u32 cmd, mask;
- /* Point PCIe unit MBUS decode windows to DRAM space. */
- mvebu_pcie_setup_wins(port);
-
- /* Master + slave enable. */
+ /* Disable Root Bridge I/O space, memory space and bus mastering. */
cmd = mvebu_readl(port, PCIE_CMD_OFF);
- cmd |= PCI_COMMAND_IO;
- cmd |= PCI_COMMAND_MEMORY;
- cmd |= PCI_COMMAND_MASTER;
+ cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
mvebu_writel(port, cmd, PCIE_CMD_OFF);
+ /* Point PCIe unit MBUS decode windows to DRAM space. */
+ mvebu_pcie_setup_wins(port);
+
/* Enable interrupt lines A-D. */
mask = mvebu_readl(port, PCIE_MASK_OFF);
mask |= PCIE_MASK_ENABLE_INTS;
@@ -385,8 +383,7 @@ static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
/* Are the new iobase/iolimit values invalid? */
if (conf->iolimit < conf->iobase ||
- conf->iolimitupper < conf->iobaseupper ||
- !(conf->command & PCI_COMMAND_IO)) {
+ conf->iolimitupper < conf->iobaseupper) {
mvebu_pcie_set_window(port, port->io_target, port->io_attr,
&desired, &port->iowin);
return;
@@ -423,8 +420,7 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
struct pci_bridge_emul_conf *conf = &port->bridge.conf;
/* Are the new membase/memlimit values invalid? */
- if (conf->memlimit < conf->membase ||
- !(conf->command & PCI_COMMAND_MEMORY)) {
+ if (conf->memlimit < conf->membase) {
mvebu_pcie_set_window(port, port->mem_target, port->mem_attr,
&desired, &port->memwin);
return;
@@ -444,6 +440,24 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
&port->memwin);
}
+static pci_bridge_emul_read_status_t
+mvebu_pci_bridge_emul_base_conf_read(struct pci_bridge_emul *bridge,
+ int reg, u32 *value)
+{
+ struct mvebu_pcie_port *port = bridge->data;
+
+ switch (reg) {
+ case PCI_COMMAND:
+ *value = mvebu_readl(port, PCIE_CMD_OFF);
+ break;
+
+ default:
+ return PCI_BRIDGE_EMUL_NOT_HANDLED;
+ }
+
+ return PCI_BRIDGE_EMUL_HANDLED;
+}
+
static pci_bridge_emul_read_status_t
mvebu_pci_bridge_emul_pcie_conf_read(struct pci_bridge_emul *bridge,
int reg, u32 *value)
@@ -498,17 +512,14 @@ mvebu_pci_bridge_emul_base_conf_write(struct pci_bridge_emul *bridge,
switch (reg) {
case PCI_COMMAND:
- {
- if (!mvebu_has_ioport(port))
- conf->command &= ~PCI_COMMAND_IO;
-
- if ((old ^ new) & PCI_COMMAND_IO)
- mvebu_pcie_handle_iobase_change(port);
- if ((old ^ new) & PCI_COMMAND_MEMORY)
- mvebu_pcie_handle_membase_change(port);
+ if (!mvebu_has_ioport(port)) {
+ conf->command = cpu_to_le16(
+ le16_to_cpu(conf->command) & ~PCI_COMMAND_IO);
+ new &= ~PCI_COMMAND_IO;
+ }
+ mvebu_writel(port, new, PCIE_CMD_OFF);
break;
- }
case PCI_IO_BASE:
/*
@@ -575,6 +586,7 @@ mvebu_pci_bridge_emul_pcie_conf_write(struct pci_bridge_emul *bridge,
}
static struct pci_bridge_emul_ops mvebu_pci_bridge_emul_ops = {
+ .read_base = mvebu_pci_bridge_emul_base_conf_read,
.write_base = mvebu_pci_bridge_emul_base_conf_write,
.read_pcie = mvebu_pci_bridge_emul_pcie_conf_read,
.write_pcie = mvebu_pci_bridge_emul_pcie_conf_write,
--
2.20.1
PCI IO type bits are already initialized in mvebu_pci_bridge_emul_init()
function and only when IO support is enabled. These type bits are read-only
and pci-bridge-emul.c code already does not allow to modify them from upper
layers.
When IO support is disabled then all IO registers should be read-only and
return zeros. Therefore do not modify PCI IO type bits in
mvebu_pci_bridge_emul_base_conf_write() callback.
Signed-off-by: Pali Rohár <[email protected]>
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index 32694763e930..a0b661972436 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -522,13 +522,6 @@ mvebu_pci_bridge_emul_base_conf_write(struct pci_bridge_emul *bridge,
break;
case PCI_IO_BASE:
- /*
- * We keep bit 1 set, it is a read-only bit that
- * indicates we support 32 bits addressing for the
- * I/O
- */
- conf->iobase |= PCI_IO_RANGE_TYPE_32;
- conf->iolimit |= PCI_IO_RANGE_TYPE_32;
mvebu_pcie_handle_iobase_change(port);
break;
--
2.20.1
PME Status bit in Root Status Register (PCIE_RC_RTSTA_OFF) is read-only and
can be cleared only by writing 0b to the Interrupt Cause RW0C register
(PCIE_INT_CAUSE_OFF).
Signed-off-by: Pali Rohár <[email protected]>
Fixes: 1f08673eef12 ("PCI: mvebu: Convert to PCI emulated bridge config space")
Cc: [email protected]
---
drivers/pci/controller/pci-mvebu.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
index c9b736344b56..798cf5cff8be 100644
--- a/drivers/pci/controller/pci-mvebu.c
+++ b/drivers/pci/controller/pci-mvebu.c
@@ -52,6 +52,8 @@
PCIE_CONF_FUNC(PCI_FUNC(devfn)) | PCIE_CONF_REG(where) | \
PCIE_CONF_ADDR_EN)
#define PCIE_CONF_DATA_OFF 0x18fc
+#define PCIE_INT_CAUSE_OFF 0x1900
+#define PCIE_INT_PM_PME BIT(28)
#define PCIE_MASK_OFF 0x1910
#define PCIE_MASK_ENABLE_INTS 0x0f000000
#define PCIE_CTRL_OFF 0x1a00
@@ -672,7 +674,14 @@ mvebu_pci_bridge_emul_pcie_conf_write(struct pci_bridge_emul *bridge,
break;
case PCI_EXP_RTSTA:
- mvebu_writel(port, new, PCIE_RC_RTSTA);
+ /*
+ * PME Status bit in Root Status Register (PCIE_RC_RTSTA)
+ * is read-only and can be cleared only by writing 0b to the
+ * Interrupt Cause RW0C register (PCIE_INT_CAUSE_OFF). So
+ * clear PME via Interrupt Cause.
+ */
+ if (new & PCI_EXP_RTSTA_PME)
+ mvebu_writel(port, ~PCIE_INT_PM_PME, PCIE_INT_CAUSE_OFF);
break;
}
}
--
2.20.1
On Thu, 25 Nov 2021 13:45:50 +0100, Pali Rohár wrote:
> This patch series contains various fixes for pci-mvebu.c driver. Only
> bugfixes, no new features.
>
> For pci-mvebu.c I have prepared another 30+ patches with cleanups and
> new features, they are currently available in my git branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/pali/linux.git/log/?h=pci-mvebu
>
> [...]
Dropped stable tags and applied on top of my pci/mvebu branch (please
check I rebased commits correctly), thanks!
[01/15] PCI: mvebu: Check for valid ports
https://git.kernel.org/lpieralisi/pci/c/8cdabfdd5a
[02/15] PCI: mvebu: Check for errors from pci_bridge_emul_init() call
https://git.kernel.org/lpieralisi/pci/c/5d18d702e5
[03/15] PCI: mvebu: Check that PCI bridge specified in DT has function number zero
https://git.kernel.org/lpieralisi/pci/c/489bfc5187
[04/15] PCI: mvebu: Handle invalid size of read config request
https://git.kernel.org/lpieralisi/pci/c/11c2bf4a20
[05/15] PCI: mvebu: Disallow mapping interrupts on emulated bridges
https://git.kernel.org/lpieralisi/pci/c/319e6046bd
[06/15] PCI: mvebu: Fix support for bus mastering and PCI_COMMAND on emulated bridge
https://git.kernel.org/lpieralisi/pci/c/e42b855837
[07/15] PCI: mvebu: Do not modify PCI IO type bits in conf_write
https://git.kernel.org/lpieralisi/pci/c/2cf150216e
[08/15] PCI: mvebu: Propagate errors when updating PCI_IO_BASE and PCI_MEM_BASE registers
https://git.kernel.org/lpieralisi/pci/c/e7a0187672
[09/15] PCI: mvebu: Setup PCIe controller to Root Complex mode
https://git.kernel.org/lpieralisi/pci/c/df08ac0161
[10/15] PCI: mvebu: Set PCI Bridge Class Code to PCI Bridge
https://git.kernel.org/lpieralisi/pci/c/f587775828
[11/15] PCI: mvebu: Fix configuring secondary bus of PCIe Root Port via emulated bridge
https://git.kernel.org/lpieralisi/pci/c/91a8d79fc7
[12/15] PCI: mvebu: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge
https://git.kernel.org/lpieralisi/pci/c/d75404cc08
[13/15] PCI: mvebu: Fix support for PCI_EXP_DEVCTL on emulated bridge
https://git.kernel.org/lpieralisi/pci/c/ecae073e39
[14/15] PCI: mvebu: Fix support for PCI_EXP_RTSTA on emulated bridge
https://git.kernel.org/lpieralisi/pci/c/838ff44a39
[15/15] PCI: mvebu: Fix support for DEVCAP2, DEVCTL2 and LNKCTL2 registers on emulated bridge
https://git.kernel.org/lpieralisi/pci/c/4ab34548c5
Thanks,
Lorenzo
On Thu, Nov 25, 2021 at 01:45:54PM +0100, Pali Roh?r wrote:
> Function mvebu_pcie_hw_rd_conf() does not handle invalid size. So correctly
> set read value to all-ones and return appropriate error return value
> PCIBIOS_BAD_REGISTER_NUMBER like in mvebu_pcie_hw_wr_conf() function.
>
> Signed-off-by: Pali Roh?r <[email protected]>
> Cc: [email protected]
Is there a bug that this fixes? If not, I would drop the stable tag
(as I see Lorenzo already did, thanks!).
> ---
> drivers/pci/controller/pci-mvebu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
> index 08274132cdfb..19c6ee298442 100644
> --- a/drivers/pci/controller/pci-mvebu.c
> +++ b/drivers/pci/controller/pci-mvebu.c
> @@ -261,6 +261,9 @@ static int mvebu_pcie_hw_rd_conf(struct mvebu_pcie_port *port,
> case 4:
> *val = readl_relaxed(conf_data);
> break;
> + default:
> + *val = 0xffffffff;
> + return PCIBIOS_BAD_REGISTER_NUMBER;
Might be the right thing to do, but there are many config accessors
that do not set *val to ~0 before returning
PCIBIOS_BAD_REGISTER_NUMBER:
pci_bus_read_config_byte (and word, dword) # PCI_OP_READ(), *val unchanged
pci_generic_config_read # *val = 32-bit value
pci_user_read_config_byte (...) # PCI_USER_READ_CONFIG(), *val unchanged
sh7786_pcie_read # *val unchanged
dw_pcie_read # *val = 0
mobiveil_pcie_read # *val = 0
faraday_raw_pci_read_config # *val = 32-bit value
ixp4xx_pci_read_config # *val unchanged
orion5x_pci_hw_rd_conf # *val = 32-bit value
orion_pcie_rd_conf # *val = 32-bit value
bonito64_pcibios_read # *val = 32-bit value
loongson_pcibios_read # *val = 32-bit value
msc_pcibios_read # *val = 32-bit value
ar724x_pci_read # *val unchanged
bcm1480_pcibios_read # *val = 32-bit value
_altera_pcie_cfg_read # *val = 32-bit value
rockchip_pcie_rd_own_conf # *val = 0
rockchip_pcie_rd_other_conf # *val = 0
pci_bridge_emul_conf_read # may depend on op?
There are more, but I got tired of looking. I actually didn't see any
that set *val to ~0.
I think the check in PCI_OP_READ() means that most accessors will
never see an invalid "size".
> }
>
> return PCIBIOS_SUCCESSFUL;
> --
> 2.20.1
>
On Fri, Jan 07, 2022 at 12:45:48PM -0600, Bjorn Helgaas wrote:
> On Thu, Nov 25, 2021 at 01:45:54PM +0100, Pali Roh?r wrote:
> > Function mvebu_pcie_hw_rd_conf() does not handle invalid size. So correctly
> > set read value to all-ones and return appropriate error return value
> > PCIBIOS_BAD_REGISTER_NUMBER like in mvebu_pcie_hw_wr_conf() function.
> >
> > Signed-off-by: Pali Roh?r <[email protected]>
> > Cc: [email protected]
>
> Is there a bug that this fixes? If not, I would drop the stable tag
> (as I see Lorenzo already did, thanks!).
>
> > ---
> > drivers/pci/controller/pci-mvebu.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
> > index 08274132cdfb..19c6ee298442 100644
> > --- a/drivers/pci/controller/pci-mvebu.c
> > +++ b/drivers/pci/controller/pci-mvebu.c
> > @@ -261,6 +261,9 @@ static int mvebu_pcie_hw_rd_conf(struct mvebu_pcie_port *port,
> > case 4:
> > *val = readl_relaxed(conf_data);
> > break;
> > + default:
> > + *val = 0xffffffff;
> > + return PCIBIOS_BAD_REGISTER_NUMBER;
>
> Might be the right thing to do, but there are many config accessors
> that do not set *val to ~0 before returning
> PCIBIOS_BAD_REGISTER_NUMBER:
I think a better question would be - how can this function be called
with a size that isn't 1, 2 or 4? I suppose if someone were to add
another PCI_OP_READ/PCI_OP_WRITE. However... they really need to audit
every implementation if they do that.
The generic implementation does this:
if (size == 1)
*val = readb(addr);
else if (size == 2)
*val = readw(addr);
else
*val = readl(addr);
and therefore completely ignores the size if it isn't 1 or 2. So I
don't think this is something that needs fixing.
If we're going to fix this in drivers, shouldn't we fix the generic
implementation too?
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Thu, Nov 25, 2021 at 01:45:55PM +0100, Pali Roh?r wrote:
> Interrupt support on mvebu emulated bridges is not implemented yet.
Is this mvebu-specific, or is aardvar also affected?
> So properly indicate return value to callers that they cannot request
> interrupts from emulated bridge.
Pet peeve: descriptions that say "do this *properly*". As though the
previous authors were just ignorant or intentionally did something
*improperly* :)
> Signed-off-by: Pali Roh?r <[email protected]>
> Cc: [email protected]
> ---
> drivers/pci/controller/pci-mvebu.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
> index 19c6ee298442..a3df352d440e 100644
> --- a/drivers/pci/controller/pci-mvebu.c
> +++ b/drivers/pci/controller/pci-mvebu.c
> @@ -705,6 +705,15 @@ static struct pci_ops mvebu_pcie_ops = {
> .write = mvebu_pcie_wr_conf,
> };
>
> +static int mvebu_pcie_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
> +{
> + /* Interrupt support on mvebu emulated bridges is not implemented yet */
> + if (dev->bus->number == 0)
> + return 0; /* Proper return code 0 == NO_IRQ */
> +
> + return of_irq_parse_and_map_pci(dev, slot, pin);
Is this something that could be done with a .read_base() op, e.g.,
make PCI_INTERRUPT_PIN contain zero (PCI_INTERRUPT_UNKNOWN)?
> +}
> +
> static resource_size_t mvebu_pcie_align_resource(struct pci_dev *dev,
> const struct resource *res,
> resource_size_t start,
> @@ -1119,6 +1128,7 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
> bridge->sysdata = pcie;
> bridge->ops = &mvebu_pcie_ops;
> bridge->align_resource = mvebu_pcie_align_resource;
> + bridge->map_irq = mvebu_pcie_map_irq;
I assume this means INTx doesn't work for some devices? Which ones?
I guess anything on the root bus? But INTx for devices *below* these
emulated Root Ports *does* work?
> return pci_host_probe(bridge);
> }
> --
> 2.20.1
>
On Thu, Nov 25, 2021 at 01:45:58PM +0100, Pali Roh?r wrote:
> Properly propagate failure from mvebu_pcie_add_windows() function back to
> the caller mvebu_pci_bridge_emul_base_conf_write() and correctly updates
> PCI_IO_BASE, PCI_MEM_BASE and PCI_IO_BASE_UPPER16 registers on error.
> On error set base value higher than limit value which indicates that
> address range is disabled.
Does the spec say that if software programs something invalid,
hardware should proactively set the base and limit registers to
disable the window?
I'm not sure I've seen hardware that does this, and it seems ... maybe
a little aggressive.
What happens if software writes the base and limit in the wrong order,
so the window is invalid after the first write but valid after the
second? That actually sounds like it could be a sensible strategy to
prevent a partially-configured window from being active.
Bjorn
On Friday 07 January 2022 15:32:16 Bjorn Helgaas wrote:
> On Thu, Nov 25, 2021 at 01:45:55PM +0100, Pali Rohár wrote:
> > Interrupt support on mvebu emulated bridges is not implemented yet.
>
> Is this mvebu-specific, or is aardvar also affected?
This is pci-mvebu.c driver specific, it does not implement emulation of
neither INTx, nor MSI interrupts for emulated pci bridge (root port). As
we know this HW does not have compliant pci root port, it needs to be
emulated in driver, and emulation for interrupts is missing. (it means
that also AER interrupt is missing).
And pci-aardvark.c driver has same issue and similar patch is required
for pci-aardvark.c too. Marek should take care of it. But for
pci-aardvark we already have implementation which emulates INTx
interrupts and it is waiting for review on the list:
https://lore.kernel.org/linux-pci/[email protected]/
> > So properly indicate return value to callers that they cannot request
> > interrupts from emulated bridge.
>
> Pet peeve: descriptions that say "do this *properly*". As though the
> previous authors were just ignorant or intentionally did something
> *improperly* :)
>
> > Signed-off-by: Pali Rohár <[email protected]>
> > Cc: [email protected]
> > ---
> > drivers/pci/controller/pci-mvebu.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
> > index 19c6ee298442..a3df352d440e 100644
> > --- a/drivers/pci/controller/pci-mvebu.c
> > +++ b/drivers/pci/controller/pci-mvebu.c
> > @@ -705,6 +705,15 @@ static struct pci_ops mvebu_pcie_ops = {
> > .write = mvebu_pcie_wr_conf,
> > };
> >
> > +static int mvebu_pcie_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
> > +{
> > + /* Interrupt support on mvebu emulated bridges is not implemented yet */
> > + if (dev->bus->number == 0)
> > + return 0; /* Proper return code 0 == NO_IRQ */
> > +
> > + return of_irq_parse_and_map_pci(dev, slot, pin);
>
> Is this something that could be done with a .read_base() op, e.g.,
> make PCI_INTERRUPT_PIN contain zero (PCI_INTERRUPT_UNKNOWN)?
I'm not sure... maybe. I choose this style as after I implement
emulation of INTx interrupts it allows me just to replace "return 0;" by
"return my_mapping_function_for_root_port(...);". Similarly like it is
in pending pci-aardvark.c patch:
https://lore.kernel.org/linux-pci/[email protected]/
> > +}
> > +
> > static resource_size_t mvebu_pcie_align_resource(struct pci_dev *dev,
> > const struct resource *res,
> > resource_size_t start,
> > @@ -1119,6 +1128,7 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
> > bridge->sysdata = pcie;
> > bridge->ops = &mvebu_pcie_ops;
> > bridge->align_resource = mvebu_pcie_align_resource;
> > + bridge->map_irq = mvebu_pcie_map_irq;
>
> I assume this means INTx doesn't work for some devices? Which ones?
> I guess anything on the root bus? But INTx for devices *below* these
> emulated Root Ports *does* work?
Exactly. All devices except emulated root ports (which are on bus 0)
have working MSI, MSI-X and INTx interrupts.
> > return pci_host_probe(bridge);
> > }
> > --
> > 2.20.1
> >
On Friday 07 January 2022 15:55:04 Bjorn Helgaas wrote:
> On Thu, Nov 25, 2021 at 01:45:58PM +0100, Pali Rohár wrote:
> > Properly propagate failure from mvebu_pcie_add_windows() function back to
> > the caller mvebu_pci_bridge_emul_base_conf_write() and correctly updates
> > PCI_IO_BASE, PCI_MEM_BASE and PCI_IO_BASE_UPPER16 registers on error.
> > On error set base value higher than limit value which indicates that
> > address range is disabled.
>
> Does the spec say that if software programs something invalid,
> hardware should proactively set the base and limit registers to
> disable the window?
No. But this patch address something totally different. Software can do
fully valid operation, e.g. try to set forwarding memory window as large
as possible. But because this driver "emulates" pci bridge by calling
software/kernel function (mvebu_pcie_add_windows), some operations which
in real HW cannot happen, are possible in software.
For example there are limitations in sizes of forwarding memory windows,
because it is done by mvebu-mbus driver, which is responsible for
configuring mapping and forwarding of PCIe I/O and MEM windows. And due
to Marvell HW, there are restrictions which are not in PCIe HW.
Currently if such error happens, obviously kernel is not able to set
PCIe windows and it just print warnings to dmesg. Trying to access these
windows would result in the worst case in crashes.
With this change when mvebu_pcie_add_windows() function fails then into
emulated config space is put information that particular forwarding
window is disabled. I think that it is better to indicate it in config
space what is the current "reality" of hardware configuration. If window
is disabled in real-HW (meaning in mvebu-mbus driver) then show it also
in emulated config space of pci bridge.
Do you have better idea what should emulated pci bridge do, if software
try to set fully valid configuration of forwarding window, but it is not
possible to achieve it (even compliant PCI bridge must be able to do
it)?
> I'm not sure I've seen hardware that does this, and it seems ... maybe
> a little aggressive.
>
> What happens if software writes the base and limit in the wrong order,
> so the window is invalid after the first write but valid after the
> second? That actually sounds like it could be a sensible strategy to
> prevent a partially-configured window from being active.
>
> Bjorn
Invalid window (limit < base) means that window is disabled. And
pci-mvebu.c in its callbacks from pci-bridge-emul.c should correctly
handle it and propagates information about disablement to mvebu-mbus
driver.
After window is valid again (limit > base) then pci-mvebu.c call
mvebu-mbus to setup new mapping.
On Fri, Jan 07, 2022 at 11:13:48PM +0100, Pali Roh?r wrote:
> On Friday 07 January 2022 15:32:16 Bjorn Helgaas wrote:
> > On Thu, Nov 25, 2021 at 01:45:55PM +0100, Pali Roh?r wrote:
> > > Interrupt support on mvebu emulated bridges is not implemented yet.
> >
> > Is this mvebu-specific, or is aardvar also affected?
>
> This is pci-mvebu.c driver specific, it does not implement emulation of
> neither INTx, nor MSI interrupts for emulated pci bridge (root port). As
> we know this HW does not have compliant pci root port, it needs to be
> emulated in driver, and emulation for interrupts is missing. (it means
> that also AER interrupt is missing).
>
> And pci-aardvark.c driver has same issue and similar patch is required
> for pci-aardvark.c too. Marek should take care of it. But for
> pci-aardvark we already have implementation which emulates INTx
> interrupts and it is waiting for review on the list:
> https://lore.kernel.org/linux-pci/[email protected]/
>
> > > So properly indicate return value to callers that they cannot request
> > > interrupts from emulated bridge.
> >
> > Pet peeve: descriptions that say "do this *properly*". As though the
> > previous authors were just ignorant or intentionally did something
> > *improperly* :)
> >
> > > Signed-off-by: Pali Roh?r <[email protected]>
> > > Cc: [email protected]
> > > ---
> > > drivers/pci/controller/pci-mvebu.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
> > > index 19c6ee298442..a3df352d440e 100644
> > > --- a/drivers/pci/controller/pci-mvebu.c
> > > +++ b/drivers/pci/controller/pci-mvebu.c
> > > @@ -705,6 +705,15 @@ static struct pci_ops mvebu_pcie_ops = {
> > > .write = mvebu_pcie_wr_conf,
> > > };
> > >
> > > +static int mvebu_pcie_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
> > > +{
> > > + /* Interrupt support on mvebu emulated bridges is not implemented yet */
> > > + if (dev->bus->number == 0)
> > > + return 0; /* Proper return code 0 == NO_IRQ */
> > > +
> > > + return of_irq_parse_and_map_pci(dev, slot, pin);
> >
> > Is this something that could be done with a .read_base() op, e.g.,
> > make PCI_INTERRUPT_PIN contain zero (PCI_INTERRUPT_UNKNOWN)?
>
> I'm not sure... maybe. I choose this style as after I implement
> emulation of INTx interrupts it allows me just to replace "return 0;" by
> "return my_mapping_function_for_root_port(...);".
OK, so even after you implement INTx for the emulated Root Ports, the
default of_irq_parse_and_map_pci() is insufficient, and you will
require an mvebu .map_irq() function. That's reasonable.
"PCI_INTERRUPT_PIN == 0" is the way software learns that a device
doesn't use INTx, of course, and I suppose PCI_INTERRUPT_PIN already
reads as zero, since mvebu_pci_bridge_emul_init() doesn't set
bridge->conf.intpin, and I assume the default value would be zero?
Bjorn
On Friday 07 January 2022 17:01:55 Bjorn Helgaas wrote:
> On Fri, Jan 07, 2022 at 11:13:48PM +0100, Pali Rohár wrote:
> > On Friday 07 January 2022 15:32:16 Bjorn Helgaas wrote:
> > > On Thu, Nov 25, 2021 at 01:45:55PM +0100, Pali Rohár wrote:
> > > > Interrupt support on mvebu emulated bridges is not implemented yet.
> > >
> > > Is this mvebu-specific, or is aardvar also affected?
> >
> > This is pci-mvebu.c driver specific, it does not implement emulation of
> > neither INTx, nor MSI interrupts for emulated pci bridge (root port). As
> > we know this HW does not have compliant pci root port, it needs to be
> > emulated in driver, and emulation for interrupts is missing. (it means
> > that also AER interrupt is missing).
> >
> > And pci-aardvark.c driver has same issue and similar patch is required
> > for pci-aardvark.c too. Marek should take care of it. But for
> > pci-aardvark we already have implementation which emulates INTx
> > interrupts and it is waiting for review on the list:
> > https://lore.kernel.org/linux-pci/[email protected]/
> >
> > > > So properly indicate return value to callers that they cannot request
> > > > interrupts from emulated bridge.
> > >
> > > Pet peeve: descriptions that say "do this *properly*". As though the
> > > previous authors were just ignorant or intentionally did something
> > > *improperly* :)
> > >
> > > > Signed-off-by: Pali Rohár <[email protected]>
> > > > Cc: [email protected]
> > > > ---
> > > > drivers/pci/controller/pci-mvebu.c | 10 ++++++++++
> > > > 1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/controller/pci-mvebu.c b/drivers/pci/controller/pci-mvebu.c
> > > > index 19c6ee298442..a3df352d440e 100644
> > > > --- a/drivers/pci/controller/pci-mvebu.c
> > > > +++ b/drivers/pci/controller/pci-mvebu.c
> > > > @@ -705,6 +705,15 @@ static struct pci_ops mvebu_pcie_ops = {
> > > > .write = mvebu_pcie_wr_conf,
> > > > };
> > > >
> > > > +static int mvebu_pcie_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
> > > > +{
> > > > + /* Interrupt support on mvebu emulated bridges is not implemented yet */
> > > > + if (dev->bus->number == 0)
> > > > + return 0; /* Proper return code 0 == NO_IRQ */
> > > > +
> > > > + return of_irq_parse_and_map_pci(dev, slot, pin);
> > >
> > > Is this something that could be done with a .read_base() op, e.g.,
> > > make PCI_INTERRUPT_PIN contain zero (PCI_INTERRUPT_UNKNOWN)?
> >
> > I'm not sure... maybe. I choose this style as after I implement
> > emulation of INTx interrupts it allows me just to replace "return 0;" by
> > "return my_mapping_function_for_root_port(...);".
>
> OK, so even after you implement INTx for the emulated Root Ports, the
> default of_irq_parse_and_map_pci() is insufficient, and you will
> require an mvebu .map_irq() function. That's reasonable.
>
> "PCI_INTERRUPT_PIN == 0" is the way software learns that a device
> doesn't use INTx, of course, and I suppose PCI_INTERRUPT_PIN already
> reads as zero, since mvebu_pci_bridge_emul_init() doesn't set
> bridge->conf.intpin, and I assume the default value would be zero?
>
> Bjorn
Yes, looks like that zeros are in emulated config space for fields not
explicitly initialized. Which is the pci-mvebu.c case.
But now I'm looking at pci-aardvark.c driver and it sets
PCI_INTERRUPT_PIN register to A:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pci-aardvark.c?h=v5.16-rc8#n953
And that comment "/* Support interrupt A for MSI feature */" must be
total nonsense as INTA for sure is not required for MSI... Plus we know
that pci-aardvark.c driver does not implement for pci bridge neither
INTx nor MSI... Ach... seems that this code is here since beginning and
needs to be fixed...
On Fri, Jan 07, 2022 at 11:28:26PM +0100, Pali Roh?r wrote:
> On Friday 07 January 2022 15:55:04 Bjorn Helgaas wrote:
> > On Thu, Nov 25, 2021 at 01:45:58PM +0100, Pali Roh?r wrote:
> > > Properly propagate failure from mvebu_pcie_add_windows() function back to
> > > the caller mvebu_pci_bridge_emul_base_conf_write() and correctly updates
> > > PCI_IO_BASE, PCI_MEM_BASE and PCI_IO_BASE_UPPER16 registers on error.
> > > On error set base value higher than limit value which indicates that
> > > address range is disabled.
> >
> > Does the spec say that if software programs something invalid,
> > hardware should proactively set the base and limit registers to
> > disable the window?
>
> No. But this patch address something totally different. Software can do
> fully valid operation, e.g. try to set forwarding memory window as large
> as possible. But because this driver "emulates" pci bridge by calling
> software/kernel function (mvebu_pcie_add_windows), some operations which
> in real HW cannot happen, are possible in software.
>
> For example there are limitations in sizes of forwarding memory windows,
> because it is done by mvebu-mbus driver, which is responsible for
> configuring mapping and forwarding of PCIe I/O and MEM windows. And due
> to Marvell HW, there are restrictions which are not in PCIe HW.
>
> Currently if such error happens, obviously kernel is not able to set
> PCIe windows and it just print warnings to dmesg. Trying to access these
> windows would result in the worst case in crashes.
>
> With this change when mvebu_pcie_add_windows() function fails then into
> emulated config space is put information that particular forwarding
> window is disabled. I think that it is better to indicate it in config
> space what is the current "reality" of hardware configuration. If window
> is disabled in real-HW (meaning in mvebu-mbus driver) then show it also
> in emulated config space of pci bridge.
>
> Do you have better idea what should emulated pci bridge do, if software
> try to set fully valid configuration of forwarding window, but it is not
> possible to achieve it (even compliant PCI bridge must be able to do
> it)?
On an ACPI system, the host bridge window sizes are constrained by the
host bridge _CRS method. I assume there's a similar constraint in DT.
Is the fact that mvebu_pcie_add_windows() can fail a symptom of a DT
that describes more available space than mvebu-bus can map?
> > I'm not sure I've seen hardware that does this, and it seems ... maybe
> > a little aggressive.
> >
> > What happens if software writes the base and limit in the wrong order,
> > so the window is invalid after the first write but valid after the
> > second? That actually sounds like it could be a sensible strategy to
> > prevent a partially-configured window from being active.
>
> Invalid window (limit < base) means that window is disabled. And
> pci-mvebu.c in its callbacks from pci-bridge-emul.c should correctly
> handle it and propagates information about disablement to mvebu-mbus
> driver.
>
> After window is valid again (limit > base) then pci-mvebu.c call
> mvebu-mbus to setup new mapping.
Not sure I'm understanding the code correctly. Here's the sort of
thing I'm worried about, but maybe this is actually impossible:
Let's say software writes (0x00, 0xff) to the I/O (base, limit), which
describes the [io 0x0000-0xffff] window. If mvebu-mbus can't handle
that, it looks like you set the (base, limit) to (0xf0, 0x0f), which
would describe [io 0xf000-0x0fff], which is invalid.
The software writes 0x40 to the limit, so now we have (0xf0, 0x40), or
[io 0xf000-0x40ff]. That's still invalid, but software thinks the
0x00 it wrote to the base is still there.
Bjorn
On Friday 07 January 2022 17:16:17 Bjorn Helgaas wrote:
> On Fri, Jan 07, 2022 at 11:28:26PM +0100, Pali Rohár wrote:
> > On Friday 07 January 2022 15:55:04 Bjorn Helgaas wrote:
> > > On Thu, Nov 25, 2021 at 01:45:58PM +0100, Pali Rohár wrote:
> > > > Properly propagate failure from mvebu_pcie_add_windows() function back to
> > > > the caller mvebu_pci_bridge_emul_base_conf_write() and correctly updates
> > > > PCI_IO_BASE, PCI_MEM_BASE and PCI_IO_BASE_UPPER16 registers on error.
> > > > On error set base value higher than limit value which indicates that
> > > > address range is disabled.
> > >
> > > Does the spec say that if software programs something invalid,
> > > hardware should proactively set the base and limit registers to
> > > disable the window?
> >
> > No. But this patch address something totally different. Software can do
> > fully valid operation, e.g. try to set forwarding memory window as large
> > as possible. But because this driver "emulates" pci bridge by calling
> > software/kernel function (mvebu_pcie_add_windows), some operations which
> > in real HW cannot happen, are possible in software.
> >
> > For example there are limitations in sizes of forwarding memory windows,
> > because it is done by mvebu-mbus driver, which is responsible for
> > configuring mapping and forwarding of PCIe I/O and MEM windows. And due
> > to Marvell HW, there are restrictions which are not in PCIe HW.
> >
> > Currently if such error happens, obviously kernel is not able to set
> > PCIe windows and it just print warnings to dmesg. Trying to access these
> > windows would result in the worst case in crashes.
> >
> > With this change when mvebu_pcie_add_windows() function fails then into
> > emulated config space is put information that particular forwarding
> > window is disabled. I think that it is better to indicate it in config
> > space what is the current "reality" of hardware configuration. If window
> > is disabled in real-HW (meaning in mvebu-mbus driver) then show it also
> > in emulated config space of pci bridge.
> >
> > Do you have better idea what should emulated pci bridge do, if software
> > try to set fully valid configuration of forwarding window, but it is not
> > possible to achieve it (even compliant PCI bridge must be able to do
> > it)?
>
> On an ACPI system, the host bridge window sizes are constrained by the
> host bridge _CRS method. I assume there's a similar constraint in DT.
>
> Is the fact that mvebu_pcie_add_windows() can fail a symptom of a DT
> that describes more available space than mvebu-bus can map?
Memory maps for mvebu are more complicated. There is no explicit size in
DT ranges property as it is dynamically allocated by mvebu-mbus:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/armada-385.dtsi?h=v5.15#n47
On some Armada platform (I think it is AXP) there is lot of SerDeses
with PCIe functionality (I think six or seven?) but "shared memory pool"
which mvebu-mbus allocates to consumers is not big enough to allow e.g.
256 MB + 64 kB for every PCIe port.
There is upper limit of mvebu memory slots in HW and each slot has size
restrictions. So mvebu-mbus has to deal with splitting requested
PCIe window to more mvebu memory slots... So even if there is available
memory for assigning then there does not have to be a free slot.
Due to nature of plugable PCIe expansion cards, you are basically free
to put any PCIe card which you like into any PCIe slot. So it would be
up to the user to choose combination of such cards which pass all those
mvebu windows and slots restrictions... Otherwise kernel just say that
cannot satisfy card's BAR assignment because it is not possible to set
forwarding windows correctly.
Moreover it is possible to bind / unbind pci mvebu device dynamically at
runtime (also by rmmod), so whole resource allocation in mvebu-bus is
dynamic even during system runtime. So theoretically user can unbind one
driver to free some memory and then can bind another (which needs more
memory).
I think that this pci-mvebu driver and HW is very unusual in both
resource assignment and supported features and requirements from SW.
> > > I'm not sure I've seen hardware that does this, and it seems ... maybe
> > > a little aggressive.
> > >
> > > What happens if software writes the base and limit in the wrong order,
> > > so the window is invalid after the first write but valid after the
> > > second? That actually sounds like it could be a sensible strategy to
> > > prevent a partially-configured window from being active.
> >
> > Invalid window (limit < base) means that window is disabled. And
> > pci-mvebu.c in its callbacks from pci-bridge-emul.c should correctly
> > handle it and propagates information about disablement to mvebu-mbus
> > driver.
> >
> > After window is valid again (limit > base) then pci-mvebu.c call
> > mvebu-mbus to setup new mapping.
>
> Not sure I'm understanding the code correctly. Here's the sort of
> thing I'm worried about, but maybe this is actually impossible:
>
> Let's say software writes (0x00, 0xff) to the I/O (base, limit), which
> describes the [io 0x0000-0xffff] window. If mvebu-mbus can't handle
> that, it looks like you set the (base, limit) to (0xf0, 0x0f), which
> would describe [io 0xf000-0x0fff], which is invalid.
>
> The software writes 0x40 to the limit, so now we have (0xf0, 0x40), or
> [io 0xf000-0x40ff]. That's still invalid, but software thinks the
> 0x00 it wrote to the base is still there.
>
> Bjorn
I see. In this situation it does not work correctly.
But is not kernel itself "privileged" to setup forwarding windows?
Because currently kernel does not do it and therefore do we need to care
for it?
Or do you have idea how to handle this kind of situation? Or how to
handle these kinds of errors?
On Sat, Jan 08, 2022 at 12:46:58AM +0100, Pali Roh?r wrote:
> On Friday 07 January 2022 17:16:17 Bjorn Helgaas wrote:
> > On Fri, Jan 07, 2022 at 11:28:26PM +0100, Pali Roh?r wrote:
> > > On Friday 07 January 2022 15:55:04 Bjorn Helgaas wrote:
> > > > On Thu, Nov 25, 2021 at 01:45:58PM +0100, Pali Roh?r wrote:
> > > > > Properly propagate failure from mvebu_pcie_add_windows() function back to
> > > > > the caller mvebu_pci_bridge_emul_base_conf_write() and correctly updates
> > > > > PCI_IO_BASE, PCI_MEM_BASE and PCI_IO_BASE_UPPER16 registers on error.
> > > > > On error set base value higher than limit value which indicates that
> > > > > address range is disabled.
> > > >
> > > > Does the spec say that if software programs something invalid,
> > > > hardware should proactively set the base and limit registers to
> > > > disable the window?
> > >
> > > No. But this patch address something totally different. Software can do
> > > fully valid operation, e.g. try to set forwarding memory window as large
> > > as possible. But because this driver "emulates" pci bridge by calling
> > > software/kernel function (mvebu_pcie_add_windows), some operations which
> > > in real HW cannot happen, are possible in software.
> > >
> > > For example there are limitations in sizes of forwarding memory windows,
> > > because it is done by mvebu-mbus driver, which is responsible for
> > > configuring mapping and forwarding of PCIe I/O and MEM windows. And due
> > > to Marvell HW, there are restrictions which are not in PCIe HW.
> > >
> > > Currently if such error happens, obviously kernel is not able to set
> > > PCIe windows and it just print warnings to dmesg. Trying to access these
> > > windows would result in the worst case in crashes.
> > >
> > > With this change when mvebu_pcie_add_windows() function fails then into
> > > emulated config space is put information that particular forwarding
> > > window is disabled. I think that it is better to indicate it in config
> > > space what is the current "reality" of hardware configuration. If window
> > > is disabled in real-HW (meaning in mvebu-mbus driver) then show it also
> > > in emulated config space of pci bridge.
> > >
> > > Do you have better idea what should emulated pci bridge do, if software
> > > try to set fully valid configuration of forwarding window, but it is not
> > > possible to achieve it (even compliant PCI bridge must be able to do
> > > it)?
> >
> > On an ACPI system, the host bridge window sizes are constrained by the
> > host bridge _CRS method. I assume there's a similar constraint in DT.
> >
> > Is the fact that mvebu_pcie_add_windows() can fail a symptom of a DT
> > that describes more available space than mvebu-bus can map?
>
> Memory maps for mvebu are more complicated. There is no explicit size in
> DT ranges property as it is dynamically allocated by mvebu-mbus:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/armada-385.dtsi?h=v5.15#n47
I wish I knew how to really interpret those "ranges" properties. (Is
there a good description in Documentation/ somewhere? All I've found
so far is https://elinux.org/Device_Tree_Usage, which is good, but
doesn't match this example completely.)
I see:
pciec: pcie {
ranges = <...>;
pcie1: pcie@1,0 {
ranges = <0x82000000 0 0 0x82000000 0x1 0 1 0
0x81000000 0 0 0x81000000 0x1 0 1 0>;
};
pcie2: pcie@2,0 {
ranges = <0x82000000 0 0 0x82000000 0x2 0 1 0
0x81000000 0 0 0x81000000 0x2 0 1 0>;
};
pcie3: pcie@3,0 {
ranges = <0x82000000 0 0 0x82000000 0x3 0 1 0
0x81000000 0 0 0x81000000 0x3 0 1 0>;
};
pcie4: pcie@4,0 {
ranges = <0x82000000 0 0 0x82000000 0x4 0 1 0
0x81000000 0 0 0x81000000 0x4 0 1 0>;
};
};
What does this look like in dmesg, i.e., what CPU address ranges are
mapped to what PCI bus addresses?
Are pcie1, pcie2, etc Root Ports? Or are they each separate host
bridges (they each have "bus-range = <0x00 0xff>")?
Is space from pciec dynamically assigned to pcie1, pcie2, etc? If so,
I assume there are more restrictions on the size and alignment than on
PCI bridge windows, which allow size/alignment down to 1MB?
I'm trying to see how this could be described in ACPI because that's a
fairly general model that accommodates most machines. Possibly
describing mvebu in ACPI would involve losing some flexibility.
Bjorn
On Wednesday 12 January 2022 18:19:21 Bjorn Helgaas wrote:
> On Sat, Jan 08, 2022 at 12:46:58AM +0100, Pali Rohár wrote:
> > On Friday 07 January 2022 17:16:17 Bjorn Helgaas wrote:
> > > On Fri, Jan 07, 2022 at 11:28:26PM +0100, Pali Rohár wrote:
> > > > On Friday 07 January 2022 15:55:04 Bjorn Helgaas wrote:
> > > > > On Thu, Nov 25, 2021 at 01:45:58PM +0100, Pali Rohár wrote:
> > > > > > Properly propagate failure from mvebu_pcie_add_windows() function back to
> > > > > > the caller mvebu_pci_bridge_emul_base_conf_write() and correctly updates
> > > > > > PCI_IO_BASE, PCI_MEM_BASE and PCI_IO_BASE_UPPER16 registers on error.
> > > > > > On error set base value higher than limit value which indicates that
> > > > > > address range is disabled.
> > > > >
> > > > > Does the spec say that if software programs something invalid,
> > > > > hardware should proactively set the base and limit registers to
> > > > > disable the window?
> > > >
> > > > No. But this patch address something totally different. Software can do
> > > > fully valid operation, e.g. try to set forwarding memory window as large
> > > > as possible. But because this driver "emulates" pci bridge by calling
> > > > software/kernel function (mvebu_pcie_add_windows), some operations which
> > > > in real HW cannot happen, are possible in software.
> > > >
> > > > For example there are limitations in sizes of forwarding memory windows,
> > > > because it is done by mvebu-mbus driver, which is responsible for
> > > > configuring mapping and forwarding of PCIe I/O and MEM windows. And due
> > > > to Marvell HW, there are restrictions which are not in PCIe HW.
> > > >
> > > > Currently if such error happens, obviously kernel is not able to set
> > > > PCIe windows and it just print warnings to dmesg. Trying to access these
> > > > windows would result in the worst case in crashes.
> > > >
> > > > With this change when mvebu_pcie_add_windows() function fails then into
> > > > emulated config space is put information that particular forwarding
> > > > window is disabled. I think that it is better to indicate it in config
> > > > space what is the current "reality" of hardware configuration. If window
> > > > is disabled in real-HW (meaning in mvebu-mbus driver) then show it also
> > > > in emulated config space of pci bridge.
> > > >
> > > > Do you have better idea what should emulated pci bridge do, if software
> > > > try to set fully valid configuration of forwarding window, but it is not
> > > > possible to achieve it (even compliant PCI bridge must be able to do
> > > > it)?
> > >
> > > On an ACPI system, the host bridge window sizes are constrained by the
> > > host bridge _CRS method. I assume there's a similar constraint in DT.
> > >
> > > Is the fact that mvebu_pcie_add_windows() can fail a symptom of a DT
> > > that describes more available space than mvebu-bus can map?
> >
> > Memory maps for mvebu are more complicated. There is no explicit size in
> > DT ranges property as it is dynamically allocated by mvebu-mbus:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/armada-385.dtsi?h=v5.15#n47
>
> I wish I knew how to really interpret those "ranges" properties. (Is
> there a good description in Documentation/ somewhere? All I've found
> so far is https://elinux.org/Device_Tree_Usage, which is good, but
> doesn't match this example completely.)
>
> I see:
>
> pciec: pcie {
> ranges = <...>;
> pcie1: pcie@1,0 {
> ranges = <0x82000000 0 0 0x82000000 0x1 0 1 0
> 0x81000000 0 0 0x81000000 0x1 0 1 0>;
> };
> pcie2: pcie@2,0 {
> ranges = <0x82000000 0 0 0x82000000 0x2 0 1 0
> 0x81000000 0 0 0x81000000 0x2 0 1 0>;
> };
> pcie3: pcie@3,0 {
> ranges = <0x82000000 0 0 0x82000000 0x3 0 1 0
> 0x81000000 0 0 0x81000000 0x3 0 1 0>;
> };
> pcie4: pcie@4,0 {
> ranges = <0x82000000 0 0 0x82000000 0x4 0 1 0
> 0x81000000 0 0 0x81000000 0x4 0 1 0>;
> };
> };
>
> What does this look like in dmesg, i.e., what CPU address ranges are
> mapped to what PCI bus addresses?
These explicit ranges in DT are probably ignored as they are invalid.
You can see them (0xffffffffffffffff) in dmesg. MEM and I/O resources
are parsed in pci-mvebu.c driver in mvebu_pcie_parse_request_resources()
function.
Here is relevant dmesg output:
[ 0.671607] mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
[ 0.677474] mvebu-pcie soc:pcie: MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
[ 0.685245] mvebu-pcie soc:pcie: MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
[ 0.693017] mvebu-pcie soc:pcie: MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
[ 0.700788] mvebu-pcie soc:pcie: MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
[ 0.708562] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[ 0.716856] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[ 0.725146] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[ 0.733438] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[ 0.741730] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[ 0.750022] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[ 0.758314] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[ 0.766606] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[ 0.907763] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
[ 0.913698] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.919203] pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
[ 0.929406] pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
[ 0.939608] pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
[ 0.949812] pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
[ 0.960014] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
[ 0.966908] pci_bus 0000:00: root bus resource [io 0x1000-0xeffff]
[ 0.973261] pci 0000:00:01.0: [11ab:6820] type 01 class 0x060400
[ 0.979458] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
[ 0.985643] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
[ 0.992482] PCI: bus0: Fast back to back transfers disabled
[ 0.998075] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.006108] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.014134] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.022242] pci 0000:01:00.0: [1e0f:0001] type 00 class 0x010802
[ 1.028289] pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xc0003fff 64bit]
[ 1.035233] pci 0000:01:00.0: PME# supported from D0 D3hot
[ 1.040787] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:01.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[ 1.056630] PCI: bus1: Fast back to back transfers disabled
[ 1.062217] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 1.068935] pci 0000:02:00.0: [168c:003c] type 00 class 0x028000
[ 1.074975] pci 0000:02:00.0: reg 0x10: [mem 0xc8000000-0xc81fffff 64bit]
[ 1.081817] pci 0000:02:00.0: reg 0x30: [mem 0xc8200000-0xc820ffff pref]
[ 1.088611] pci 0000:02:00.0: supports D1 D2
[ 1.093765] PCI: bus2: Fast back to back transfers disabled
[ 1.099356] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
[ 1.106070] pci 0000:03:00.0: [168c:002e] type 00 class 0x028000
[ 1.112110] pci 0000:03:00.0: reg 0x10: [mem 0xd0000000-0xd000ffff 64bit]
[ 1.119031] pci 0000:03:00.0: supports D1
[ 1.123049] pci 0000:03:00.0: PME# supported from D0 D1 D3hot
[ 1.129688] PCI: bus3: Fast back to back transfers disabled
[ 1.135274] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[ 1.141926] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
[ 1.148738] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
[ 1.155544] pci 0000:00:03.0: BAR 8: assigned [mem 0xe0100000-0xe01fffff]
[ 1.162357] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe0003fff 64bit]
[ 1.169696] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 1.174674] pci 0000:00:01.0: bridge window [mem 0xe0000000-0xe00fffff]
[ 1.181490] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
[ 1.188830] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
[ 1.196074] pci 0000:00:02.0: PCI bridge to [bus 02]
[ 1.201050] pci 0000:00:02.0: bridge window [mem 0xe0200000-0xe04fffff]
[ 1.207867] pci 0000:03:00.0: BAR 0: assigned [mem 0xe0100000-0xe010ffff 64bit]
[ 1.215200] pci 0000:00:03.0: PCI bridge to [bus 03]
[ 1.220179] pci 0000:00:03.0: bridge window [mem 0xe0100000-0xe01fffff]
[ 1.227028] pcieport 0000:00:01.0: enabling device (0140 -> 0142)
[ 1.233186] pcieport 0000:00:02.0: enabling device (0140 -> 0142)
[ 1.239339] pcieport 0000:00:03.0: enabling device (0140 -> 0142)
And some more information from /proc:
$ cat /proc/ioports
00001000-000effff : PCI I/O
$ cat /proc/iomem
...
e0000000-e7ffffff : PCI MEM
e0000000-e00fffff : PCI Bus 0000:01
e0000000-e0003fff : 0000:01:00.0
e0000000-e0003fff : nvme
e0100000-e01fffff : PCI Bus 0000:03
e0100000-e010ffff : 0000:03:00.0
e0100000-e010ffff : ath9k
e0200000-e04fffff : PCI Bus 0000:02
e0200000-e03fffff : 0000:02:00.0
e0200000-e03fffff : ath
e0400000-e040ffff : 0000:02:00.0
...
> Are pcie1, pcie2, etc Root Ports? Or are they each separate host
> bridges (they each have "bus-range = <0x00 0xff>")?
From kernel point of view they are root ports. But in reality every of
these root port is on separate bus segment, but kernel pci-mvebu.c
driver merges all these segments/domains into one host bridge and put
all root ports into bus 0.
Here is lspci -tvnn output with topology:
$ lspci -tvnn
-[0000:00]-+-01.0-[01]----00.0 Device [1e0f:0001]
+-02.0-[02]----00.0 Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
\-03.0-[03]----00.0 Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e]
And without topology:
$ lspci -nn
00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
00:03.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
01:00.0 Non-Volatile memory controller [0108]: Device [1e0f:0001]
02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
03:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
Buses 1, 2 and 3 represents mPCIe cards, all of them are in reality in
separate bus segments and on different HW host bridges. So they do *not*
share access to config space, do *not* share INTx interrupts, etc...
> Is space from pciec dynamically assigned to pcie1, pcie2, etc? If so,
> I assume there are more restrictions on the size and alignment than on
> PCI bridge windows, which allow size/alignment down to 1MB?
Yes, exactly. I do not know now all restrictions. At least there are
fixed number of memory slots and each has to be of size 2^N. They are
dynamically assigned by kernel mbus driver at time when somebody updates
BASE/LIMIT registers. And that kernel mbus driver takes care to split
non-aligned window size to more slots of size 2^N. And resources are
shared from pool with other HW parts (e.g. DMA), so other drivers loaded
in kernel can "eat" available slots before pci-mvebu and then there does
not have to be nothing to allocate for PCI.
But most Armada boards do not have exported all peripherals from SoC,
unconnected are disabled in DT and therefore exhaustion should not
happen.
> I'm trying to see how this could be described in ACPI because that's a
> fairly general model that accommodates most machines. Possibly
> describing mvebu in ACPI would involve losing some flexibility.
>
> Bjorn
I do not understand APCI model very well and I'm in impression that it
is impossible to represent mvebu in ACPI.
On Thu, Jan 13, 2022 at 11:35:23AM +0100, Pali Roh?r wrote:
> On Wednesday 12 January 2022 18:19:21 Bjorn Helgaas wrote:
> > On Sat, Jan 08, 2022 at 12:46:58AM +0100, Pali Roh?r wrote:
> > > On Friday 07 January 2022 17:16:17 Bjorn Helgaas wrote:
> > > > On Fri, Jan 07, 2022 at 11:28:26PM +0100, Pali Roh?r wrote:
> > > > > On Friday 07 January 2022 15:55:04 Bjorn Helgaas wrote:
> > > > > > On Thu, Nov 25, 2021 at 01:45:58PM +0100, Pali Roh?r wrote:
> > > > > > > Properly propagate failure from mvebu_pcie_add_windows()
> > > > > > > function back to the caller
> > > > > > > mvebu_pci_bridge_emul_base_conf_write() and correctly
> > > > > > > updates PCI_IO_BASE, PCI_MEM_BASE and
> > > > > > > PCI_IO_BASE_UPPER16 registers on error. On error set
> > > > > > > base value higher than limit value which indicates that
> > > > > > > address range is disabled.
> > > > > >
> > > > > > Does the spec say that if software programs something
> > > > > > invalid, hardware should proactively set the base and
> > > > > > limit registers to disable the window?
> > > > >
> > > > > No. But this patch address something totally different.
> > > > > Software can do fully valid operation, e.g. try to set
> > > > > forwarding memory window as large as possible. But because
> > > > > this driver "emulates" pci bridge by calling software/kernel
> > > > > function (mvebu_pcie_add_windows), some operations which in
> > > > > real HW cannot happen, are possible in software.
> > > > >
> > > > > For example there are limitations in sizes of forwarding
> > > > > memory windows, because it is done by mvebu-mbus driver,
> > > > > which is responsible for configuring mapping and forwarding
> > > > > of PCIe I/O and MEM windows. And due to Marvell HW, there
> > > > > are restrictions which are not in PCIe HW.
> > > > >
> > > > > Currently if such error happens, obviously kernel is not
> > > > > able to set PCIe windows and it just print warnings to
> > > > > dmesg. Trying to access these windows would result in the
> > > > > worst case in crashes.
> > > > >
> > > > > With this change when mvebu_pcie_add_windows() function
> > > > > fails then into emulated config space is put information
> > > > > that particular forwarding window is disabled. I think that
> > > > > it is better to indicate it in config space what is the
> > > > > current "reality" of hardware configuration. If window is
> > > > > disabled in real-HW (meaning in mvebu-mbus driver) then show
> > > > > it also in emulated config space of pci bridge.
> > > > >
> > > > > Do you have better idea what should emulated pci bridge do,
> > > > > if software try to set fully valid configuration of
> > > > > forwarding window, but it is not possible to achieve it
> > > > > (even compliant PCI bridge must be able to do it)?
> > > >
> > > > On an ACPI system, the host bridge window sizes are
> > > > constrained by the host bridge _CRS method. I assume there's
> > > > a similar constraint in DT.
> > > >
> > > > Is the fact that mvebu_pcie_add_windows() can fail a symptom
> > > > of a DT that describes more available space than mvebu-bus can
> > > > map?
> > >
> > > Memory maps for mvebu are more complicated. There is no explicit
> > > size in DT ranges property as it is dynamically allocated by
> > > mvebu-mbus:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/armada-385.dtsi?h=v5.15#n47
> >
> > I wish I knew how to really interpret those "ranges" properties.
> > (Is there a good description in Documentation/ somewhere? All
> > I've found so far is https://elinux.org/Device_Tree_Usage, which
> > is good, but doesn't match this example completely.)
> >
> > I see:
> >
> > pciec: pcie {
> > ranges = <...>;
> > pcie1: pcie@1,0 {
> > ranges = <0x82000000 0 0 0x82000000 0x1 0 1 0
> > 0x81000000 0 0 0x81000000 0x1 0 1 0>;
> > };
> > pcie2: pcie@2,0 {
> > ranges = <0x82000000 0 0 0x82000000 0x2 0 1 0
> > 0x81000000 0 0 0x81000000 0x2 0 1 0>;
> > };
> > pcie3: pcie@3,0 {
> > ranges = <0x82000000 0 0 0x82000000 0x3 0 1 0
> > 0x81000000 0 0 0x81000000 0x3 0 1 0>;
> > };
> > pcie4: pcie@4,0 {
> > ranges = <0x82000000 0 0 0x82000000 0x4 0 1 0
> > 0x81000000 0 0 0x81000000 0x4 0 1 0>;
> > };
> > };
> >
> > What does this look like in dmesg, i.e., what CPU address ranges are
> > mapped to what PCI bus addresses?
>
> These explicit ranges in DT are probably ignored as they are invalid.
> You can see them (0xffffffffffffffff) in dmesg.
Are you saying that this DT ranges and the dmesg line are connected?
ranges = <0x82000000 0 0 0x82000000 0x1 0 1 0
0x81000000 0 0 0x81000000 0x1 0 1 0>;
mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
1) It would be nice if there were a hint somewhere in Documentation/
that would allow mere mortals to see the connection there.
2) Why do we have these DT entries if they are invalid and useless?
> MEM and I/O resources are parsed in pci-mvebu.c driver in
> mvebu_pcie_parse_request_resources() function.
So mvebu-mbus.c fills in the static mbus_state from the DT
"pcie-mem-aperture", which seems unconnected to the DT descriptions of
the PCI controllers:
static struct mvebu_mbus_state mbus_state;
mvebu_mbus_dt_init
mvebu_mbus_get_pcie_resources(&mbus_state.pcie_mem_aperture)
of_property_read_u32_array("pcie-mem-aperture")
mvebu_pcie_probe
mvebu_pcie_parse_request_resources
mvebu_mbus_get_pcie_mem_aperture(&pcie->mem)
*res = mbus_state.pcie_mem_aperture
pci_add_resource(&bridge->windows, &pcie->mem)
> Here is relevant dmesg output:
>
> mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
> mvebu-pcie soc:pcie: MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
> mvebu-pcie soc:pcie: MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
> mvebu-pcie soc:pcie: MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
> mvebu-pcie soc:pcie: MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
> mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
> mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
> mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
> mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
> mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
> mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
> mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [bus 00-ff]
> pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
> pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
> pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
> pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
> pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
I see 0xf1080000-0xf1081fff, 0xf1040000-0xf1041fff, etc mentioned in
the DT info above, but I don't see where [mem 0xe0000000-0xe7ffffff]
came from.
Regardless, this means PCI thinks [mem 0xe0000000-0xe7ffffff] is
available on bus 00 and can be assigned to devices on bus 00 according
to the normal PCI rules (BARs aligned on size, PCI bridge windows
aligned on 1MB and multiple of 1MB in size). IIUC, mvebu imposes
additional alignment constraints on the bridge windows.
These are the bridge window assignments from your dmesg:
> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
> pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
> pci 0000:00:03.0: BAR 8: assigned [mem 0xe0100000-0xe01fffff]
> pci 0000:00:01.0: PCI bridge to [bus 01]
> pci 0000:00:01.0: bridge window [mem 0xe0000000-0xe00fffff]
> pci 0000:00:02.0: PCI bridge to [bus 02]
> pci 0000:00:02.0: bridge window [mem 0xe0200000-0xe04fffff]
> pci 0000:00:03.0: PCI bridge to [bus 03]
> pci 0000:00:03.0: bridge window [mem 0xe0100000-0xe01fffff]
The PCI core knows nothing about the mvebu constraints. Are we just
lucky here that when PCI assigned these bridge windows, they happen to
be supported on mvebu? What happens if PCI decides it needs 29MB on
bus 01?
> > Are pcie1, pcie2, etc Root Ports? Or are they each separate host
> > bridges (they each have "bus-range = <0x00 0xff>")?
>
> From kernel point of view they are root ports. But in reality every of
> these root port is on separate bus segment, but kernel pci-mvebu.c
> driver merges all these segments/domains into one host bridge and put
> all root ports into bus 0.
>
> Here is lspci -tvnn output with topology:
>
> $ lspci -tvnn
> -[0000:00]-+-01.0-[01]----00.0 Device [1e0f:0001]
> +-02.0-[02]----00.0 Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
> \-03.0-[03]----00.0 Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e]
> Buses 1, 2 and 3 represents mPCIe cards, all of them are in reality
> in separate bus segments and on different HW host bridges. So they
> do *not* share access to config space, do *not* share INTx
> interrupts, etc...
>
> > Is space from pciec dynamically assigned to pcie1, pcie2, etc? If
> > so, I assume there are more restrictions on the size and alignment
> > than on PCI bridge windows, which allow size/alignment down to
> > 1MB?
>
> Yes, exactly. I do not know now all restrictions. At least there are
> fixed number of memory slots and each has to be of size 2^N. They
> are dynamically assigned by kernel mbus driver at time when somebody
> updates BASE/LIMIT registers. And that kernel mbus driver takes care
> to split non-aligned window size to more slots of size 2^N. And
> resources are shared from pool with other HW parts (e.g. DMA), so
> other drivers loaded in kernel can "eat" available slots before
> pci-mvebu and then there does not have to be nothing to allocate for
> PCI.
So IIUC,
pcie1 == 00:01.0 Root Port
pcie2 == 00:02.0 Root Port
pcie3 == 00:03.0 Root Port
From a software point of view, they're all under a single host bridge,
and Linux assumes everything under a host bridge plays by the PCI
rules.
In this case, the root ports *don't* play by the rules since they have
additional alignment restrictions, so I think these really should be
described as separate host bridges in DT with the address space
carved up statically among them.
It's common on x86 to have multiple host bridges that all appear to
software to be in domain 0000. The bus number ranges under each are
static, e.g., one bridge has [bus 00-7f] and another has [bus 80-ff].
> But most Armada boards do not have exported all peripherals from SoC,
> unconnected are disabled in DT and therefore exhaustion should not
> happen.
>
> > I'm trying to see how this could be described in ACPI because that's a
> > fairly general model that accommodates most machines. Possibly
> > describing mvebu in ACPI would involve losing some flexibility.
>
> I do not understand APCI model very well and I'm in impression that it
> is impossible to represent mvebu in ACPI.
It could be described as a separate host bridge for every root port.
ACPI uses _CRS (current resource settings) to describe the apertures
to PCI and any address translation. Currently the _CRS description is
static, but ACPI does allow those resource assignments to be modified
via _PRS (possible resource settings) and _SRS (set resource
settings).
Bjorn
On Thursday 20 January 2022 11:50:47 Bjorn Helgaas wrote:
> On Thu, Jan 13, 2022 at 11:35:23AM +0100, Pali Rohár wrote:
> > On Wednesday 12 January 2022 18:19:21 Bjorn Helgaas wrote:
> > > On Sat, Jan 08, 2022 at 12:46:58AM +0100, Pali Rohár wrote:
> > > > On Friday 07 January 2022 17:16:17 Bjorn Helgaas wrote:
> > > > > On Fri, Jan 07, 2022 at 11:28:26PM +0100, Pali Rohár wrote:
> > > > > > On Friday 07 January 2022 15:55:04 Bjorn Helgaas wrote:
> > > > > > > On Thu, Nov 25, 2021 at 01:45:58PM +0100, Pali Rohár wrote:
> > > > > > > > Properly propagate failure from mvebu_pcie_add_windows()
> > > > > > > > function back to the caller
> > > > > > > > mvebu_pci_bridge_emul_base_conf_write() and correctly
> > > > > > > > updates PCI_IO_BASE, PCI_MEM_BASE and
> > > > > > > > PCI_IO_BASE_UPPER16 registers on error. On error set
> > > > > > > > base value higher than limit value which indicates that
> > > > > > > > address range is disabled.
> > > > > > >
> > > > > > > Does the spec say that if software programs something
> > > > > > > invalid, hardware should proactively set the base and
> > > > > > > limit registers to disable the window?
> > > > > >
> > > > > > No. But this patch address something totally different.
> > > > > > Software can do fully valid operation, e.g. try to set
> > > > > > forwarding memory window as large as possible. But because
> > > > > > this driver "emulates" pci bridge by calling software/kernel
> > > > > > function (mvebu_pcie_add_windows), some operations which in
> > > > > > real HW cannot happen, are possible in software.
> > > > > >
> > > > > > For example there are limitations in sizes of forwarding
> > > > > > memory windows, because it is done by mvebu-mbus driver,
> > > > > > which is responsible for configuring mapping and forwarding
> > > > > > of PCIe I/O and MEM windows. And due to Marvell HW, there
> > > > > > are restrictions which are not in PCIe HW.
> > > > > >
> > > > > > Currently if such error happens, obviously kernel is not
> > > > > > able to set PCIe windows and it just print warnings to
> > > > > > dmesg. Trying to access these windows would result in the
> > > > > > worst case in crashes.
> > > > > >
> > > > > > With this change when mvebu_pcie_add_windows() function
> > > > > > fails then into emulated config space is put information
> > > > > > that particular forwarding window is disabled. I think that
> > > > > > it is better to indicate it in config space what is the
> > > > > > current "reality" of hardware configuration. If window is
> > > > > > disabled in real-HW (meaning in mvebu-mbus driver) then show
> > > > > > it also in emulated config space of pci bridge.
> > > > > >
> > > > > > Do you have better idea what should emulated pci bridge do,
> > > > > > if software try to set fully valid configuration of
> > > > > > forwarding window, but it is not possible to achieve it
> > > > > > (even compliant PCI bridge must be able to do it)?
> > > > >
> > > > > On an ACPI system, the host bridge window sizes are
> > > > > constrained by the host bridge _CRS method. I assume there's
> > > > > a similar constraint in DT.
> > > > >
> > > > > Is the fact that mvebu_pcie_add_windows() can fail a symptom
> > > > > of a DT that describes more available space than mvebu-bus can
> > > > > map?
> > > >
> > > > Memory maps for mvebu are more complicated. There is no explicit
> > > > size in DT ranges property as it is dynamically allocated by
> > > > mvebu-mbus:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/armada-385.dtsi?h=v5.15#n47
> > >
> > > I wish I knew how to really interpret those "ranges" properties.
> > > (Is there a good description in Documentation/ somewhere? All
> > > I've found so far is https://elinux.org/Device_Tree_Usage, which
> > > is good, but doesn't match this example completely.)
> > >
> > > I see:
> > >
> > > pciec: pcie {
> > > ranges = <...>;
> > > pcie1: pcie@1,0 {
> > > ranges = <0x82000000 0 0 0x82000000 0x1 0 1 0
> > > 0x81000000 0 0 0x81000000 0x1 0 1 0>;
> > > };
> > > pcie2: pcie@2,0 {
> > > ranges = <0x82000000 0 0 0x82000000 0x2 0 1 0
> > > 0x81000000 0 0 0x81000000 0x2 0 1 0>;
> > > };
> > > pcie3: pcie@3,0 {
> > > ranges = <0x82000000 0 0 0x82000000 0x3 0 1 0
> > > 0x81000000 0 0 0x81000000 0x3 0 1 0>;
> > > };
> > > pcie4: pcie@4,0 {
> > > ranges = <0x82000000 0 0 0x82000000 0x4 0 1 0
> > > 0x81000000 0 0 0x81000000 0x4 0 1 0>;
> > > };
> > > };
> > >
> > > What does this look like in dmesg, i.e., what CPU address ranges are
> > > mapped to what PCI bus addresses?
> >
> > These explicit ranges in DT are probably ignored as they are invalid.
> > You can see them (0xffffffffffffffff) in dmesg.
>
> Are you saying that this DT ranges and the dmesg line are connected?
>
> ranges = <0x82000000 0 0 0x82000000 0x1 0 1 0
> 0x81000000 0 0 0x81000000 0x1 0 1 0>;
>
> mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
>
> 1) It would be nice if there were a hint somewhere in Documentation/
> that would allow mere mortals to see the connection there.
I agree, that there is missing lot of documentation related to Marvell
PCIe controllers, both pci-mvebu.c and pci-aardvark.c. The main problem
now is that it is hard to find information which explain everything...
like mysterious Memory Controller (which is now explained):
https://lore.kernel.org/linux-pci/20211003120944.3lmwxylnhlp2kfj7@pali/
> 2) Why do we have these DT entries if they are invalid and useless?
Sorry, I do not know. I was not involved during introducing of DT files
for this platform. I'm just observing how it works...
> > MEM and I/O resources are parsed in pci-mvebu.c driver in
> > mvebu_pcie_parse_request_resources() function.
>
> So mvebu-mbus.c fills in the static mbus_state from the DT
> "pcie-mem-aperture", which seems unconnected to the DT descriptions of
> the PCI controllers:
>
> static struct mvebu_mbus_state mbus_state;
>
> mvebu_mbus_dt_init
> mvebu_mbus_get_pcie_resources(&mbus_state.pcie_mem_aperture)
> of_property_read_u32_array("pcie-mem-aperture")
>
> mvebu_pcie_probe
> mvebu_pcie_parse_request_resources
> mvebu_mbus_get_pcie_mem_aperture(&pcie->mem)
> *res = mbus_state.pcie_mem_aperture
> pci_add_resource(&bridge->windows, &pcie->mem)
>
> > Here is relevant dmesg output:
> >
> > mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
> > mvebu-pcie soc:pcie: MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
> > mvebu-pcie soc:pcie: MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
> > mvebu-pcie soc:pcie: MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
> > mvebu-pcie soc:pcie: MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
> > mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> > mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> > mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
> > mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
> > mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
> > mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
> > mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
> > mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
> > mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
> > pci_bus 0000:00: root bus resource [bus 00-ff]
> > pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
> > pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
> > pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
> > pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
> > pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
>
> I see 0xf1080000-0xf1081fff, 0xf1040000-0xf1041fff, etc mentioned in
> the DT info above, but I don't see where [mem 0xe0000000-0xe7ffffff]
> came from.
0xe0000000 is defined in above mentioned "pcie-mem-aperture" DT property
and it is in armada-38x.dtsi DTS file included from armada-385.dtsi:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/armada-38x.dtsi?h=v5.15#n42
> Regardless, this means PCI thinks [mem 0xe0000000-0xe7ffffff] is
> available on bus 00 and can be assigned to devices on bus 00 according
> to the normal PCI rules (BARs aligned on size, PCI bridge windows
> aligned on 1MB and multiple of 1MB in size). IIUC, mvebu imposes
> additional alignment constraints on the bridge windows.
>
> These are the bridge window assignments from your dmesg:
>
> > pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
> > pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
> > pci 0000:00:03.0: BAR 8: assigned [mem 0xe0100000-0xe01fffff]
>
> > pci 0000:00:01.0: PCI bridge to [bus 01]
> > pci 0000:00:01.0: bridge window [mem 0xe0000000-0xe00fffff]
> > pci 0000:00:02.0: PCI bridge to [bus 02]
> > pci 0000:00:02.0: bridge window [mem 0xe0200000-0xe04fffff]
> > pci 0000:00:03.0: PCI bridge to [bus 03]
> > pci 0000:00:03.0: bridge window [mem 0xe0100000-0xe01fffff]
>
> The PCI core knows nothing about the mvebu constraints. Are we just
> lucky here that when PCI assigned these bridge windows, they happen to
> be supported on mvebu? What happens if PCI decides it needs 29MB on
> bus 01?
In this case pci-mvebu.c split 29MB window into continuous ranges of
power of two (16MB + 8MB + 4MB + 1MB) and then register each range to
mbus slot. Code is in function mvebu_pcie_add_windows():
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pci-mvebu.c?h=v5.15#n300
So at the end there is continuous space of 29MB PCIe window, just it
"eats" 4 mbus slots.
This function may fail (if there is not enough free mbus slots) and this
patch is propagating that failure back to the caller.
> > > Are pcie1, pcie2, etc Root Ports? Or are they each separate host
> > > bridges (they each have "bus-range = <0x00 0xff>")?
> >
> > From kernel point of view they are root ports. But in reality every of
> > these root port is on separate bus segment, but kernel pci-mvebu.c
> > driver merges all these segments/domains into one host bridge and put
> > all root ports into bus 0.
> >
> > Here is lspci -tvnn output with topology:
> >
> > $ lspci -tvnn
> > -[0000:00]-+-01.0-[01]----00.0 Device [1e0f:0001]
> > +-02.0-[02]----00.0 Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
> > \-03.0-[03]----00.0 Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e]
>
> > Buses 1, 2 and 3 represents mPCIe cards, all of them are in reality
> > in separate bus segments and on different HW host bridges. So they
> > do *not* share access to config space, do *not* share INTx
> > interrupts, etc...
> >
> > > Is space from pciec dynamically assigned to pcie1, pcie2, etc? If
> > > so, I assume there are more restrictions on the size and alignment
> > > than on PCI bridge windows, which allow size/alignment down to
> > > 1MB?
> >
> > Yes, exactly. I do not know now all restrictions. At least there are
> > fixed number of memory slots and each has to be of size 2^N. They
> > are dynamically assigned by kernel mbus driver at time when somebody
> > updates BASE/LIMIT registers. And that kernel mbus driver takes care
> > to split non-aligned window size to more slots of size 2^N. And
> > resources are shared from pool with other HW parts (e.g. DMA), so
> > other drivers loaded in kernel can "eat" available slots before
> > pci-mvebu and then there does not have to be nothing to allocate for
> > PCI.
>
> So IIUC,
>
> pcie1 == 00:01.0 Root Port
> pcie2 == 00:02.0 Root Port
> pcie3 == 00:03.0 Root Port
>
> From a software point of view, they're all under a single host bridge,
> and Linux assumes everything under a host bridge plays by the PCI
> rules.
Yes.
> In this case, the root ports *don't* play by the rules since they have
> additional alignment restrictions, so I think these really should be
> described as separate host bridges in DT with the address space
> carved up statically among them.
I fully agree with you.
But pci-mvebu.c driver and also its DT bindings are written differently.
Changing it probably would not be simple due to backward compatibility
and will take development resources...
> It's common on x86 to have multiple host bridges that all appear to
> software to be in domain 0000. The bus number ranges under each are
> static, e.g., one bridge has [bus 00-7f] and another has [bus 80-ff].
For mvebu they are dynamic and kernel assigns them at boot. As my above
printed lspci topology is simple, first bridge has assigned [bus 01-01],
second bridge [bus 02-02] and third bridge [bus 03-03].
> > But most Armada boards do not have exported all peripherals from SoC,
> > unconnected are disabled in DT and therefore exhaustion should not
> > happen.
> >
> > > I'm trying to see how this could be described in ACPI because that's a
> > > fairly general model that accommodates most machines. Possibly
> > > describing mvebu in ACPI would involve losing some flexibility.
> >
> > I do not understand APCI model very well and I'm in impression that it
> > is impossible to represent mvebu in ACPI.
>
> It could be described as a separate host bridge for every root port.
> ACPI uses _CRS (current resource settings) to describe the apertures
> to PCI and any address translation. Currently the _CRS description is
> static, but ACPI does allow those resource assignments to be modified
> via _PRS (possible resource settings) and _SRS (set resource
> settings).
>
> Bjorn
On Thu, Jan 20, 2022 at 08:08:26PM +0100, Pali Roh?r wrote:
> On Thursday 20 January 2022 11:50:47 Bjorn Helgaas wrote:
> > On Thu, Jan 13, 2022 at 11:35:23AM +0100, Pali Roh?r wrote:
> > > On Wednesday 12 January 2022 18:19:21 Bjorn Helgaas wrote:
> > > > On Sat, Jan 08, 2022 at 12:46:58AM +0100, Pali Roh?r wrote:
> > > > > On Friday 07 January 2022 17:16:17 Bjorn Helgaas wrote:
> > > > > > On Fri, Jan 07, 2022 at 11:28:26PM +0100, Pali Roh?r wrote:
> > > > > > > On Friday 07 January 2022 15:55:04 Bjorn Helgaas wrote:
> > > > > > > > On Thu, Nov 25, 2021 at 01:45:58PM +0100, Pali Roh?r wrote:
> > > > > > > > > Properly propagate failure from
> > > > > > > > > mvebu_pcie_add_windows() function back to the caller
> > > > > > > > > mvebu_pci_bridge_emul_base_conf_write() and
> > > > > > > > > correctly updates PCI_IO_BASE, PCI_MEM_BASE and
> > > > > > > > > PCI_IO_BASE_UPPER16 registers on error. On error
> > > > > > > > > set base value higher than limit value which
> > > > > > > > > indicates that address range is disabled.
> > Regardless, this means PCI thinks [mem 0xe0000000-0xe7ffffff] is
> > available on bus 00 and can be assigned to devices on bus 00
> > according to the normal PCI rules (BARs aligned on size, PCI
> > bridge windows aligned on 1MB and multiple of 1MB in size). IIUC,
> > mvebu imposes additional alignment constraints on the bridge
> > windows.
> >
> > These are the bridge window assignments from your dmesg:
> >
> > > pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
> > > pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
> > > pci 0000:00:03.0: BAR 8: assigned [mem 0xe0100000-0xe01fffff]
> >
> > > pci 0000:00:01.0: PCI bridge to [bus 01]
> > > pci 0000:00:01.0: bridge window [mem 0xe0000000-0xe00fffff]
> > > pci 0000:00:02.0: PCI bridge to [bus 02]
> > > pci 0000:00:02.0: bridge window [mem 0xe0200000-0xe04fffff]
> > > pci 0000:00:03.0: PCI bridge to [bus 03]
> > > pci 0000:00:03.0: bridge window [mem 0xe0100000-0xe01fffff]
> >
> > The PCI core knows nothing about the mvebu constraints. Are we
> > just lucky here that when PCI assigned these bridge windows, they
> > happen to be supported on mvebu? What happens if PCI decides it
> > needs 29MB on bus 01?
>
> In this case pci-mvebu.c split 29MB window into continuous ranges of
> power of two (16MB + 8MB + 4MB + 1MB) and then register each range
> to mbus slot. Code is in function mvebu_pcie_add_windows():
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pci-mvebu.c?h=v5.15#n300
>
> So at the end there is continuous space of 29MB PCIe window, just it
> "eats" 4 mbus slots.
>
> This function may fail (if there is not enough free mbus slots) and
> this patch is propagating that failure back to the caller.
This failure cannot occur in conforming PCI hardware. I guess if you
want to propagate the error from mvebu_pcie_add_windows() back to
mvebu_pci_bridge_emul_base_conf_write() and do something there, I'm OK
with that.
But change the commit log so it doesn't say "... and correctly update
PCI_IO_BASE, PCI_MEM_BASE and PCI_IO_BASE_UPPER16" because this is
completely device-specific behavior and is not "correct" per any PCI
spec.
Instead, say something about how mvebu doesn't support arbitrary
windows and we're disabling the window completely if we can't provide
what's requested.
Maybe this error warrants a clue in dmesg? How would a user figure
out what's going on in this situation? From the patch, it looks like
we would assign resources to a device, but the device just would not
work because the root port window was silently disabled?
Bjorn