Hello,
This series is the continuation of previous work by Vidya Sagar [1] to fix the
issues related to accessing DBI register space before completing the core
initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
Since Vidya is busy, I took over the series based on his consent (off-list
discussion).
I've reworked the series in v7 to make it bisect friendly, and also to avoid
build issue with the dependency. This resulted in the patches being heavily
modified, so I took over the authorship of the patches.
Testing
=======
I've tested the series on Qcom SM8450 based dev board. I also expect it to work
on Tegra platforms as well but it'd be good if Vidya or someone could test it.
- Mani
[1] https://lore.kernel.org/linux-pci/[email protected]/
[2] https://lore.kernel.org/linux-pci/20230825123843.GD6005@thinkpad/
Changes in v7:
- Rebased on top of v6.7-rc1
- Kept the current dw_pcie_ep_init_complete() API instead of renaming it to
dw_pcie_ep_init_late(), since changing the name causes a slight ambiguity.
- Splitted the change that moves pci_epc_init_notify() inside
dw_pcie_ep_init_notify() to help bisecting and also to avoid build issue.
- Added a new patch that moves pci_epc_init_notify() inside
dw_pcie_ep_init_notify().
- Took over the authorship and dropped the previous Ack as the patches are
heavily modified.
Changes in v6:
- Rebased on top of pci/next (6e2fca71e187)
- removed ep_init_late() callback as it is no longer necessary
For previous changelog, please refer [1].
Manivannan Sadhasivam (2):
PCI: designware-ep: Fix DBI access before core init
PCI: designware-ep: Move pci_epc_init_notify() inside
dw_pcie_ep_init_complete()
.../pci/controller/dwc/pcie-designware-ep.c | 149 +++++++++++-------
drivers/pci/controller/dwc/pcie-designware.h | 5 -
drivers/pci/controller/dwc/pcie-qcom-ep.c | 2 -
drivers/pci/controller/dwc/pcie-tegra194.c | 2 -
4 files changed, 93 insertions(+), 65 deletions(-)
--
2.25.1
Since pci_epc_init_notify() API is getting called right after the
dw_pcie_ep_init_complete() API in the DWC glue drivers, let's move it
inside dw_pcie_ep_init_complete() API as there is no compelling reason to
call it separately.
Co-developed-by: Vidya Sagar <[email protected]>
Signed-off-by: Vidya Sagar <[email protected]>
Signed-off-by: Manivannan Sadhasivam <[email protected]>
---
drivers/pci/controller/dwc/pcie-designware-ep.c | 10 ++--------
drivers/pci/controller/dwc/pcie-designware.h | 5 -----
drivers/pci/controller/dwc/pcie-qcom-ep.c | 2 --
drivers/pci/controller/dwc/pcie-tegra194.c | 2 --
4 files changed, 2 insertions(+), 17 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index b1c79cd8e25f..63bb99d1c48f 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -22,14 +22,6 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
}
EXPORT_SYMBOL_GPL(dw_pcie_ep_linkup);
-void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep)
-{
- struct pci_epc *epc = ep->epc;
-
- pci_epc_init_notify(epc);
-}
-EXPORT_SYMBOL_GPL(dw_pcie_ep_init_notify);
-
struct dw_pcie_ep_func *
dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no)
{
@@ -784,6 +776,8 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
if (ret)
goto err_cleanup;
+ pci_epc_init_notify(epc);
+
return 0;
err_cleanup:
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index 55ff76e3d384..53bf38989eea 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -578,7 +578,6 @@ static inline void __iomem *dw_pcie_own_conf_map_bus(struct pci_bus *bus,
void dw_pcie_ep_linkup(struct dw_pcie_ep *ep);
int dw_pcie_ep_init(struct dw_pcie_ep *ep);
int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep);
-void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep);
void dw_pcie_ep_exit(struct dw_pcie_ep *ep);
int dw_pcie_ep_raise_legacy_irq(struct dw_pcie_ep *ep, u8 func_no);
int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
@@ -605,10 +604,6 @@ static inline int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
return 0;
}
-static inline void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep)
-{
-}
-
static inline void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
{
}
diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
index 9e58f055199a..4a8119779a29 100644
--- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
+++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
@@ -482,8 +482,6 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci)
val &= ~PARF_MSTR_AXI_CLK_EN;
writel_relaxed(val, pcie_ep->parf + PARF_MHI_CLOCK_RESET_CTRL);
- dw_pcie_ep_init_notify(&pcie_ep->pci.ep);
-
/* Enable LTSSM */
val = readl_relaxed(pcie_ep->parf + PARF_LTSSM);
val |= BIT(8);
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
index 0fe113598ebb..1126d1f5830c 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -1901,8 +1901,6 @@ static void pex_ep_event_pex_rst_deassert(struct tegra_pcie_dw *pcie)
goto fail_init_complete;
}
- dw_pcie_ep_init_notify(ep);
-
/* Program the private control to allow sending LTR upstream */
if (pcie->of_data->has_ltr_req_fix) {
val = appl_readl(pcie, APPL_LTR_MSG_2);
--
2.25.1
The drivers for platforms requiring reference clock from the PCIe host for
initializing their PCIe EP core, make use of the 'core_init_notifier'
feature exposed by the DWC common code. On these platforms, access to the
hw registers like DBI before completing the core initialization will result
in a whole system hang. But the current DWC EP driver tries to access DBI
registers during dw_pcie_ep_init() without waiting for core initialization
and it results in system hang on platforms making use of
'core_init_notifier' such as Tegra194 and Qcom SM8450.
To workaround this issue, users of the above mentioned platforms have to
maintain the dependency with the PCIe host by booting the PCIe EP after
host boot. But this won't provide a good user experience, since PCIe EP is
_one_ of the features of those platforms and it doesn't make sense to
delay the whole platform booting due to the PCIe dependency.
So to fix this issue, let's move all the DBI access during
dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
API that gets called only after core initialization on these platforms.
This makes sure that the DBI register accesses are skipped during
dw_pcie_ep_init() and accessed later once the core initialization happens.
For the rest of the platforms, DBI access happens as usual.
Co-developed-by: Vidya Sagar <[email protected]>
Signed-off-by: Vidya Sagar <[email protected]>
Signed-off-by: Manivannan Sadhasivam <[email protected]>
---
.../pci/controller/dwc/pcie-designware-ep.c | 139 ++++++++++++------
1 file changed, 91 insertions(+), 48 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index f6207989fc6a..b1c79cd8e25f 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -662,14 +662,19 @@ static unsigned int dw_pcie_ep_find_ext_capability(struct dw_pcie *pci, int cap)
return 0;
}
-int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
+static int dw_pcie_ep_late_init(struct dw_pcie_ep *ep)
{
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+ struct dw_pcie_ep_func *ep_func;
+ struct device *dev = pci->dev;
+ struct pci_epc *epc = ep->epc;
unsigned int offset, ptm_cap_base;
unsigned int nbars;
u8 hdr_type;
+ u8 func_no;
+ int i, ret;
+ void *addr;
u32 reg;
- int i;
hdr_type = dw_pcie_readb_dbi(pci, PCI_HEADER_TYPE) &
PCI_HEADER_TYPE_MASK;
@@ -680,6 +685,55 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
return -EIO;
}
+ dw_pcie_version_detect(pci);
+
+ dw_pcie_iatu_detect(pci);
+
+ ret = dw_pcie_edma_detect(pci);
+ if (ret)
+ return ret;
+
+ if (!ep->ib_window_map) {
+ ep->ib_window_map = devm_bitmap_zalloc(dev, pci->num_ib_windows,
+ GFP_KERNEL);
+ if (!ep->ib_window_map)
+ goto err_remove_edma;
+ }
+
+ if (!ep->ob_window_map) {
+ ep->ob_window_map = devm_bitmap_zalloc(dev, pci->num_ob_windows,
+ GFP_KERNEL);
+ if (!ep->ob_window_map)
+ goto err_remove_edma;
+ }
+
+ if (!ep->outbound_addr) {
+ addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t),
+ GFP_KERNEL);
+ if (!addr)
+ goto err_remove_edma;
+ ep->outbound_addr = addr;
+ }
+
+ for (func_no = 0; func_no < epc->max_functions; func_no++) {
+
+ ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
+ if (ep_func)
+ continue;
+
+ ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
+ if (!ep_func)
+ goto err_remove_edma;
+
+ ep_func->func_no = func_no;
+ ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
+ PCI_CAP_ID_MSI);
+ ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
+ PCI_CAP_ID_MSIX);
+
+ list_add_tail(&ep_func->list, &ep->func_list);
+ }
+
offset = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_REBAR);
ptm_cap_base = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_PTM);
@@ -714,14 +768,38 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
dw_pcie_dbi_ro_wr_dis(pci);
return 0;
+
+err_remove_edma:
+ dw_pcie_edma_remove(pci);
+
+ return ret;
+}
+
+int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
+{
+ struct pci_epc *epc = ep->epc;
+ int ret;
+
+ ret = dw_pcie_ep_late_init(ep);
+ if (ret)
+ goto err_cleanup;
+
+ return 0;
+
+err_cleanup:
+ pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
+ epc->mem->window.page_size);
+ pci_epc_mem_exit(epc);
+ if (ep->ops->deinit)
+ ep->ops->deinit(ep);
+
+ return ret;
}
EXPORT_SYMBOL_GPL(dw_pcie_ep_init_complete);
int dw_pcie_ep_init(struct dw_pcie_ep *ep)
{
int ret;
- void *addr;
- u8 func_no;
struct resource *res;
struct pci_epc *epc;
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
@@ -729,7 +807,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
struct platform_device *pdev = to_platform_device(dev);
struct device_node *np = dev->of_node;
const struct pci_epc_features *epc_features;
- struct dw_pcie_ep_func *ep_func;
INIT_LIST_HEAD(&ep->func_list);
@@ -747,26 +824,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
if (ep->ops->pre_init)
ep->ops->pre_init(ep);
- dw_pcie_version_detect(pci);
-
- dw_pcie_iatu_detect(pci);
-
- ep->ib_window_map = devm_bitmap_zalloc(dev, pci->num_ib_windows,
- GFP_KERNEL);
- if (!ep->ib_window_map)
- return -ENOMEM;
-
- ep->ob_window_map = devm_bitmap_zalloc(dev, pci->num_ob_windows,
- GFP_KERNEL);
- if (!ep->ob_window_map)
- return -ENOMEM;
-
- addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t),
- GFP_KERNEL);
- if (!addr)
- return -ENOMEM;
- ep->outbound_addr = addr;
-
epc = devm_pci_epc_create(dev, &epc_ops);
if (IS_ERR(epc)) {
dev_err(dev, "Failed to create epc device\n");
@@ -780,20 +837,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
if (ret < 0)
epc->max_functions = 1;
- for (func_no = 0; func_no < epc->max_functions; func_no++) {
- ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
- if (!ep_func)
- return -ENOMEM;
-
- ep_func->func_no = func_no;
- ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
- PCI_CAP_ID_MSI);
- ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
- PCI_CAP_ID_MSIX);
-
- list_add_tail(&ep_func->list, &ep->func_list);
- }
-
if (ep->ops->ep_init)
ep->ops->ep_init(ep);
@@ -812,25 +855,25 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
goto err_exit_epc_mem;
}
- ret = dw_pcie_edma_detect(pci);
- if (ret)
- goto err_free_epc_mem;
-
if (ep->ops->get_features) {
epc_features = ep->ops->get_features(ep);
if (epc_features->core_init_notifier)
return 0;
}
- ret = dw_pcie_ep_init_complete(ep);
+ /*
+ * NOTE:- Avoid accessing the hardware (Ex:- DBI space) before this
+ * step as platforms that implement 'core_init_notifier' feature may
+ * not have the hardware ready (i.e. core initialized) for access
+ * (Ex: tegra194). Any hardware access on such platforms result
+ * in system hang.
+ */
+ ret = dw_pcie_ep_late_init(ep);
if (ret)
- goto err_remove_edma;
+ goto err_free_epc_mem;
return 0;
-err_remove_edma:
- dw_pcie_edma_remove(pci);
-
err_free_epc_mem:
pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
epc->mem->window.page_size);
--
2.25.1
Hello,
> This series is the continuation of previous work by Vidya Sagar [1] to fix the
> issues related to accessing DBI register space before completing the core
> initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
Applied to controller/dwc-ep, thank you!
[01/02] PCI: designware-ep: Fix DBI access before core init
https://git.kernel.org/pci/pci/c/d3d13b00a2cf
[02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
https://git.kernel.org/pci/pci/c/a171e1d60dad
Krzysztof
Hello,
> > This series is the continuation of previous work by Vidya Sagar [1] to fix the
> > issues related to accessing DBI register space before completing the core
> > initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
>
> Applied to controller/dwc-ep, thank you!
>
> [01/02] PCI: designware-ep: Fix DBI access before core init
> https://git.kernel.org/pci/pci/c/d3d13b00a2cf
> [02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
> https://git.kernel.org/pci/pci/c/a171e1d60dad
Manivannan, I applied this series to "controller/dwc-ep" rather than
"controller/dwc" so that I can have a look at the conflicts we have
between this series and the current "controller/dwc" branch.
There have been changes from both Shimoda-san and Frank Li that both
changed some things and moved some things around within the code base,
so this series no longer cleanly applies.
However, I wanted the CI system to pick the branch for testing, as the
sooner, the better.
Hopefully, we can resolve the conflicts before Bjorn sends his Pull
Request with 6.8 changes.
Krzysztof
On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> Hello,
Hello Krzysztof, Manivannan,
>
> > This series is the continuation of previous work by Vidya Sagar [1] to fix the
> > issues related to accessing DBI register space before completing the core
> > initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
>
> Applied to controller/dwc-ep, thank you!
>
> [01/02] PCI: designware-ep: Fix DBI access before core init
> https://git.kernel.org/pci/pci/c/d3d13b00a2cf
> [02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
> https://git.kernel.org/pci/pci/c/a171e1d60dad
>
> Krzysztof
Considering that we know that this series introduces new problems
for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
Do we really want to apply this series as is?
Reading the patch, it appears that (at least some) tegra and
qcom boards currently causes a whole system hang, which IIUC,
renders those boards unusable.
So perhaps the new issues introduced by this series is preferable
to a whole system hang.
As such, I can understand the urgency to merge this series.
However, at the very least, I think that it might be worth amending
the commit message to mention that this will currently not deregister
the dma device in a clean way, leading to e.g. superfluous entries in
/sys/class/dma/ and debugfs warnings being printed to dmesg.
Kind regards,
Niklas
It doesn't look to me like the issues raised by Niklas have really
been resolved:
https://lore.kernel.org/r/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
https://lore.kernel.org/r/ZZ2JXMhdOI1Upabx@x1-carbon
so I'm doubtful that we should apply this as-is. The spurious
/sys/class/dma/ stuff and debugfs warnings sound like things that will
annoy users.
Apart from that, this patch has been floating around a long time, but
I still think this solution is hard to maintain for the same reasons I
mentioned here:
https://lore.kernel.org/linux-pci/20220919224014.GA1030798@bhelgaas/
On Mon, Nov 20, 2023 at 02:10:13PM +0530, Manivannan Sadhasivam wrote:
> The drivers for platforms requiring reference clock from the PCIe host for
> initializing their PCIe EP core, make use of the 'core_init_notifier'
> feature exposed by the DWC common code. On these platforms, access to the
> hw registers like DBI before completing the core initialization will result
> in a whole system hang. But the current DWC EP driver tries to access DBI
> registers during dw_pcie_ep_init() without waiting for core initialization
> and it results in system hang on platforms making use of
> 'core_init_notifier' such as Tegra194 and Qcom SM8450.
I see that only qcom_pcie_epc_features and tegra_pcie_epc_features
*set* "core_init_notifier", but all platforms use it because it's only
tested in dw_pcie_ep_init() (and a test case), which is generic to all
DWC drivers.
"core_init_notifier" is not a notifier. From reading the code, it
only means "if this is set, skip the rest of dw_pcie_ep_init()".
Based on the code, I assume it implies that drivers that set
core_init_notifier must do some additional initialization or
something, but that initialization isn't connected here.
There should be some symbol, maybe a member of pci_epc_features, that
both *does* this initialization and *tells us* that the driver needs
this initialization.
Right now, I think it's something like:
1) this driver sets core_init_notifier
2) that must mean that it also calls dw_pcie_ep_init_notify() somewhere
3) we must avoid DBI access until it does
There's nothing that directly connects those three things.
> To workaround this issue, users of the above mentioned platforms have to
> maintain the dependency with the PCIe host by booting the PCIe EP after
> host boot. But this won't provide a good user experience, since PCIe EP is
> _one_ of the features of those platforms and it doesn't make sense to
> delay the whole platform booting due to the PCIe dependency.
IIUC, "have to maintain the dependency" refers to the situation
*before* this patch, right? This patch improves the user experience
by removing the need for users to enforce this "boot host before EP"
ordering?
> So to fix this issue, let's move all the DBI access during
> dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
> API that gets called only after core initialization on these platforms.
> This makes sure that the DBI register accesses are skipped during
> dw_pcie_ep_init() and accessed later once the core initialization happens.
This patch doesn't "skip" them in dw_pcie_ep_init(); it *moves* them
completely to dw_pcie_ep_late_init() and calls that from the end of
dw_pcie_ep_init().
> For the rest of the platforms, DBI access happens as usual.
I don't really understand what "as usual" means here. I guess it just
means "if the driver doesn't set 'core_init_notifier', nothing
changes"? I would at least make it specific to make it clear that
"rest of the platforms" means "those that don't set
core_init_notifier".
> Co-developed-by: Vidya Sagar <[email protected]>
> Signed-off-by: Vidya Sagar <[email protected]>
> Signed-off-by: Manivannan Sadhasivam <[email protected]>
> ---
> .../pci/controller/dwc/pcie-designware-ep.c | 139 ++++++++++++------
> 1 file changed, 91 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index f6207989fc6a..b1c79cd8e25f 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -662,14 +662,19 @@ static unsigned int dw_pcie_ep_find_ext_capability(struct dw_pcie *pci, int cap)
> return 0;
> }
>
> -int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
> +static int dw_pcie_ep_late_init(struct dw_pcie_ep *ep)
> {
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct dw_pcie_ep_func *ep_func;
> + struct device *dev = pci->dev;
> + struct pci_epc *epc = ep->epc;
> unsigned int offset, ptm_cap_base;
> unsigned int nbars;
> u8 hdr_type;
> + u8 func_no;
> + int i, ret;
> + void *addr;
> u32 reg;
> - int i;
>
> hdr_type = dw_pcie_readb_dbi(pci, PCI_HEADER_TYPE) &
> PCI_HEADER_TYPE_MASK;
> @@ -680,6 +685,55 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
> return -EIO;
> }
>
> + dw_pcie_version_detect(pci);
> +
> + dw_pcie_iatu_detect(pci);
> +
> + ret = dw_pcie_edma_detect(pci);
> + if (ret)
> + return ret;
> +
> + if (!ep->ib_window_map) {
> + ep->ib_window_map = devm_bitmap_zalloc(dev, pci->num_ib_windows,
> + GFP_KERNEL);
> + if (!ep->ib_window_map)
> + goto err_remove_edma;
> + }
> +
> + if (!ep->ob_window_map) {
> + ep->ob_window_map = devm_bitmap_zalloc(dev, pci->num_ob_windows,
> + GFP_KERNEL);
> + if (!ep->ob_window_map)
> + goto err_remove_edma;
> + }
> +
> + if (!ep->outbound_addr) {
> + addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t),
> + GFP_KERNEL);
> + if (!addr)
> + goto err_remove_edma;
> + ep->outbound_addr = addr;
> + }
> +
> + for (func_no = 0; func_no < epc->max_functions; func_no++) {
> +
> + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> + if (ep_func)
> + continue;
> +
> + ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
> + if (!ep_func)
> + goto err_remove_edma;
> +
> + ep_func->func_no = func_no;
> + ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
> + PCI_CAP_ID_MSI);
> + ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
> + PCI_CAP_ID_MSIX);
> +
> + list_add_tail(&ep_func->list, &ep->func_list);
> + }
> +
> offset = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_REBAR);
> ptm_cap_base = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_PTM);
>
> @@ -714,14 +768,38 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
> dw_pcie_dbi_ro_wr_dis(pci);
>
> return 0;
> +
> +err_remove_edma:
> + dw_pcie_edma_remove(pci);
> +
> + return ret;
> +}
> +
> +int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
> +{
> + struct pci_epc *epc = ep->epc;
> + int ret;
> +
> + ret = dw_pcie_ep_late_init(ep);
> + if (ret)
> + goto err_cleanup;
> +
> + return 0;
> +
> +err_cleanup:
> + pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
> + epc->mem->window.page_size);
> + pci_epc_mem_exit(epc);
> + if (ep->ops->deinit)
> + ep->ops->deinit(ep);
> +
> + return ret;
> }
> EXPORT_SYMBOL_GPL(dw_pcie_ep_init_complete);
>
> int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> {
> int ret;
> - void *addr;
> - u8 func_no;
> struct resource *res;
> struct pci_epc *epc;
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> @@ -729,7 +807,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> struct platform_device *pdev = to_platform_device(dev);
> struct device_node *np = dev->of_node;
> const struct pci_epc_features *epc_features;
> - struct dw_pcie_ep_func *ep_func;
>
> INIT_LIST_HEAD(&ep->func_list);
>
> @@ -747,26 +824,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> if (ep->ops->pre_init)
> ep->ops->pre_init(ep);
>
> - dw_pcie_version_detect(pci);
> -
> - dw_pcie_iatu_detect(pci);
> -
> - ep->ib_window_map = devm_bitmap_zalloc(dev, pci->num_ib_windows,
> - GFP_KERNEL);
> - if (!ep->ib_window_map)
> - return -ENOMEM;
> -
> - ep->ob_window_map = devm_bitmap_zalloc(dev, pci->num_ob_windows,
> - GFP_KERNEL);
> - if (!ep->ob_window_map)
> - return -ENOMEM;
> -
> - addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t),
> - GFP_KERNEL);
> - if (!addr)
> - return -ENOMEM;
> - ep->outbound_addr = addr;
> -
> epc = devm_pci_epc_create(dev, &epc_ops);
> if (IS_ERR(epc)) {
> dev_err(dev, "Failed to create epc device\n");
> @@ -780,20 +837,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> if (ret < 0)
> epc->max_functions = 1;
>
> - for (func_no = 0; func_no < epc->max_functions; func_no++) {
> - ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
> - if (!ep_func)
> - return -ENOMEM;
> -
> - ep_func->func_no = func_no;
> - ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
> - PCI_CAP_ID_MSI);
> - ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
> - PCI_CAP_ID_MSIX);
> -
> - list_add_tail(&ep_func->list, &ep->func_list);
> - }
> -
> if (ep->ops->ep_init)
> ep->ops->ep_init(ep);
>
> @@ -812,25 +855,25 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> goto err_exit_epc_mem;
> }
>
> - ret = dw_pcie_edma_detect(pci);
> - if (ret)
> - goto err_free_epc_mem;
> -
> if (ep->ops->get_features) {
> epc_features = ep->ops->get_features(ep);
> if (epc_features->core_init_notifier)
> return 0;
> }
>
> - ret = dw_pcie_ep_init_complete(ep);
> + /*
> + * NOTE:- Avoid accessing the hardware (Ex:- DBI space) before this
> + * step as platforms that implement 'core_init_notifier' feature may
> + * not have the hardware ready (i.e. core initialized) for access
> + * (Ex: tegra194). Any hardware access on such platforms result
> + * in system hang.
What specifically does "before this step" refer to? I think the
intent is that it's something to do with "core_init_notifier", but
there's no *direct* connection because there's no test of
core_init_notifier except here and the test case.
> + ret = dw_pcie_ep_late_init(ep);
> if (ret)
> - goto err_remove_edma;
> + goto err_free_epc_mem;
>
> return 0;
>
> -err_remove_edma:
> - dw_pcie_edma_remove(pci);
> -
> err_free_epc_mem:
> pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
> epc->mem->window.page_size);
> --
> 2.25.1
>
On Tue, Jan 09, 2024 at 05:58:53PM +0000, Niklas Cassel wrote:
> On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> > Hello,
>
> Hello Krzysztof, Manivannan,
>
> >
> > > This series is the continuation of previous work by Vidya Sagar [1] to fix the
> > > issues related to accessing DBI register space before completing the core
> > > initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
> >
> > Applied to controller/dwc-ep, thank you!
> >
> > [01/02] PCI: designware-ep: Fix DBI access before core init
> > https://git.kernel.org/pci/pci/c/d3d13b00a2cf
> > [02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
> > https://git.kernel.org/pci/pci/c/a171e1d60dad
> >
> > Krzysztof
>
> Considering that we know that this series introduces new problems
> for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
> https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
>
> Do we really want to apply this series as is?
>
>
Niklas, I think I explained it in this thread itself. Let me reiterate here
again.
The fact that you are seeing the dmaengine warnings is due to function drivers
not releasing the channels properly. It is not the job of the DWC driver to
release the channels. The channels are requested by the function drivers [1]
and they _should_ release them when the channels are no longer required.
I know that the PCI_EPF_TEST driver is not doing so and so you are seeing the
warnings. But I do not have a device to test that function driver. Qcom
platforms use a dedicated function driver and that releases the channels when it
gets the LINK_DOWN event from EPC [2].
So my conclusion is that the issue is there even without this series. If you
still want me to fix the EPF_TEST driver, I can submit a change, but someone has
to test it.
- Mani
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/endpoint/functions/pci-epf-test.c#n229
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/endpoint/functions/pci-epf-mhi.c#n563
> Reading the patch, it appears that (at least some) tegra and
> qcom boards currently causes a whole system hang, which IIUC,
> renders those boards unusable.
>
> So perhaps the new issues introduced by this series is preferable
> to a whole system hang.
>
> As such, I can understand the urgency to merge this series.
> However, at the very least, I think that it might be worth amending
> the commit message to mention that this will currently not deregister
> the dma device in a clean way, leading to e.g. superfluous entries in
> /sys/class/dma/ and debugfs warnings being printed to dmesg.
>
>
> Kind regards,
> Niklas
--
மணிவண்ணன் சதாசிவம்
Hello Mani,
On Wed, Jan 10, 2024 at 08:41:37AM +0530, Manivannan Sadhasivam wrote:
> On Tue, Jan 09, 2024 at 05:58:53PM +0000, Niklas Cassel wrote:
> > On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> >
> > Considering that we know that this series introduces new problems
> > for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
> > https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
> >
> > Do we really want to apply this series as is?
> >
> >
>
> Niklas, I think I explained it in this thread itself. Let me reiterate here
> again.
>
> The fact that you are seeing the dmaengine warnings is due to function drivers
> not releasing the channels properly. It is not the job of the DWC driver to
> release the channels. The channels are requested by the function drivers [1]
> and they _should_ release them when the channels are no longer required.
Sure, the function driver should release the channels.
>
> I know that the PCI_EPF_TEST driver is not doing so and so you are seeing the
> warnings. But I do not have a device to test that function driver. Qcom
> platforms use a dedicated function driver and that releases the channels when it
> gets the LINK_DOWN event from EPC [2].
>
> So my conclusion is that the issue is there even without this series. If you
> still want me to fix the EPF_TEST driver, I can submit a change, but someone has
> to test it.
That conclusion is not fully correct.
Let's take e.g. these error messages that this series introduces:
[ 1000.714355] debugfs: File 'mf' in directory '/' already present!
[ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
[ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
[ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!
These come from dw_edma_core_debugfs_on(), which is called by dw_edma_probe().
This is a direct result from your patch:
https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=controller/dwc-ep&id=9ab5c8bb7a305135b1b6c65cb8db92b4acbef79d
Which moves dw_pcie_edma_detect() from dw_pcie_ep_init_complete() to
dw_pcie_ep_late_init() (since dw_pcie_edma_detect() calls dw_edma_probe()).
So without your patch, those debugfs error messages are not seen.
Thus, I do not think that it is sufficient to only modify the pci-epf-test
driver to release the dma channels, as I don't see how that will avoid e.g.
the debugfs error messages introduced by this patch.
Kind regards,
Niklas
On Wed, Jan 10, 2024 at 10:01:08AM +0000, Niklas Cassel wrote:
> Hello Mani,
>
> On Wed, Jan 10, 2024 at 08:41:37AM +0530, Manivannan Sadhasivam wrote:
> > On Tue, Jan 09, 2024 at 05:58:53PM +0000, Niklas Cassel wrote:
> > > On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> > >
> > > Considering that we know that this series introduces new problems
> > > for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
> > > https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
> > >
> > > Do we really want to apply this series as is?
> > >
> > >
> >
> > Niklas, I think I explained it in this thread itself. Let me reiterate here
> > again.
> >
> > The fact that you are seeing the dmaengine warnings is due to function drivers
> > not releasing the channels properly. It is not the job of the DWC driver to
> > release the channels. The channels are requested by the function drivers [1]
> > and they _should_ release them when the channels are no longer required.
>
> Sure, the function driver should release the channels.
>
>
> >
> > I know that the PCI_EPF_TEST driver is not doing so and so you are seeing the
> > warnings. But I do not have a device to test that function driver. Qcom
> > platforms use a dedicated function driver and that releases the channels when it
> > gets the LINK_DOWN event from EPC [2].
> >
> > So my conclusion is that the issue is there even without this series. If you
> > still want me to fix the EPF_TEST driver, I can submit a change, but someone has
> > to test it.
>
> That conclusion is not fully correct.
>
> Let's take e.g. these error messages that this series introduces:
> [ 1000.714355] debugfs: File 'mf' in directory '/' already present!
> [ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
> [ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
> [ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!
>
> These come from dw_edma_core_debugfs_on(), which is called by dw_edma_probe().
>
> This is a direct result from your patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=controller/dwc-ep&id=9ab5c8bb7a305135b1b6c65cb8db92b4acbef79d
>
> Which moves dw_pcie_edma_detect() from dw_pcie_ep_init_complete() to
> dw_pcie_ep_late_init() (since dw_pcie_edma_detect() calls dw_edma_probe()).
>
> So without your patch, those debugfs error messages are not seen.
>
> Thus, I do not think that it is sufficient to only modify the pci-epf-test
> driver to release the dma channels, as I don't see how that will avoid e.g.
> the debugfs error messages introduced by this patch.
>
Ah, sorry I overlooked the warnings from the edma core. I think adding
dw_pcie_edma_remove() to the perst_assert() function would fix this issue. But
I'm traveling this week, so couldn't verify it on the device.
Bjorn, Krzysztof feel free to drop this series for 6.8. I will modify this
series to address some other issues discussed so far and resubmit it for 6.9.
- Mani
>
> Kind regards,
> Niklas
--
மணிவண்ணன் சதாசிவம்