The current interconnect provider registration interface is inherently
racy as nodes are not added until the after adding the provider. This
can specifically cause racing DT lookups to fail.
Switch to using the new API where the provider is not registered until
after it has been fully initialised.
Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
Cc: [email protected] # 5.8
Cc: Leonard Crestez <[email protected]>
Cc: Alexandre Bailon <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
---
drivers/interconnect/imx/imx.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/interconnect/imx/imx.c b/drivers/interconnect/imx/imx.c
index 823d9be9771a..979ed610f704 100644
--- a/drivers/interconnect/imx/imx.c
+++ b/drivers/interconnect/imx/imx.c
@@ -295,6 +295,9 @@ int imx_icc_register(struct platform_device *pdev,
provider->xlate = of_icc_xlate_onecell;
provider->data = data;
provider->dev = dev->parent;
+
+ icc_provider_init(provider);
+
platform_set_drvdata(pdev, imx_provider);
if (settings) {
@@ -306,20 +309,18 @@ int imx_icc_register(struct platform_device *pdev,
}
}
- ret = icc_provider_add(provider);
- if (ret) {
- dev_err(dev, "error adding interconnect provider: %d\n", ret);
+ ret = imx_icc_register_nodes(imx_provider, nodes, nodes_count, settings);
+ if (ret)
return ret;
- }
- ret = imx_icc_register_nodes(imx_provider, nodes, nodes_count, settings);
+ ret = icc_provider_register(provider);
if (ret)
- goto provider_del;
+ goto err_unregister_nodes;
return 0;
-provider_del:
- icc_provider_del(provider);
+err_unregister_nodes:
+ imx_icc_unregister_nodes(&imx_provider->provider);
return ret;
}
EXPORT_SYMBOL_GPL(imx_icc_register);
@@ -328,9 +329,8 @@ void imx_icc_unregister(struct platform_device *pdev)
{
struct imx_icc_provider *imx_provider = platform_get_drvdata(pdev);
+ icc_provider_deregister(&imx_provider->provider);
imx_icc_unregister_nodes(&imx_provider->provider);
-
- icc_provider_del(&imx_provider->provider);
}
EXPORT_SYMBOL_GPL(imx_icc_unregister);
--
2.39.1
On 1.02.2023 11:15, Johan Hovold wrote:
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until the after adding the provider. This
> can specifically cause racing DT lookups to fail.
>
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
>
> Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
> Cc: [email protected] # 5.8
> Cc: Leonard Crestez <[email protected]>
> Cc: Alexandre Bailon <[email protected]>
> Signed-off-by: Johan Hovold <[email protected]>
> ---
Reviewed-by: Konrad Dybcio <[email protected]>
Konrad
> drivers/interconnect/imx/imx.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/interconnect/imx/imx.c b/drivers/interconnect/imx/imx.c
> index 823d9be9771a..979ed610f704 100644
> --- a/drivers/interconnect/imx/imx.c
> +++ b/drivers/interconnect/imx/imx.c
> @@ -295,6 +295,9 @@ int imx_icc_register(struct platform_device *pdev,
> provider->xlate = of_icc_xlate_onecell;
> provider->data = data;
> provider->dev = dev->parent;
> +
> + icc_provider_init(provider);
> +
> platform_set_drvdata(pdev, imx_provider);
>
> if (settings) {
> @@ -306,20 +309,18 @@ int imx_icc_register(struct platform_device *pdev,
> }
> }
>
> - ret = icc_provider_add(provider);
> - if (ret) {
> - dev_err(dev, "error adding interconnect provider: %d\n", ret);
> + ret = imx_icc_register_nodes(imx_provider, nodes, nodes_count, settings);
> + if (ret)
> return ret;
> - }
>
> - ret = imx_icc_register_nodes(imx_provider, nodes, nodes_count, settings);
> + ret = icc_provider_register(provider);
> if (ret)
> - goto provider_del;
> + goto err_unregister_nodes;
>
> return 0;
>
> -provider_del:
> - icc_provider_del(provider);
> +err_unregister_nodes:
> + imx_icc_unregister_nodes(&imx_provider->provider);
> return ret;
> }
> EXPORT_SYMBOL_GPL(imx_icc_register);
> @@ -328,9 +329,8 @@ void imx_icc_unregister(struct platform_device *pdev)
> {
> struct imx_icc_provider *imx_provider = platform_get_drvdata(pdev);
>
> + icc_provider_deregister(&imx_provider->provider);
> imx_icc_unregister_nodes(&imx_provider->provider);
> -
> - icc_provider_del(&imx_provider->provider);
> }
> EXPORT_SYMBOL_GPL(imx_icc_unregister);
>
Hello Johan,
On Wed, 1 Feb 2023 11:15:40 +0100
Johan Hovold <[email protected]> wrote:
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until the after adding the provider. This
> can specifically cause racing DT lookups to fail.
>
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
>
> Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
> Cc: [email protected] # 5.8
> Cc: Leonard Crestez <[email protected]>
> Cc: Alexandre Bailon <[email protected]>
> Signed-off-by: Johan Hovold <[email protected]>
Georgi pointed me to this series after I reported a bug yesterday [0],
that I found on iMX8MP. So I ran some tests with my original, failing
tree, minus one patch with my debugging code to hunt for the bug, plus
patches 1-4 of this series.
The original code was failing approx 5~10% of the times. With your 4
patches applied it ran 139 times with zero errors, which looks great! I
won't be able to do more testing until next Monday to be extra sure.
[0]
https://lore.kernel.org/linux-arm-kernel/20230202175525.3dba79a7@booty/T/#u
--
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
On Fri, Feb 03, 2023 at 05:01:21PM +0100, Luca Ceresoli wrote:
> Hello Johan,
>
> On Wed, 1 Feb 2023 11:15:40 +0100
> Johan Hovold <[email protected]> wrote:
>
> > The current interconnect provider registration interface is inherently
> > racy as nodes are not added until the after adding the provider. This
> > can specifically cause racing DT lookups to fail.
> >
> > Switch to using the new API where the provider is not registered until
> > after it has been fully initialised.
> >
> > Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
> > Cc: [email protected] # 5.8
> > Cc: Leonard Crestez <[email protected]>
> > Cc: Alexandre Bailon <[email protected]>
> > Signed-off-by: Johan Hovold <[email protected]>
>
> Georgi pointed me to this series after I reported a bug yesterday [0],
> that I found on iMX8MP. So I ran some tests with my original, failing
> tree, minus one patch with my debugging code to hunt for the bug, plus
> patches 1-4 of this series.
>
> The original code was failing approx 5~10% of the times. With your 4
> patches applied it ran 139 times with zero errors, which looks great! I
> won't be able to do more testing until next Monday to be extra sure.
Thanks for testing.
It indeed looks like you're hitting the same race, and as the imx
interconnect driver also initialises the provider data num_nodes count
before adding the nodes it results in that NULL-deref (where the qcom
driver failed a bit more gracefully).
Johan
> [0]
> https://lore.kernel.org/linux-arm-kernel/20230202175525.3dba79a7@booty/T/#u
Hello Johan,
On Mon, 6 Feb 2023 09:09:50 +0100
Johan Hovold <[email protected]> wrote:
> On Fri, Feb 03, 2023 at 05:01:21PM +0100, Luca Ceresoli wrote:
> > Hello Johan,
> >
> > On Wed, 1 Feb 2023 11:15:40 +0100
> > Johan Hovold <[email protected]> wrote:
> >
> > > The current interconnect provider registration interface is inherently
> > > racy as nodes are not added until the after adding the provider. This
> > > can specifically cause racing DT lookups to fail.
> > >
> > > Switch to using the new API where the provider is not registered until
> > > after it has been fully initialised.
> > >
> > > Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
> > > Cc: [email protected] # 5.8
> > > Cc: Leonard Crestez <[email protected]>
> > > Cc: Alexandre Bailon <[email protected]>
> > > Signed-off-by: Johan Hovold <[email protected]>
> >
> > Georgi pointed me to this series after I reported a bug yesterday [0],
> > that I found on iMX8MP. So I ran some tests with my original, failing
> > tree, minus one patch with my debugging code to hunt for the bug, plus
> > patches 1-4 of this series.
> >
> > The original code was failing approx 5~10% of the times. With your 4
> > patches applied it ran 139 times with zero errors, which looks great! I
> > won't be able to do more testing until next Monday to be extra sure.
>
> Thanks for testing.
>
> It indeed looks like you're hitting the same race, and as the imx
> interconnect driver also initialises the provider data num_nodes count
> before adding the nodes it results in that NULL-deref (where the qcom
> driver failed a bit more gracefully).
My v6.2-rc5 tree with patches 1 to 4 added has booted 590 times with 0
errors, which add to the 139 times on Friday. This definitely deserves
my:
Tested-by: Luca Ceresoli <[email protected]>
--
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com