2020-04-16 00:13:33

by Nicolas Saenz Julienne

Subject: [PATCH 4/4] of: property: Avoid linking devices with circular dependencies

When creating a consumer/supplier relationship between devices it's
essential to make sure they aren't supplying each other creating a
circular dependency.

Introduce a new function to check if such circular dependency exists
between two device nodes and use it in of_link_to_phandle().

Fixes: a3e1d1a7f5fc ("of: property: Add functional dependency link from DT bindings")
Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---

NOTE:
I feel of_link_is_circular() is a little dense, and could benefit from
some abstraction/refactoring. That said, I'd rather get some feedback,
before spending time on it.

drivers/of/property.c | 50 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 2c7978ef22be1..74a5190408c3b 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1171,6 +1171,44 @@ static const struct supplier_bindings of_supplier_bindings[] = {
 	{}
 };

+/**
+ * of_link_is_circular - Make sure potential link isn't circular
+ *
+ * @sup_np: Supplier device
+ * @con_np: Consumer device
+ *
+ * This function checks if @sup_np's properties contain a reference to @con_np.
+ *
+ * Will return true if there's a circular dependency and false otherwise.
+ */
+static bool of_link_is_circular(struct device_node *sup_np,
+				struct device_node *con_np)
+{
+	const struct supplier_bindings *s = of_supplier_bindings;
+	struct device_node *tmp;
+	bool matched = false;
+	struct property *p;
+	int i = 0;
+
+	for_each_property_of_node(sup_np, p) {
+		while (!matched && s->parse_prop) {
+			while ((tmp = s->parse_prop(sup_np, p->name, i))) {
+				matched = true;
+				i++;
+
+				if (tmp == con_np)
+					return true;
+			}
+			i = 0;
+			s++;
+		}
+		s = of_supplier_bindings;
+		matched = false;
+	}
+
+	return false;
+}
+
 /**
  * of_link_to_phandle - Add device link to supplier from supplier phandle
  * @dev: consumer device
@@ -1216,6 +1254,18 @@ static int of_link_to_phandle(struct device *dev, struct device_node *sup_np,
 		return -ENODEV;
 	}

+	/*
+	 * It is possible for consumer device nodes to also supply the device
+	 * node they are consuming from. Creating an unwarranted circular
+	 * dependency.
+	 */
+	if (of_link_is_circular(sup_np, dev->of_node)) {
+		dev_dbg(dev, "Not linking to %pOFP - Circular dependency\n",
+			sup_np);
+		of_node_put(sup_np);
+		return -ENODEV;
+	}
+
 	/*
 	 * Don't allow linking a device node as a consumer of one of its
 	 * descendant nodes. By definition, a child node can't be a functional
--
2.26.0


2020-04-16 01:01:23

by Saravana Kannan

Subject: Re: [PATCH 4/4] of: property: Avoid linking devices with circular dependencies

On Wed, Apr 15, 2020 at 8:06 AM Nicolas Saenz Julienne
<[email protected]> wrote:
>
> When creating a consumer/supplier relationship between devices it's
> essential to make sure they aren't supplying each other creating a
> circular dependency.

Kinda correct. But fw_devlink is not just about optimizing probing.
It's also about ensuring sync_state() callbacks work correctly when
drivers are built as modules. And for that to work, circular
"SYNC_STATE_ONLY" device links are allowed. I've explained it in a bit
more detail here [1].

> Introduce a new function to check if such circular dependency exists
> between two device nodes and use it in of_link_to_phandle().
>
> Fixes: a3e1d1a7f5fc ("of: property: Add functional dependency link from DT bindings")
> Signed-off-by: Nicolas Saenz Julienne <[email protected]>
> ---
>
> NOTE:
> I feel of_link_is_circular() is a little dense, and could benefit from
> some abstraction/refactoring. That said, I'd rather get some feedback,
> before spending time on it.

Good call :)

> drivers/of/property.c | 50 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/drivers/of/property.c b/drivers/of/property.c
> index 2c7978ef22be1..74a5190408c3b 100644
> --- a/drivers/of/property.c
> +++ b/drivers/of/property.c
> @@ -1171,6 +1171,44 @@ static const struct supplier_bindings of_supplier_bindings[] = {
>  	{}
>  };
>
> +/**
> + * of_link_is_circular - Make sure potential link isn't circular
> + *
> + * @sup_np: Supplier device
> + * @con_np: Consumer device
> + *
> + * This function checks if @sup_np's properties contain a reference to @con_np.
> + *
> + * Will return true if there's a circular dependency and false otherwise.
> + */
> +static bool of_link_is_circular(struct device_node *sup_np,
> +				struct device_node *con_np)
> +{
> +	const struct supplier_bindings *s = of_supplier_bindings;
> +	struct device_node *tmp;
> +	bool matched = false;
> +	struct property *p;
> +	int i = 0;
> +
> +	for_each_property_of_node(sup_np, p) {
> +		while (!matched && s->parse_prop) {
> +			while ((tmp = s->parse_prop(sup_np, p->name, i))) {
> +				matched = true;
> +				i++;
> +
> +				if (tmp == con_np)
> +					return true;
> +			}
> +			i = 0;
> +			s++;
> +		}
> +		s = of_supplier_bindings;
> +		matched = false;
> +	}
> +
> +	return false;
> +}

This only catches circular links made out of 2 devices. If we really
needed such a function that worked correctly to catch bigger
"circles", you'd need to recurse and it'll get super wasteful and
ugly.

Thankfully, device_link_add() already checks for circular dependencies
when we need it and it's much cheaper because the links are at a
device level and not examined at a property level.
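
To illustrate the difference between a two-device check and a device-level check, here is a minimal userspace sketch in C. It is not kernel code: struct dep_graph, dep_graph_init(), link_would_cycle() and add_link() are hypothetical names, and the depth-first walk only models the idea behind the check that device_link_add() is described as doing above. Devices are small integers and links live in a fixed-size matrix.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVS 8

/* Hypothetical model: links[s][c] is true when device s supplies device c. */
struct dep_graph {
	bool links[MAX_DEVS][MAX_DEVS];
};

static void dep_graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first walk along consumer links: can 'target' be reached from 'from'? */
static bool reaches(const struct dep_graph *g, int from, int target, bool *seen)
{
	if (from == target)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;

	for (int c = 0; c < MAX_DEVS; c++)
		if (g->links[from][c] && reaches(g, c, target, seen))
			return true;

	return false;
}

/*
 * Making 'con' a consumer of 'sup' closes a loop if 'sup' already depends,
 * directly or through any number of other devices, on 'con'.
 */
static bool link_would_cycle(const struct dep_graph *g, int con, int sup)
{
	bool seen[MAX_DEVS] = { false };

	return reaches(g, con, sup, seen);
}

/* Record the link, or refuse it and leave the graph untouched. */
static bool add_link(struct dep_graph *g, int con, int sup)
{
	assert(con >= 0 && con < MAX_DEVS);
	assert(sup >= 0 && sup < MAX_DEVS);

	if (link_would_cycle(g, con, sup))
		return false;

	g->links[sup][con] = true;
	return true;
}
```

With devices 1, 2 and 3, making 1 a consumer of 2 and 3 a consumer of 1 succeeds. A later attempt to make 2 a consumer of 3 is refused because it would close a three-device loop, which a pairwise property check would not notice.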

Is this a real problem you are hitting with the Raspberry Pi 4's? If
so can you give an example in its DT where you are hitting this?

I'll have to NACK this patch for reasons mentioned above and in [1].
However, I think I have a solution that should work for what I'm
guessing is your real problem. But let me see the description of the
real scenario before I claim to have a solution.

-Saravana

[1] - https://lore.kernel.org/lkml/[email protected]/

2020-04-16 16:04:10

by Nicolas Saenz Julienne

Subject: Re: [PATCH 4/4] of: property: Avoid linking devices with circular dependencies

On Wed, 2020-04-15 at 11:52 -0700, Saravana Kannan wrote:
> On Wed, Apr 15, 2020 at 8:06 AM Nicolas Saenz Julienne
> <[email protected]> wrote:
> > When creating a consumer/supplier relationship between devices it's
> > essential to make sure they aren't supplying each other creating a
> > circular dependency.
>
> Kinda correct. But fw_devlink is not just about optimizing probing.
> It's also about ensuring sync_state() callbacks work correctly when
> drivers are built as modules. And for that to work, circular
> "SYNC_STATE_ONLY" device links are allowed. I've explained it in a bit
> more detail here [1].

Understood.

[...]

> This only catches circular links made out of 2 devices. If we really
> needed such a function that worked correctly to catch bigger
> "circles", you'd need to recurse and it'll get super wasteful and
> ugly.

Yeah, I was kind of expecting this reply :).

> Thankfully, device_link_add() already checks for circular dependencies
> when we need it and it's much cheaper because the links are at a
> device level and not examined at a property level.
>
> Is this a real problem you are hitting with the Raspberry Pi 4's? If
> so can you give an example in its DT where you are hitting this?

So the DT bit that triggered this whole series is in
'arch/arm/boot/dts/bcm283x.dtsi'. Namely the interaction between
'cprman@7e101000' and 'dsi@7e209000'. Both are clock providers and both are
clock consumers of each other.

Well I had a second deeper look at the issue, here is how the circular
dependency breaks the boot process (A being soc, B being cprman and C being
dsi):

Device node A
	Device node B -> C
	Device node C -> B

The probe sequence is the following (with DL_FLAG_AUTOPROBE_CONSUMER):
1. Device A is added; the remaining devices are siblings, so nothing is done.
2. Device B is added. Device C doesn't exist yet, so B is put on the
   'wait_for_suppliers' list with the 'need_for_probe' flag set.
3. Device C is added. B is picked up from the 'wait_for_suppliers' list and a
   device link is created with B consuming from C.
4. C is then parsed, and an attempt is made to link it with B, this time with
   C as the consumer. This fails the circular dependency test (done by
   device_is_dependent()) during device_link_add(). C is left on the
   'wait_for_suppliers' list forever, as every further attempt at add_link()
   on C will fail.

-> Ultimately this prevents C from ever being probed, which also prevents B
from being probed. That isn't good, as B is the main clock provider of the
system.

Note that B can live without C. I think some clock re-parenting will not be
accessible, but that's all.
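
The sequence above can be replayed with a small userspace model in C. This is only a sketch of the behaviour described in this email, not the driver core: struct sim, try_link(), retry_waiters(), add_device(), probe_all() and run_rpi_scenario() are hypothetical names, only direct two-device cycles are modelled, and a refused link is reported as -EAGAIN to mirror the retry behaviour described in step 4.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { DEV_A, DEV_B, DEV_C, NUM_DEVS };	/* soc, cprman, dsi */

struct sim {
	bool added[NUM_DEVS];		/* device has been registered */
	bool waiting[NUM_DEVS];		/* on the modelled 'wait_for_suppliers' list */
	bool probed[NUM_DEVS];
	bool link[NUM_DEVS][NUM_DEVS];	/* link[con][sup]: con consumes from sup */
	int wants[NUM_DEVS];		/* supplier named in the DT, or -1 */
};

/* Try to create the link a device asks for; -EAGAIN means "retry later". */
static int try_link(struct sim *s, int con)
{
	int sup = s->wants[con];

	if (sup < 0)
		return 0;
	assert(sup < NUM_DEVS);
	if (!s->added[sup])
		return -EAGAIN;		/* supplier doesn't exist yet */
	if (s->link[sup][con])
		return -EAGAIN;		/* cycle refused, yet reported as retryable */
	s->link[con][sup] = true;
	return 0;
}

static void retry_waiters(struct sim *s)
{
	for (int d = 0; d < NUM_DEVS; d++)
		if (s->waiting[d])
			s->waiting[d] = (try_link(s, d) == -EAGAIN);
}

/* Waiting consumers are served first, then the new device's own links. */
static void add_device(struct sim *s, int dev)
{
	s->added[dev] = true;
	retry_waiters(s);
	s->waiting[dev] = (try_link(s, dev) == -EAGAIN);
}

/* Probe until nothing changes: no waiting devices, suppliers go first. */
static void probe_all(struct sim *s)
{
	bool progress = true;

	while (progress) {
		progress = false;
		for (int d = 0; d < NUM_DEVS; d++) {
			bool ready = s->added[d] && !s->probed[d] && !s->waiting[d];

			for (int sup = 0; sup < NUM_DEVS; sup++)
				if (s->link[d][sup] && !s->probed[sup])
					ready = false;
			if (ready) {
				s->probed[d] = true;
				progress = true;
			}
		}
	}
}

static struct sim run_rpi_scenario(void)
{
	struct sim s = { .wants = { -1, DEV_C, DEV_B } };

	add_device(&s, DEV_A);
	add_device(&s, DEV_B);
	add_device(&s, DEV_C);
	retry_waiters(&s);	/* any number of retries leaves C waiting */
	probe_all(&s);
	return s;
}
```

In this model A probes, B holds a link to C but cannot probe because C never does, and C stays on the waiting list however often retry_waiters() runs.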

> I'll have to NACK this patch for reasons mentioned above and in [1].
> However, I think I have a solution that should work for what I'm
> guessing is your real problem. But let me see the description of the
> real scenario before I claim to have a solution.

My intuition would be, upon getting a circular dep from device_is_dependent()
with DL_FLAG_AUTOPROBE_CONSUMER to switch need_for_probe to false on both
devices.

Regards,
Nicolas



2020-04-16 20:59:14

by Saravana Kannan

Subject: Re: [PATCH 4/4] of: property: Avoid linking devices with circular dependencies

On Thu, Apr 16, 2020 at 9:01 AM Nicolas Saenz Julienne
<[email protected]> wrote:
>
> On Wed, 2020-04-15 at 11:52 -0700, Saravana Kannan wrote:
> > On Wed, Apr 15, 2020 at 8:06 AM Nicolas Saenz Julienne
> > <[email protected]> wrote:
> > > When creating a consumer/supplier relationship between devices it's
> > > essential to make sure they aren't supplying each other creating a
> > > circular dependency.
> >
> > Kinda correct. But fw_devlink is not just about optimizing probing.
> > It's also about ensuring sync_state() callbacks work correctly when
> > drivers are built as modules. And for that to work, circular
> > "SYNC_STATE_ONLY" device links are allowed. I've explained it in a bit
> > more detail here [1].
>
> Understood.
>
> [...]
>
> > This only catches circular links made out of 2 devices. If we really
> > needed such a function that worked correctly to catch bigger
> > "circles", you'd need to recurse and it'll get super wasteful and
> > ugly.
>
> Yeah, I was kind of expecting this reply :).
>
> > Thankfully, device_link_add() already checks for circular dependencies
> > when we need it and it's much cheaper because the links are at a
> > device level and not examined at a property level.
> >
> > Is this a real problem you are hitting with the Raspberry Pi 4's? If
> > so can you give an example in its DT where you are hitting this?
>
> So the DT bit that triggered all this series is in
> 'arch/arm/boot/dts/bcm283x.dtsi'. Namely the interaction between
> 'cprman@7e101000' and 'dsi@7e209000.' Both are clock providers and both are
> clock consumers of each other.
>
> Well I had a second deeper look at the issue, here is how the circular
> dependency breaks the boot process (A being soc, B being cprman and C being
> dsi):
>
> Device node A
> 	Device node B -> C
> 	Device node C -> B
>
> The probe sequence is the following (with DL_FLAG_AUTOPROBE_CONSUMER):
> 1. Device A is added; the remaining devices are siblings, so nothing is done.
> 2. Device B is added. Device C doesn't exist yet, so B is put on the
>    'wait_for_suppliers' list with the 'need_for_probe' flag set.
> 3. Device C is added. B is picked up from the 'wait_for_suppliers' list and a
>    device link is created with B consuming from C.
> 4. C is then parsed, and an attempt is made to link it with B, this time with
>    C as the consumer. This fails the circular dependency test (done by
>    device_is_dependent()) during device_link_add(). C is left on the
>    'wait_for_suppliers' list forever, as every further attempt at add_link()
>    on C will fail.
>
> -> Ultimately this prevents C from ever being probed, which also prevents B
> from being probed. That isn't good, as B is the main clock provider of the
> system.
>
> Note that B can live without C. I think some clock re-parenting will not be
> accessible, but that's all.
>
> > I'll have to NACK this patch for reasons mentioned above and in [1].
> > However, I think I have a solution that should work for what I'm
> > guessing is your real problem. But let me see the description of the
> > real scenario before I claim to have a solution.
>
> My intuition would be, upon getting a circular dep from device_is_dependent()
> with DL_FLAG_AUTOPROBE_CONSUMER to switch need_for_probe to false on both
> devices.

The problem with that is the devices will start trying to probe and
then defer due to other suppliers that are needed for probing but
haven't been linked yet. So it'll go a bit against what you are trying
to do. Also it doesn't solve the problem of already created links that
are wrong.

I'll send out a patch in reply to your email. I've been meaning to
send that outside of this discussion. It doesn't cover all cases of
cycles, but it'll cover most cases and I think it should fix your case
too.

For a more comprehensive fix, I'd like to do something like what I
explain here [1]. That should be doable for your driver too if you
want to try that approach. But I haven't heard Rob/Frank's opinion on
that.

-Saravana
[1] - https://lore.kernel.org/lkml/CAGETcx_2vdjSWc3BBN-N2WrtJP90ZnH-2vE=2iVuHuaE1YmMWQ@mail.gmail.com/

2020-04-16 21:01:20

by Saravana Kannan

Subject: [PATCH v1] of: property: Don't retry device_link_add() upon failure

When of_link_to_phandle() was implemented initially, there was no way to
tell if device_link_add() was failing because the supplier device hasn't
been parsed yet, hasn't been added yet, the links were creating a cycle,
etc. Some of these were transient errors that'd go away at a later
point.

However, with the current set of improved checks, if device_link_add()
fails, it'll only be for permanent errors like cycles or out-of-memory
errors.

Also, with the addition of DL_FLAG_SYNC_STATE_ONLY flag [1] to device
links, all the valid dependency cycles due to "proxy" device links
(needed for correctness of sync_state() device callback) will never fail
device_link_add() due to cycles.

So, continuing to retry failing device links (by returning -EAGAIN) is
no longer useful. At worst, it prevents platforms from setting
fw_devlink=on (or better) because it prevents proper boot up. So, let's
not do that anymore.

[1] - https://lore.kernel.org/lkml/[email protected]/
Cc: Nicolas Saenz Julienne <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/of/property.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 252e4f600155..ee1bc267f975 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1074,7 +1074,7 @@ static int of_link_to_phandle(struct device *dev, struct device_node *sup_np,
 		return -EAGAIN;
 	}
 	if (!device_link_add(dev, sup_dev, dl_flags))
-		ret = -EAGAIN;
+		ret = -EINVAL;
 	put_device(sup_dev);
 	return ret;
 }
--
2.26.1.301.g55bc3eb7cb9-goog

2020-04-17 16:54:03

by Nicolas Saenz Julienne

Subject: Re: [PATCH v1] of: property: Don't retry device_link_add() upon failure

On Thu, 2020-04-16 at 13:58 -0700, Saravana Kannan wrote:
> When of_link_to_phandle() was implemented initially, there was no way to
> tell if device_link_add() was failing because the supplier device hasn't
> been parsed yet, hasn't been added yet, the links were creating a cycle,
> etc. Some of these were transient errors that'd go away at a later
> point.
>
> However, with the current set of improved checks, if device_link_add()
> fails, it'll only be for permanent errors like cycles or out-of-memory
> errors.
>
> Also, with the addition of DL_FLAG_SYNC_STATE_ONLY flag [1] to device
> links, all the valid dependency cycles due to "proxy" device links
> (needed for correctness of sync_state() device callback) will never fail
> device_link_add() due to cycles.
>
> So, continuing to retry failing device links (by returning -EAGAIN) is
> no longer useful. At worst, it prevents platforms from setting
> fw_devlink=on (or better) because it prevents proper boot up. So, let's
> not do that anymore.
>
> [1] -
> https://lore.kernel.org/lkml/[email protected]/
> Cc: Nicolas Saenz Julienne <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---

Tested-by: Nicolas Saenz Julienne <[email protected]>

Thanks!
Nicolas



2020-04-17 18:20:23

by Rob Herring

Subject: Re: [PATCH v1] of: property: Don't retry device_link_add() upon failure

On Thu, Apr 16, 2020 at 3:58 PM Saravana Kannan <[email protected]> wrote:
>
> When of_link_to_phandle() was implemented initially, there was no way to
> tell if device_link_add() was failing because the supplier device hasn't
> been parsed yet, hasn't been added yet, the links were creating a cycle,
> etc. Some of these were transient errors that'd go away at a later
> point.
>
> However, with the current set of improved checks, if device_link_add()
> fails, it'll only be for permanent errors like cycles or out-of-memory
> errors.

What improved checks? The series from Nicolas?

Is there a dependency between this and Nicolas' series?

Should this go to stable?


>
> Also, with the addition of DL_FLAG_SYNC_STATE_ONLY flag [1] to device
> links, all the valid dependency cycles due to "proxy" device links
> (needed for correctness of sync_state() device callback) will never fail
> device_link_add() due to cycles.
>
> So, continuing to retry failing device links (by returning -EAGAIN) is
> no longer useful. At worst, it prevents platforms from setting
> fw_devlink=on (or better) because it prevents proper boot up. So, let's
> not do that anymore.
>
> [1] - https://lore.kernel.org/lkml/[email protected]/
> Cc: Nicolas Saenz Julienne <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/of/property.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/of/property.c b/drivers/of/property.c
> index 252e4f600155..ee1bc267f975 100644
> --- a/drivers/of/property.c
> +++ b/drivers/of/property.c
> @@ -1074,7 +1074,7 @@ static int of_link_to_phandle(struct device *dev, struct device_node *sup_np,
>  		return -EAGAIN;
>  	}
>  	if (!device_link_add(dev, sup_dev, dl_flags))
> -		ret = -EAGAIN;
> +		ret = -EINVAL;
>  	put_device(sup_dev);
>  	return ret;
>  }
> --
> 2.26.1.301.g55bc3eb7cb9-goog
>

2020-04-17 20:34:48

by Saravana Kannan

Subject: Re: [PATCH v1] of: property: Don't retry device_link_add() upon failure

On Fri, Apr 17, 2020 at 11:16 AM Rob Herring <[email protected]> wrote:
>
> On Thu, Apr 16, 2020 at 3:58 PM Saravana Kannan <[email protected]> wrote:
> >
> > When of_link_to_phandle() was implemented initially, there was no way to
> > tell if device_link_add() was failing because the supplier device hasn't
> > been parsed yet, hasn't been added yet, the links were creating a cycle,
> > etc. Some of these were transient errors that'd go away at a later
> > point.
> >
> > However, with the current set of improved checks, if device_link_add()
> > fails, it'll only be for permanent errors like cycles or out-of-memory
> > errors.
>
> What improved checks? The series from Nicolas?
>

Checking for OF_POPULATED and getting the device using get_dev_from_fwnode().
OF_POPULATED ensures the node has been parsed. get_dev_from_fwnode()
ensures the device has been added to driver core.
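
As a rough illustration of how those checks separate transient failures from permanent ones, here is a simplified userspace model in C. It is a sketch under assumptions, not the code in drivers/of/property.c: struct supplier_state, model_link_to_supplier() and should_retry() are hypothetical names, and mapping both early checks to -EAGAIN is an assumption made for the example. Only the final branch, which returns -EINVAL instead of -EAGAIN when the link is refused, is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of what the real function would look at. */
struct supplier_state {
	bool node_populated;	/* stands in for the OF_POPULATED check */
	bool device_added;	/* stands in for get_dev_from_fwnode() succeeding */
	bool link_accepted;	/* stands in for device_link_add() returning a link */
};

/* Return 0 on success, -EAGAIN for transient and -EINVAL for permanent errors. */
static int model_link_to_supplier(const struct supplier_state *st)
{
	assert(st);

	if (!st->node_populated)
		return -EAGAIN;	/* assumed: node not parsed yet, may change later */
	if (!st->device_added)
		return -EAGAIN;	/* assumed: device not registered yet */
	if (!st->link_accepted)
		return -EINVAL;	/* per the patch: e.g. a cycle, retrying won't help */
	return 0;
}

/* A caller keeps the consumer on its waiting list only for transient errors. */
static bool should_retry(int ret)
{
	return ret == -EAGAIN;
}
```

Once the first two conditions hold, the only remaining failure is the refused link, so a caller that retries only on -EAGAIN no longer loops on a dependency cycle.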

> Is there a dependency between this and Nicolas' series?

No.

> Should this go to stable?

Kind of a grey area. I mean, if of/fw_devlink is already letting a
platform boot all the way, this doesn't fix anything. I doubt anyone
in a stable kernel is turning on this feature if it affects device
probing. I'd say the same for Nicolas' series too. It allows more
platforms to work, but if a platform is fully working, it doesn't
improve anything.

Long story short, your call for stable.

-Saravana

2020-04-28 17:13:15

by Rob Herring (Arm)

Subject: Re: [PATCH v1] of: property: Don't retry device_link_add() upon failure

On Thu, 16 Apr 2020 13:58:38 -0700, Saravana Kannan wrote:
> When of_link_to_phandle() was implemented initially, there was no way to
> tell if device_link_add() was failing because the supplier device hasn't
> been parsed yet, hasn't been added yet, the links were creating a cycle,
> etc. Some of these were transient errors that'd go away at a later
> point.
>
> However, with the current set of improved checks, if device_link_add()
> fails, it'll only be for permanent errors like cycles or out-of-memory
> errors.
>
> Also, with the addition of DL_FLAG_SYNC_STATE_ONLY flag [1] to device
> links, all the valid dependency cycles due to "proxy" device links
> (needed for correctness of sync_state() device callback) will never fail
> device_link_add() due to cycles.
>
> So, continuing to retry failing device links (by returning -EAGAIN) is
> no longer useful. At worst, it prevents platforms from setting
> fw_devlink=on (or better) because it prevents proper boot up. So, let's
> not do that anymore.
>
> [1] - https://lore.kernel.org/lkml/[email protected]/
> Cc: Nicolas Saenz Julienne <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/of/property.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>

Applied, thanks.

Rob