2019-07-24 02:37:20

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering

Add device-links to track functional dependencies between devices
after they are created (but before they are probed) by looking at
their common DT bindings like clocks, interconnects, etc.

Having functional dependencies automatically added before the devices
are probed, provides the following benefits:

- Optimizes device probe order and avoids the useless work of
attempting probes of devices that will not probe successfully
(because their suppliers aren't present or haven't probed yet).

For example, in a commonly available mobile SoC, registering just
one consumer device's driver at an initcall level earlier than the
supplier device's driver causes 11 failed probe attempts before the
consumer device probes successfully. This was with a kernel with all
the drivers statically compiled in. This problem gets a lot worse if
all the drivers are loaded as modules without direct symbol
dependencies.

- Supplier devices like clock providers, interconnect providers, etc
need to keep the resources they provide active and at a particular
state(s) during boot up even if their current set of consumers don't
request the resource to be active. This is because the rest of the
consumers might not have probed yet and turning off the resource
before all the consumers have probed could lead to a hang or
undesired user experience.

Some frameworks (Eg: regulator) handle this today by turning off
"unused" resources at late_initcall_sync and hoping all the devices
have probed by then. This is not a valid assumption for systems with
loadable modules. Other frameworks (Eg: clock) just don't handle
this due to the lack of a clear signal for when they can turn off
resources. This leads to downstream hacks to handle cases like this
that can easily be solved in the upstream kernel.

By linking devices before they are probed, we give suppliers a clear
count of the number of dependent consumers. Once all of the
consumers are active, the suppliers can turn off the unused
resources without making assumptions about the number of consumers.

By default we just add device-links to track "driver presence" (probe
succeeded) of the supplier device. If any other functionality provided
by device-links are needed, it is left to the consumer/supplier
devices to change the link when they probe.

v1 -> v2:
- Drop patch to speed up of_find_device_by_node()
- Drop depends-on property and use existing bindings

v2 -> v3:
- Refactor the code to have driver core initiate the linking of devs
- Have driver core link consumers to supplier before it's probed
- Add support for drivers to edit the device links before probing

v3 -> v4:
- Tested edit_links() on system with cyclic dependency. Works.
- Added some checks to make sure device link isn't attempted from
parent device node to child device node.
- Added way to pause/resume sync_state callbacks across
of_platform_populate().
- Recursively parse DT node to create device links from parent to
suppliers of parent and all child nodes.

v4 -> v5:
- Fixed copy-pasta bugs with linked list handling
- Walk up the phandle reference till I find an actual device (needed
for regulators to work)
- Added support for linking devices from regulator DT bindings
- Tested the whole series again to make sure cyclic dependencies are
broken with edit_links() and regulator links are created properly.

v5 -> v6:
- Split, squashed and reordered some of the patches.
- Refactored the device linking code to follow the same code pattern for
any property.

v6 -> v7:
- No functional changes.
- Renamed i to index
- Added comment to clarify not having to check property name for every
index
- Added "matched" variable to clarify code. No functional change.
- Added comments to include/linux/device.h for add_links()

I've also not updated this patch series to handle the new patch [1] from
Rafael. Will do that once this patch series is close to being Acked.

[1] - https://lore.kernel.org/lkml/3121545.4lOhFoIcdQ@kreacher/

-Saravana


Saravana Kannan (7):
driver core: Add support for linking devices during device addition
driver core: Add edit_links() callback for drivers
of/platform: Add functional dependency link from DT bindings
driver core: Add sync_state driver/bus callback
of/platform: Pause/resume sync state during init and
of_platform_populate()
of/platform: Create device links for all child-supplier depencencies
of/platform: Don't create device links for default busses

.../admin-guide/kernel-parameters.txt | 5 +
drivers/base/core.c | 168 ++++++++++++++++
drivers/base/dd.c | 29 +++
drivers/of/platform.c | 189 ++++++++++++++++++
include/linux/device.h | 55 +++++
5 files changed, 446 insertions(+)

--
2.22.0.709.g102302147b-goog


2019-07-24 02:37:23

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings

Add device-links after the devices are created (but before they are
probed) by looking at common DT bindings like clocks and
interconnects.

Automatically adding device-links for functional dependencies at the
framework level provides the following benefits:

- Optimizes device probe order and avoids the useless work of
attempting probes of devices that will not probe successfully
(because their suppliers aren't present or haven't probed yet).

For example, in a commonly available mobile SoC, registering just
one consumer device's driver at an initcall level earlier than the
supplier device's driver causes 11 failed probe attempts before the
consumer device probes successfully. This was with a kernel with all
the drivers statically compiled in. This problem gets a lot worse if
all the drivers are loaded as modules without direct symbol
dependencies.

- Supplier devices like clock providers, interconnect providers, etc
need to keep the resources they provide active and at a particular
state(s) during boot up even if their current set of consumers don't
request the resource to be active. This is because the rest of the
consumers might not have probed yet and turning off the resource
before all the consumers have probed could lead to a hang or
undesired user experience.

Some frameworks (Eg: regulator) handle this today by turning off
"unused" resources at late_initcall_sync and hoping all the devices
have probed by then. This is not a valid assumption for systems with
loadable modules. Other frameworks (Eg: clock) just don't handle
this due to the lack of a clear signal for when they can turn off
resources. This leads to downstream hacks to handle cases like this
that can easily be solved in the upstream kernel.

By linking devices before they are probed, we give suppliers a clear
count of the number of dependent consumers. Once all of the
consumers are active, the suppliers can turn off the unused
resources without making assumptions about the number of consumers.

By default we just add device-links to track "driver presence" (probe
succeeded) of the supplier device. If any other functionality provided
by device-links are needed, it is left to the consumer/supplier
devices to change the link when they probe.

Signed-off-by: Saravana Kannan <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 5 +
drivers/of/platform.c | 165 ++++++++++++++++++
2 files changed, 170 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 46b826fcb5ad..12937349d79d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3170,6 +3170,11 @@
This can be set from sysctl after boot.
See Documentation/admin-guide/sysctl/vm.rst for details.

+ of_devlink [KNL] Make device links from common DT bindings. Useful
+ for optimizing probe order and making sure resources
+ aren't turned off before the consumer devices have
+ probed.
+
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
See Documentation/debugging-via-ohci1394.txt for more
info.
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 7801e25e6895..4344419a26fc 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -508,6 +508,170 @@ int of_platform_default_populate(struct device_node *root,
}
EXPORT_SYMBOL_GPL(of_platform_default_populate);

+bool of_link_is_valid(struct device_node *con, struct device_node *sup)
+{
+ of_node_get(sup);
+ /*
+ * Don't allow linking a device node as a consumer of one of its
+ * descendant nodes. By definition, a child node can't be a functional
+ * dependency for the parent node.
+ */
+ while (sup) {
+ if (sup == con) {
+ of_node_put(sup);
+ return false;
+ }
+ sup = of_get_next_parent(sup);
+ }
+ return true;
+}
+
+static int of_link_to_phandle(struct device *dev, struct device_node *sup_np)
+{
+ struct platform_device *sup_dev;
+ u32 dl_flags = DL_FLAG_AUTOPROBE_CONSUMER;
+ int ret = 0;
+
+ /*
+ * Since we are trying to create device links, we need to find
+ * the actual device node that owns this supplier phandle.
+ * Often times it's the same node, but sometimes it can be one
+ * of the parents. So walk up the parent till you find a
+ * device.
+ */
+ while (sup_np && !of_find_property(sup_np, "compatible", NULL))
+ sup_np = of_get_next_parent(sup_np);
+ if (!sup_np)
+ return 0;
+
+ if (!of_link_is_valid(dev->of_node, sup_np)) {
+ of_node_put(sup_np);
+ return 0;
+ }
+ sup_dev = of_find_device_by_node(sup_np);
+ of_node_put(sup_np);
+ if (!sup_dev)
+ return -ENODEV;
+ if (!device_link_add(dev, &sup_dev->dev, dl_flags))
+ ret = -ENODEV;
+ put_device(&sup_dev->dev);
+ return ret;
+}
+
+static struct device_node *parse_prop_cells(struct device_node *np,
+ const char *prop, int index,
+ const char *binding,
+ const char *cell)
+{
+ struct of_phandle_args sup_args;
+
+ /* Don't need to check property name for every index. */
+ if (!index && strcmp(prop, binding))
+ return NULL;
+
+ if (of_parse_phandle_with_args(np, binding, cell, index, &sup_args))
+ return NULL;
+
+ return sup_args.np;
+}
+
+static struct device_node *parse_clocks(struct device_node *np,
+ const char *prop, int index)
+{
+ return parse_prop_cells(np, prop, index, "clocks", "#clock-cells");
+}
+
+static struct device_node *parse_interconnects(struct device_node *np,
+ const char *prop, int index)
+{
+ return parse_prop_cells(np, prop, index, "interconnects",
+ "#interconnect-cells");
+}
+
+static int strcmp_suffix(const char *str, const char *suffix)
+{
+ unsigned int len, suffix_len;
+
+ len = strlen(str);
+ suffix_len = strlen(suffix);
+ if (len <= suffix_len)
+ return -1;
+ return strcmp(str + len - suffix_len, suffix);
+}
+
+static struct device_node *parse_regulators(struct device_node *np,
+ const char *prop, int index)
+{
+ if (index || strcmp_suffix(prop, "-supply"))
+ return NULL;
+
+ return of_parse_phandle(np, prop, 0);
+}
+
+/**
+ * struct supplier_bindings - Information for parsing supplier DT binding
+ *
+ * @parse_prop: If the function cannot parse the property, return NULL.
+ * Otherwise, return the phandle listed in the property
+ * that corresponds to the index.
+ */
+struct supplier_bindings {
+ struct device_node *(*parse_prop)(struct device_node *np,
+ const char *name, int index);
+};
+
+static const struct supplier_bindings bindings[] = {
+ { .parse_prop = parse_clocks, },
+ { .parse_prop = parse_interconnects, },
+ { .parse_prop = parse_regulators, },
+ { },
+};
+
+static bool of_link_property(struct device *dev, struct device_node *con_np,
+ const char *prop)
+{
+ struct device_node *phandle;
+ struct supplier_bindings *s = bindings;
+ unsigned int i = 0;
+ bool done = true, matched = false;
+
+ while (!matched && s->parse_prop) {
+ while ((phandle = s->parse_prop(con_np, prop, i))) {
+ matched = true;
+ i++;
+ if (of_link_to_phandle(dev, phandle))
+ /*
+ * Don't stop at the first failure. See
+ * Documentation for bus_type.add_links for
+ * more details.
+ */
+ done = false;
+ }
+ s++;
+ }
+ return done ? 0 : -ENODEV;
+}
+
+static bool of_devlink;
+core_param(of_devlink, of_devlink, bool, 0);
+
+static int of_link_to_suppliers(struct device *dev)
+{
+ struct property *p;
+ bool done = true;
+
+ if (!of_devlink)
+ return 0;
+ if (unlikely(!dev->of_node))
+ return 0;
+
+ for_each_property_of_node(dev->of_node, p)
+ if (of_link_property(dev, dev->of_node, p->name))
+ done = false;
+
+ return done ? 0 : -ENODEV;
+}
+
#ifndef CONFIG_PPC
static const struct of_device_id reserved_mem_matches[] = {
{ .compatible = "qcom,rmtfs-mem" },
@@ -523,6 +687,7 @@ static int __init of_platform_default_populate_init(void)
if (!of_have_populated_dt())
return -ENODEV;

+ platform_bus_type.add_links = of_link_to_suppliers;
/*
* Handle certain compatibles explicitly, since we don't want to create
* platform_devices for every node in /reserved-memory with a
--
2.22.0.709.g102302147b-goog

2019-07-24 02:37:23

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v7 5/7] of/platform: Pause/resume sync state during init and of_platform_populate()

When all the top level devices are populated from DT during kernel
init, the supplier devices could be added and probed before the
consumer devices are added and linked to the suppliers. To avoid the
sync_state() callback from being called prematurely, pause the
sync_state() callbacks before populating the devices and resume them
at late_initcall_sync().

Similarly, when children devices are populated after kernel init using
of_platform_populate(), there could be supplier-consumer dependencies
between the children devices that are populated. To avoid the same
problem with sync_state() being called prematurely, pause and resume
sync_state() callbacks across of_platform_populate().

Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/of/platform.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 4344419a26fc..71d6138698ec 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -485,6 +485,7 @@ int of_platform_populate(struct device_node *root,
pr_debug("%s()\n", __func__);
pr_debug(" starting at: %pOF\n", root);

+ device_links_supplier_sync_state_pause();
for_each_child_of_node(root, child) {
rc = of_platform_bus_create(child, matches, lookup, parent, true);
if (rc) {
@@ -492,6 +493,8 @@ int of_platform_populate(struct device_node *root,
break;
}
}
+ device_links_supplier_sync_state_resume();
+
of_node_set_flag(root, OF_POPULATED_BUS);

of_node_put(root);
@@ -688,6 +691,7 @@ static int __init of_platform_default_populate_init(void)
return -ENODEV;

platform_bus_type.add_links = of_link_to_suppliers;
+ device_links_supplier_sync_state_pause();
/*
* Handle certain compatibles explicitly, since we don't want to create
* platform_devices for every node in /reserved-memory with a
@@ -708,6 +712,13 @@ static int __init of_platform_default_populate_init(void)
return 0;
}
arch_initcall_sync(of_platform_default_populate_init);
+
+static int __init of_platform_sync_state_init(void)
+{
+ device_links_supplier_sync_state_resume();
+ return 0;
+}
+late_initcall_sync(of_platform_sync_state_init);
#endif

int of_platform_device_destroy(struct device *dev, void *data)
--
2.22.0.709.g102302147b-goog

2019-07-25 17:53:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering

On Tue, Jul 23, 2019 at 05:10:53PM -0700, Saravana Kannan wrote:
> Add device-links to track functional dependencies between devices
> after they are created (but before they are probed) by looking at
> their common DT bindings like clocks, interconnects, etc.
>
> Having functional dependencies automatically added before the devices
> are probed, provides the following benefits:
>
> - Optimizes device probe order and avoids the useless work of
> attempting probes of devices that will not probe successfully
> (because their suppliers aren't present or haven't probed yet).
>
> For example, in a commonly available mobile SoC, registering just
> one consumer device's driver at an initcall level earlier than the
> supplier device's driver causes 11 failed probe attempts before the
> consumer device probes successfully. This was with a kernel with all
> the drivers statically compiled in. This problem gets a lot worse if
> all the drivers are loaded as modules without direct symbol
> dependencies.
>
> - Supplier devices like clock providers, interconnect providers, etc
> need to keep the resources they provide active and at a particular
> state(s) during boot up even if their current set of consumers don't
> request the resource to be active. This is because the rest of the
> consumers might not have probed yet and turning off the resource
> before all the consumers have probed could lead to a hang or
> undesired user experience.
>
> Some frameworks (Eg: regulator) handle this today by turning off
> "unused" resources at late_initcall_sync and hoping all the devices
> have probed by then. This is not a valid assumption for systems with
> loadable modules. Other frameworks (Eg: clock) just don't handle
> this due to the lack of a clear signal for when they can turn off
> resources. This leads to downstream hacks to handle cases like this
> that can easily be solved in the upstream kernel.
>
> By linking devices before they are probed, we give suppliers a clear
> count of the number of dependent consumers. Once all of the
> consumers are active, the suppliers can turn off the unused
> resources without making assumptions about the number of consumers.
>
> By default we just add device-links to track "driver presence" (probe
> succeeded) of the supplier device. If any other functionality provided
> by device-links are needed, it is left to the consumer/supplier
> devices to change the link when they probe.
>
> v1 -> v2:
> - Drop patch to speed up of_find_device_by_node()
> - Drop depends-on property and use existing bindings
>
> v2 -> v3:
> - Refactor the code to have driver core initiate the linking of devs
> - Have driver core link consumers to supplier before it's probed
> - Add support for drivers to edit the device links before probing
>
> v3 -> v4:
> - Tested edit_links() on system with cyclic dependency. Works.
> - Added some checks to make sure device link isn't attempted from
> parent device node to child device node.
> - Added way to pause/resume sync_state callbacks across
> of_platform_populate().
> - Recursively parse DT node to create device links from parent to
> suppliers of parent and all child nodes.
>
> v4 -> v5:
> - Fixed copy-pasta bugs with linked list handling
> - Walk up the phandle reference till I find an actual device (needed
> for regulators to work)
> - Added support for linking devices from regulator DT bindings
> - Tested the whole series again to make sure cyclic dependencies are
> broken with edit_links() and regulator links are created properly.
>
> v5 -> v6:
> - Split, squashed and reordered some of the patches.
> - Refactored the device linking code to follow the same code pattern for
> any property.
>
> v6 -> v7:
> - No functional changes.
> - Renamed i to index
> - Added comment to clarify not having to check property name for every
> index
> - Added "matched" variable to clarify code. No functional change.
> - Added comments to include/linux/device.h for add_links()
>
> I've also not updated this patch series to handle the new patch [1] from
> Rafael. Will do that once this patch series is close to being Acked.
>
> [1] - https://lore.kernel.org/lkml/3121545.4lOhFoIcdQ@kreacher/


This looks sane to me. Anyone have any objections for me queueing this
up for my tree to get into linux-next now?

thanks,

greg k-h

2019-07-25 21:05:16

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering

On 7/25/19 6:42 AM, Greg Kroah-Hartman wrote:
> On Tue, Jul 23, 2019 at 05:10:53PM -0700, Saravana Kannan wrote:
>> Add device-links to track functional dependencies between devices
>> after they are created (but before they are probed) by looking at
>> their common DT bindings like clocks, interconnects, etc.
>>
>> Having functional dependencies automatically added before the devices
>> are probed, provides the following benefits:
>>
>> - Optimizes device probe order and avoids the useless work of
>> attempting probes of devices that will not probe successfully
>> (because their suppliers aren't present or haven't probed yet).
>>
>> For example, in a commonly available mobile SoC, registering just
>> one consumer device's driver at an initcall level earlier than the
>> supplier device's driver causes 11 failed probe attempts before the
>> consumer device probes successfully. This was with a kernel with all
>> the drivers statically compiled in. This problem gets a lot worse if
>> all the drivers are loaded as modules without direct symbol
>> dependencies.
>>
>> - Supplier devices like clock providers, interconnect providers, etc
>> need to keep the resources they provide active and at a particular
>> state(s) during boot up even if their current set of consumers don't
>> request the resource to be active. This is because the rest of the
>> consumers might not have probed yet and turning off the resource
>> before all the consumers have probed could lead to a hang or
>> undesired user experience.
>>
>> Some frameworks (Eg: regulator) handle this today by turning off
>> "unused" resources at late_initcall_sync and hoping all the devices
>> have probed by then. This is not a valid assumption for systems with
>> loadable modules. Other frameworks (Eg: clock) just don't handle
>> this due to the lack of a clear signal for when they can turn off
>> resources. This leads to downstream hacks to handle cases like this
>> that can easily be solved in the upstream kernel.
>>
>> By linking devices before they are probed, we give suppliers a clear
>> count of the number of dependent consumers. Once all of the
>> consumers are active, the suppliers can turn off the unused
>> resources without making assumptions about the number of consumers.
>>
>> By default we just add device-links to track "driver presence" (probe
>> succeeded) of the supplier device. If any other functionality provided
>> by device-links are needed, it is left to the consumer/supplier
>> devices to change the link when they probe.
>>
>> v1 -> v2:
>> - Drop patch to speed up of_find_device_by_node()
>> - Drop depends-on property and use existing bindings
>>
>> v2 -> v3:
>> - Refactor the code to have driver core initiate the linking of devs
>> - Have driver core link consumers to supplier before it's probed
>> - Add support for drivers to edit the device links before probing
>>
>> v3 -> v4:
>> - Tested edit_links() on system with cyclic dependency. Works.
>> - Added some checks to make sure device link isn't attempted from
>> parent device node to child device node.
>> - Added way to pause/resume sync_state callbacks across
>> of_platform_populate().
>> - Recursively parse DT node to create device links from parent to
>> suppliers of parent and all child nodes.
>>
>> v4 -> v5:
>> - Fixed copy-pasta bugs with linked list handling
>> - Walk up the phandle reference till I find an actual device (needed
>> for regulators to work)
>> - Added support for linking devices from regulator DT bindings
>> - Tested the whole series again to make sure cyclic dependencies are
>> broken with edit_links() and regulator links are created properly.
>>
>> v5 -> v6:
>> - Split, squashed and reordered some of the patches.
>> - Refactored the device linking code to follow the same code pattern for
>> any property.
>>
>> v6 -> v7:
>> - No functional changes.
>> - Renamed i to index
>> - Added comment to clarify not having to check property name for every
>> index
>> - Added "matched" variable to clarify code. No functional change.
>> - Added comments to include/linux/device.h for add_links()
>>
>> I've also not updated this patch series to handle the new patch [1] from
>> Rafael. Will do that once this patch series is close to being Acked.
>>
>> [1] - https://lore.kernel.org/lkml/3121545.4lOhFoIcdQ@kreacher/
>
>
> This looks sane to me. Anyone have any objections for me queueing this
> up for my tree to get into linux-next now?

I would like for the series to get into linux-next sooner than later,
and spend some time there.

I am _slightly_ more optimistic than Rob that sitting in linux-next for
an extended period might reveal any latent issues, so I would like for
the series to be in linux-next for an extended period of time. (Yes,
my understanding is that Linus does not like patches to be in linux-next
if they are not targeted for the next merge window, but I prefer that
this patch series spend as much time in linux-next as possible).

I have been waiting for the changes to settle down before bringing up
the issue of devicetree overlays. Now that the code seems to be
settling down, I need to look at how these changes impact overlays.
So I do not think the patches will be ready for a Linus pull request
until overlays are considered.

-Frank

>
> thanks,
>
> greg k-h
>


2019-07-26 14:33:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering

On Thu, Jul 25, 2019 at 02:04:23PM -0700, Frank Rowand wrote:
> On 7/25/19 6:42 AM, Greg Kroah-Hartman wrote:
> > On Tue, Jul 23, 2019 at 05:10:53PM -0700, Saravana Kannan wrote:
> >> Add device-links to track functional dependencies between devices
> >> after they are created (but before they are probed) by looking at
> >> their common DT bindings like clocks, interconnects, etc.
> >>
> >> Having functional dependencies automatically added before the devices
> >> are probed, provides the following benefits:
> >>
> >> - Optimizes device probe order and avoids the useless work of
> >> attempting probes of devices that will not probe successfully
> >> (because their suppliers aren't present or haven't probed yet).
> >>
> >> For example, in a commonly available mobile SoC, registering just
> >> one consumer device's driver at an initcall level earlier than the
> >> supplier device's driver causes 11 failed probe attempts before the
> >> consumer device probes successfully. This was with a kernel with all
> >> the drivers statically compiled in. This problem gets a lot worse if
> >> all the drivers are loaded as modules without direct symbol
> >> dependencies.
> >>
> >> - Supplier devices like clock providers, interconnect providers, etc
> >> need to keep the resources they provide active and at a particular
> >> state(s) during boot up even if their current set of consumers don't
> >> request the resource to be active. This is because the rest of the
> >> consumers might not have probed yet and turning off the resource
> >> before all the consumers have probed could lead to a hang or
> >> undesired user experience.
> >>
> >> Some frameworks (Eg: regulator) handle this today by turning off
> >> "unused" resources at late_initcall_sync and hoping all the devices
> >> have probed by then. This is not a valid assumption for systems with
> >> loadable modules. Other frameworks (Eg: clock) just don't handle
> >> this due to the lack of a clear signal for when they can turn off
> >> resources. This leads to downstream hacks to handle cases like this
> >> that can easily be solved in the upstream kernel.
> >>
> >> By linking devices before they are probed, we give suppliers a clear
> >> count of the number of dependent consumers. Once all of the
> >> consumers are active, the suppliers can turn off the unused
> >> resources without making assumptions about the number of consumers.
> >>
> >> By default we just add device-links to track "driver presence" (probe
> >> succeeded) of the supplier device. If any other functionality provided
> >> by device-links are needed, it is left to the consumer/supplier
> >> devices to change the link when they probe.
> >>
> >> v1 -> v2:
> >> - Drop patch to speed up of_find_device_by_node()
> >> - Drop depends-on property and use existing bindings
> >>
> >> v2 -> v3:
> >> - Refactor the code to have driver core initiate the linking of devs
> >> - Have driver core link consumers to supplier before it's probed
> >> - Add support for drivers to edit the device links before probing
> >>
> >> v3 -> v4:
> >> - Tested edit_links() on system with cyclic dependency. Works.
> >> - Added some checks to make sure device link isn't attempted from
> >> parent device node to child device node.
> >> - Added way to pause/resume sync_state callbacks across
> >> of_platform_populate().
> >> - Recursively parse DT node to create device links from parent to
> >> suppliers of parent and all child nodes.
> >>
> >> v4 -> v5:
> >> - Fixed copy-pasta bugs with linked list handling
> >> - Walk up the phandle reference till I find an actual device (needed
> >> for regulators to work)
> >> - Added support for linking devices from regulator DT bindings
> >> - Tested the whole series again to make sure cyclic dependencies are
> >> broken with edit_links() and regulator links are created properly.
> >>
> >> v5 -> v6:
> >> - Split, squashed and reordered some of the patches.
> >> - Refactored the device linking code to follow the same code pattern for
> >> any property.
> >>
> >> v6 -> v7:
> >> - No functional changes.
> >> - Renamed i to index
> >> - Added comment to clarify not having to check property name for every
> >> index
> >> - Added "matched" variable to clarify code. No functional change.
> >> - Added comments to include/linux/device.h for add_links()
> >>
> >> I've also not updated this patch series to handle the new patch [1] from
> >> Rafael. Will do that once this patch series is close to being Acked.
> >>
> >> [1] - https://lore.kernel.org/lkml/3121545.4lOhFoIcdQ@kreacher/
> >
> >
> > This looks sane to me. Anyone have any objections for me queueing this
> > up for my tree to get into linux-next now?
>
> I would like for the series to get into linux-next sooner than later,
> and spend some time there.

Ok, care to give me an ack for it? :)

thanks,

greg k-h

2019-07-31 05:37:33

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering

Hi Greg, Rob,

On 7/26/19 7:32 AM, Greg Kroah-Hartman wrote:
> On Thu, Jul 25, 2019 at 02:04:23PM -0700, Frank Rowand wrote:
>> On 7/25/19 6:42 AM, Greg Kroah-Hartman wrote:
>>> On Tue, Jul 23, 2019 at 05:10:53PM -0700, Saravana Kannan wrote:
>>>> Add device-links to track functional dependencies between devices
>>>> after they are created (but before they are probed) by looking at
>>>> their common DT bindings like clocks, interconnects, etc.
>>>>
>>>> Having functional dependencies automatically added before the devices
>>>> are probed, provides the following benefits:
>>>>
>>>> - Optimizes device probe order and avoids the useless work of
>>>> attempting probes of devices that will not probe successfully
>>>> (because their suppliers aren't present or haven't probed yet).
>>>>
>>>> For example, in a commonly available mobile SoC, registering just
>>>> one consumer device's driver at an initcall level earlier than the
>>>> supplier device's driver causes 11 failed probe attempts before the
>>>> consumer device probes successfully. This was with a kernel with all
>>>> the drivers statically compiled in. This problem gets a lot worse if
>>>> all the drivers are loaded as modules without direct symbol
>>>> dependencies.
>>>>
>>>> - Supplier devices like clock providers, interconnect providers, etc
>>>> need to keep the resources they provide active and at a particular
>>>> state(s) during boot up even if their current set of consumers don't
>>>> request the resource to be active. This is because the rest of the
>>>> consumers might not have probed yet and turning off the resource
>>>> before all the consumers have probed could lead to a hang or
>>>> undesired user experience.
>>>>
>>>> Some frameworks (Eg: regulator) handle this today by turning off
>>>> "unused" resources at late_initcall_sync and hoping all the devices
>>>> have probed by then. This is not a valid assumption for systems with
>>>> loadable modules. Other frameworks (Eg: clock) just don't handle
>>>> this due to the lack of a clear signal for when they can turn off
>>>> resources. This leads to downstream hacks to handle cases like this
>>>> that can easily be solved in the upstream kernel.
>>>>
>>>> By linking devices before they are probed, we give suppliers a clear
>>>> count of the number of dependent consumers. Once all of the
>>>> consumers are active, the suppliers can turn off the unused
>>>> resources without making assumptions about the number of consumers.
>>>>
>>>> By default we just add device-links to track "driver presence" (probe
>>>> succeeded) of the supplier device. If any other functionality provided
>>>> by device-links are needed, it is left to the consumer/supplier
>>>> devices to change the link when they probe.
>>>>
>>>> v1 -> v2:
>>>> - Drop patch to speed up of_find_device_by_node()
>>>> - Drop depends-on property and use existing bindings
>>>>
>>>> v2 -> v3:
>>>> - Refactor the code to have driver core initiate the linking of devs
>>>> - Have driver core link consumers to supplier before it's probed
>>>> - Add support for drivers to edit the device links before probing
>>>>
>>>> v3 -> v4:
>>>> - Tested edit_links() on system with cyclic dependency. Works.
>>>> - Added some checks to make sure device link isn't attempted from
>>>> parent device node to child device node.
>>>> - Added way to pause/resume sync_state callbacks across
>>>> of_platform_populate().
>>>> - Recursively parse DT node to create device links from parent to
>>>> suppliers of parent and all child nodes.
>>>>
>>>> v4 -> v5:
>>>> - Fixed copy-pasta bugs with linked list handling
>>>> - Walk up the phandle reference till I find an actual device (needed
>>>> for regulators to work)
>>>> - Added support for linking devices from regulator DT bindings
>>>> - Tested the whole series again to make sure cyclic dependencies are
>>>> broken with edit_links() and regulator links are created properly.
>>>>
>>>> v5 -> v6:
>>>> - Split, squashed and reordered some of the patches.
>>>> - Refactored the device linking code to follow the same code pattern for
>>>> any property.
>>>>
>>>> v6 -> v7:
>>>> - No functional changes.
>>>> - Renamed i to index
>>>> - Added comment to clarify not having to check property name for every
>>>> index
>>>> - Added "matched" variable to clarify code. No functional change.
>>>> - Added comments to include/linux/device.h for add_links()
>>>>
>>>> I've also not updated this patch series to handle the new patch [1] from
>>>> Rafael. Will do that once this patch series is close to being Acked.
>>>>
>>>> [1] - https://lore.kernel.org/lkml/3121545.4lOhFoIcdQ@kreacher/
>>>
>>>
>>> This looks sane to me. Anyone have any objections for me queueing this
>>> up for my tree to get into linux-next now?
>>
>> I would like for the series to get into linux-next sooner than later,
>> and spend some time there.
>
> Ok, care to give me an ack for it? :)

Rob opined to me that if you apply the series, it will go into 5.4 unless
reverted. That is also what I would expect.

I'm going through the series carefully now. This is currently my highest
priority task. I don't yet know if my comments will be minor, or whether
I will have larger changes to request.

So I am not ready to ack the series yet.

-Frank

>
> thanks,
>
> greg k-h
>

2019-08-08 02:07:52

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings

On 7/23/19 5:10 PM, Saravana Kannan wrote:
> Add device-links after the devices are created (but before they are
> probed) by looking at common DT bindings like clocks and
> interconnects.
>
> Automatically adding device-links for functional dependencies at the
> framework level provides the following benefits:
>
> - Optimizes device probe order and avoids the useless work of
> attempting probes of devices that will not probe successfully
> (because their suppliers aren't present or haven't probed yet).
>
> For example, in a commonly available mobile SoC, registering just
> one consumer device's driver at an initcall level earlier than the
> supplier device's driver causes 11 failed probe attempts before the
> consumer device probes successfully. This was with a kernel with all
> the drivers statically compiled in. This problem gets a lot worse if
> all the drivers are loaded as modules without direct symbol
> dependencies.
>
> - Supplier devices like clock providers, interconnect providers, etc
> need to keep the resources they provide active and at a particular
> state(s) during boot up even if their current set of consumers don't
> request the resource to be active. This is because the rest of the
> consumers might not have probed yet and turning off the resource
> before all the consumers have probed could lead to a hang or
> undesired user experience.
>
> Some frameworks (Eg: regulator) handle this today by turning off
> "unused" resources at late_initcall_sync and hoping all the devices
> have probed by then. This is not a valid assumption for systems with
> loadable modules. Other frameworks (Eg: clock) just don't handle
> this due to the lack of a clear signal for when they can turn off
> resources. This leads to downstream hacks to handle cases like this
> that can easily be solved in the upstream kernel.
>
> By linking devices before they are probed, we give suppliers a clear
> count of the number of dependent consumers. Once all of the
> consumers are active, the suppliers can turn off the unused
> resources without making assumptions about the number of consumers.
>
> By default we just add device-links to track "driver presence" (probe
> succeeded) of the supplier device. If any other functionality provided
> by device-links are needed, it is left to the consumer/supplier
> devices to change the link when they probe.
>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> .../admin-guide/kernel-parameters.txt | 5 +
> drivers/of/platform.c | 165 ++++++++++++++++++
> 2 files changed, 170 insertions(+)
>

Documentation/admin-guide/kernel-paramers.rst:

After line 129, add:

OF Devicetree is enabled

> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 46b826fcb5ad..12937349d79d 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3170,6 +3170,11 @@
> This can be set from sysctl after boot.
> See Documentation/admin-guide/sysctl/vm.rst for details.
>
> + of_devlink [KNL] Make device links from common DT bindings. Useful
> + for optimizing probe order and making sure resources
> + aren't turned off before the consumer devices have
> + probed.

of_supplier_depend instead of of_devlink ????

of_supplier_depend
[OF, KNL] Make device links from consumer devicetree
nodes to supplier devicetree nodes. The
consumer / supplier relationships are inferred from
scanning the devicetree. The driver for a consumer
device will not be probed until the drivers for all of
its supplier devices have been successfully probed.


> +
> ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
> See Documentation/debugging-via-ohci1394.txt for more
> info.
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 7801e25e6895..4344419a26fc 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -508,6 +508,170 @@ int of_platform_default_populate(struct device_node *root,
> }
> EXPORT_SYMBOL_GPL(of_platform_default_populate);
>

> +bool of_link_is_valid(struct device_node *con, struct device_node *sup)

Change to less vague:

bool of_ancestor_of(struct device_node *test_np, struct device_node *np)


> +{
> + of_node_get(sup);
> + /*
> + * Don't allow linking a device node as a consumer of one of its
> + * descendant nodes. By definition, a child node can't be a functional
> + * dependency for the parent node.
> + */
> + while (sup) {
> + if (sup == con) {
> + of_node_put(sup);
> + return false;
> + }
> + sup = of_get_next_parent(sup);
> + }
> + return true;

Change to more generic:

of_node_get(test_np);
while (test_np) {
if (test_np == np) {
of_node_put(test_np);
return true;
}
test_np = of_get_next_parent(test_np);
}
return false;

> +}
> +


/**
* of_link_to_phandle - Add device link to supplier
* @dev: consumer device
* @sup_np: pointer to the supplier device tree node
*
* TODO: ...
*
* Return:
* * 0 if link successfully created for supplier or of_devlink is false
* * an error if unable to create link
*/

Should have dev_debug() or pr_warn() or something on errors in this
function -- the caller does not report any issue

> +static int of_link_to_phandle(struct device *dev, struct device_node *sup_np)
> +{
> + struct platform_device *sup_dev;
> + u32 dl_flags = DL_FLAG_AUTOPROBE_CONSUMER;
> + int ret = 0;
> +
> + /*

> + * Since we are trying to create device links, we need to find
> + * the actual device node that owns this supplier phandle.
> + * Often times it's the same node, but sometimes it can be one
> + * of the parents. So walk up the parent till you find a
> + * device.

Change comment to:

* Find the device node that contains the supplier phandle. It may
* be @sup_np or it may be an ancestor of @sup_np.

> + */

See comment in caller of of_link_to_phandle() - do not hide the final
of_node_put() of sup_np inside of_link_to_phandle(), so need to do
an of_node_get() here.

of_node_get(sup_np);

> + while (sup_np && !of_find_property(sup_np, "compatible", NULL))
> + sup_np = of_get_next_parent(sup_np);
> + if (!sup_np)

> + return 0;

This case should never occur(?), it is an error.

return -ENODEV;

> +
> + if (!of_link_is_valid(dev->of_node, sup_np)) {
> + of_node_put(sup_np);
> + return 0;

Do not use a name that obscures what the function is doing, also
return an actual issue.

if (of_ancestor_of(sup_np, dev->of_node)) {
of_node_put(sup_np);
return -EINVAL;

> + }
> + sup_dev = of_find_device_by_node(sup_np);
> + of_node_put(sup_np);
> + if (!sup_dev)
> + return -ENODEV;
> + if (!device_link_add(dev, &sup_dev->dev, dl_flags))
> + ret = -ENODEV;
> + put_device(&sup_dev->dev);
> + return ret;
> +}
> +

/**
* parse_prop_cells - Property parsing functions for suppliers
*
* @np: pointer to a device tree node containing a list
* @prop_name: Name of property holding a phandle value
* @phandle_index: For properties holding a table of phandles, this is the
* index into the table
* @list_name: property name that contains a list
* @cells_name: property name that specifies phandles' arguments count
*
* This function is useful to parse lists of phandles and their arguments.
*
* Return:
* * Node pointer with refcount incremented, use of_node_put() on it when done.
* * NULL if not found.
*/

> +static struct device_node *parse_prop_cells(struct device_node *np,
> + const char *prop, int index,
> + const char *binding,
> + const char *cell)

Make names consistent with of_parse_phandle_with_args():
Change prop to prop_name
Change index to phandle_index
Change binding to list_name
Change cell to cells_name

> +{
> + struct of_phandle_args sup_args;
> +

> + /* Don't need to check property name for every index. */
> + if (!index && strcmp(prop, binding))
> + return NULL;

I read the discussion on whether to check property name only once
in version 6.

This check is fragile, depending upon the calling code to be properly
structured. Do the check for all values of index. The reduction of
overhead from not checking does not justify the fragileness and the
extra complexity for the code reader to understand why the check can
be bypassed when
index is not zero.

> +
> + if (of_parse_phandle_with_args(np, binding, cell, index, &sup_args))
> + return NULL;
> +
> + return sup_args.np;
> +}
> +
> +static struct device_node *parse_clocks(struct device_node *np,
> + const char *prop, int index)

Change prop to prop_name
Change index to phandle_index

> +{
> + return parse_prop_cells(np, prop, index, "clocks", "#clock-cells");
> +}
> +
> +static struct device_node *parse_interconnects(struct device_node *np,
> + const char *prop, int index)

Change prop to prop_name
Change index to phandle_index

> +{
> + return parse_prop_cells(np, prop, index, "interconnects",
> + "#interconnect-cells");
> +}
> +
> +static int strcmp_suffix(const char *str, const char *suffix)
> +{
> + unsigned int len, suffix_len;
> +
> + len = strlen(str);
> + suffix_len = strlen(suffix);
> + if (len <= suffix_len)
> + return -1;
> + return strcmp(str + len - suffix_len, suffix);
> +}
> +
> +static struct device_node *parse_regulators(struct device_node *np,
> + const char *prop, int index)

Change prop to prop_name
Change index to phandle_index

> +{
> + if (index || strcmp_suffix(prop, "-supply"))
> + return NULL;
> +
> + return of_parse_phandle(np, prop, 0);
> +}
> +
> +/**

> + * struct supplier_bindings - Information for parsing supplier DT binding
> + *
> + * @parse_prop: If the function cannot parse the property, return NULL.
> + * Otherwise, return the phandle listed in the property
> + * that corresponds to the index.

There is no documentation of dynamic function parameters in the docbook
description of a struct. Use this format for now and I will clean up when
I clean up all of the devicetree docbook info.

Change above comment to:

* struct supplier_bindings - Property parsing functions for suppliers
*
* @parse_prop: function name
* parse_prop() finds the node corresponding to a supplier phandle
* @parse_prop.np: Pointer to device node holding supplier phandle property
* @parse_prop.prop_name: Name of property holding a phandle value
* @parse_prop.index: For properties holding a table of phandles, this is the
* index into the table
*
* Return:
* * parse_prop() return values are
* * Node pointer with refcount incremented, use of_node_put() on it when done.
* * NULL if not found.

> + */
> +struct supplier_bindings {
> + struct device_node *(*parse_prop)(struct device_node *np,
> + const char *name, int index);

Change name to prop_name
Change index to phandle_index

> +};
> +
> +static const struct supplier_bindings bindings[] = {
> + { .parse_prop = parse_clocks, },
> + { .parse_prop = parse_interconnects, },
> + { .parse_prop = parse_regulators, },

> + { },

{},

> +};
> +

/**
* of_link_property - TODO:
* dev:
* con_np:
* prop:
*
* TODO...
*
* Any failed attempt to create a link will NOT result in an immediate return.
* of_link_property() must create all possible links even when one of more
* attempts to create a link fail.

Why? isn't one failure enough to prevent probing this device?
Continuing to scan just results in extra work... which will be
repeated every time device_link_check_waiting_consumers() is called

*
* Return:
* * 0 if TODO:
* * -ENODEV on error
*/


I left some "TODO:" sections to be filled out above.


> +static bool of_link_property(struct device *dev, struct device_node *con_np,
> + const char *prop)

Returns 0 or -ENODEV, so bool is incorrect

(Also fixed on 8/8 in patch: "[PATCH 1/2] of/platform: Fix fn definitons for
of_link_is_valid() and of_link_property()")



> +{
> + struct device_node *phandle;
> + struct supplier_bindings *s = bindings;
> + unsigned int i = 0;

> + bool done = true, matched = false;

Change to:

bool matched = false;
int ret = 0;

/* do not stop at first failed link, link all available suppliers */

> +
> + while (!matched && s->parse_prop) {
> + while ((phandle = s->parse_prop(con_np, prop, i))) {
> + matched = true;
> + i++;
> + if (of_link_to_phandle(dev, phandle))


Remove comment:

> + /*
> + * Don't stop at the first failure. See
> + * Documentation for bus_type.add_links for
> + * more details.
> + */

> + done = false;

ret = -ENODEV;

Do not hide of_node_put() inside of_link_to_phandle(), do it here:

of_node_put(phandle);

> + }
> + s++;
> + }

> + return done ? 0 : -ENODEV;

return ret;

> +}
> +
> +static bool of_devlink;
> +core_param(of_devlink, of_devlink, bool, 0);
> +

/**
* of_link_to_suppliers - Add device links to suppliers
* @dev: consumer device
*
* Create device links to all available suppliers of @dev.
* Must NOT stop at the first failed link.
* If some suppliers are not yet available, this function will be
* called again when additional suppliers become available.
*
* Return:
* * 0 if links successfully created for all suppliers
* * an error if one or more suppliers not yet available
*/

> +static int of_link_to_suppliers(struct device *dev)
> +{
> + struct property *p;

> + bool done = true;

remove done

int ret = 0;

> +
> + if (!of_devlink)
> + return 0;

> + if (unlikely(!dev->of_node))
> + return 0;

Check not needed, for_each_property_of_node() will detect !dev->of_node.

> +
> + for_each_property_of_node(dev->of_node, p)
> + if (of_link_property(dev, dev->of_node, p->name))

> + done = false;

ret = -EAGAIN;

> +
> + return done ? 0 : -ENODEV;

return ret;

> +}
> +
> #ifndef CONFIG_PPC
> static const struct of_device_id reserved_mem_matches[] = {
> { .compatible = "qcom,rmtfs-mem" },
> @@ -523,6 +687,7 @@ static int __init of_platform_default_populate_init(void)
> if (!of_have_populated_dt())
> return -ENODEV;
>
> + platform_bus_type.add_links = of_link_to_suppliers;
> /*
> * Handle certain compatibles explicitly, since we don't want to create
> * platform_devices for every node in /reserved-memory with a
> --
> 2.22.0.709.g102302147b-goog
>
>

2019-08-08 02:15:31

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering

Hi Saravana,

On 7/23/19 5:10 PM, Saravana Kannan wrote:
> Add device-links to track functional dependencies between devices
> after they are created (but before they are probed) by looking at
> their common DT bindings like clocks, interconnects, etc.
>

< snip >

I know that this series has moved on to versions 8 and 9. And some
additional patches submitted.

Version 8 was a rebase to handle device_link changes. The version 8
patch 0/7 description of the changes did not note any functional
changes, so I am assuming that my comments on version 7 are still
applicable.

The version 9 changes do not impact my comments.

I am sending review comments on patches 1, 2, and 3. I will continue
review of patches later in the series when the fall out from these
review comments result in a new series.

-Frank

2019-08-16 01:53:06

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings

On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <[email protected]> wrote:
>
> On 7/23/19 5:10 PM, Saravana Kannan wrote:
> > Add device-links after the devices are created (but before they are
> > probed) by looking at common DT bindings like clocks and
> > interconnects.
> >
> > Automatically adding device-links for functional dependencies at the
> > framework level provides the following benefits:
> >
> > - Optimizes device probe order and avoids the useless work of
> > attempting probes of devices that will not probe successfully
> > (because their suppliers aren't present or haven't probed yet).
> >
> > For example, in a commonly available mobile SoC, registering just
> > one consumer device's driver at an initcall level earlier than the
> > supplier device's driver causes 11 failed probe attempts before the
> > consumer device probes successfully. This was with a kernel with all
> > the drivers statically compiled in. This problem gets a lot worse if
> > all the drivers are loaded as modules without direct symbol
> > dependencies.
> >
> > - Supplier devices like clock providers, interconnect providers, etc
> > need to keep the resources they provide active and at a particular
> > state(s) during boot up even if their current set of consumers don't
> > request the resource to be active. This is because the rest of the
> > consumers might not have probed yet and turning off the resource
> > before all the consumers have probed could lead to a hang or
> > undesired user experience.
> >
> > Some frameworks (Eg: regulator) handle this today by turning off
> > "unused" resources at late_initcall_sync and hoping all the devices
> > have probed by then. This is not a valid assumption for systems with
> > loadable modules. Other frameworks (Eg: clock) just don't handle
> > this due to the lack of a clear signal for when they can turn off
> > resources. This leads to downstream hacks to handle cases like this
> > that can easily be solved in the upstream kernel.
> >
> > By linking devices before they are probed, we give suppliers a clear
> > count of the number of dependent consumers. Once all of the
> > consumers are active, the suppliers can turn off the unused
> > resources without making assumptions about the number of consumers.
> >
> > By default we just add device-links to track "driver presence" (probe
> > succeeded) of the supplier device. If any other functionality provided
> > by device-links are needed, it is left to the consumer/supplier
> > devices to change the link when they probe.
> >
> > Signed-off-by: Saravana Kannan <[email protected]>
> > ---
> > .../admin-guide/kernel-parameters.txt | 5 +
> > drivers/of/platform.c | 165 ++++++++++++++++++
> > 2 files changed, 170 insertions(+)
> >
>
> Documentation/admin-guide/kernel-paramers.rst:
>
> After line 129, add:
>
> OF Devicetree is enabled

Will do.

> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 46b826fcb5ad..12937349d79d 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -3170,6 +3170,11 @@
> > This can be set from sysctl after boot.
> > See Documentation/admin-guide/sysctl/vm.rst for details.
> >
> > + of_devlink [KNL] Make device links from common DT bindings. Useful
> > + for optimizing probe order and making sure resources
> > + aren't turned off before the consumer devices have
> > + probed.
>
> of_supplier_depend instead of of_devlink ????

I'm open to other names, but of_supplier_depend is just odd.
of_devlink stands for of_device_links. If someone wants to know what
device links do, they can read up on the device links documentation?

>
> of_supplier_depend
> [OF, KNL] Make device links from consumer devicetree
> nodes to supplier devicetree nodes.

We are creating device links between devices. Not device tree nodes.
So this would be wrong. I'll replace it with "devicetree nodes" with
"devices"?

> The
> consumer / supplier relationships are inferred from
> scanning the devicetree.

I like this part. How about clarifying it with "from scanning the
common bindings in the devicetree"? Because we aren't trying to scan
the device specific properties.

> The driver for a consumer
> device will not be probed until the drivers for all of
> its supplier devices have been successfully probed.

A driver is never probed. It's a device that's probed. The dependency
is between devices and not drivers. So, how about I replace this with:
"The consumer device will not be probed until all of its supplier
devices are successfully probed"?

Also, any reason you removed the "resources aren't turned off ..."
part? That's one of the biggest improvements of this patch series. Do
you want to rewrite that part too? Or I'll leave my current wording of
that part as is?

>
>
> > +
> > ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
> > See Documentation/debugging-via-ohci1394.txt for more
> > info.
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 7801e25e6895..4344419a26fc 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -508,6 +508,170 @@ int of_platform_default_populate(struct device_node *root,
> > }
> > EXPORT_SYMBOL_GPL(of_platform_default_populate);
> >
>
> > +bool of_link_is_valid(struct device_node *con, struct device_node *sup)
>
> Change to less vague:
>
> bool of_ancestor_of(struct device_node *test_np, struct device_node *np)

I thought about that when I wrote the code. But the consumer being the
ancestor of the supplier is just one of the things that makes a link
invalid. There can be other tests we might add as we go. That's why I
kept the function name as of_link_is_valid(). Even if I add an
"ancestor_of" helper function, I'll still have of_link_is_valid() as a
wrapper around it.

> > +{
> > + of_node_get(sup);
> > + /*
> > + * Don't allow linking a device node as a consumer of one of its
> > + * descendant nodes. By definition, a child node can't be a functional
> > + * dependency for the parent node.
> > + */
> > + while (sup) {
> > + if (sup == con) {
> > + of_node_put(sup);
> > + return false;
> > + }
> > + sup = of_get_next_parent(sup);
> > + }
> > + return true;
>
> Change to more generic:
>
> of_node_get(test_np);
> while (test_np) {
> if (test_np == np) {
> of_node_put(test_np);
> return true;
> }
> test_np = of_get_next_parent(test_np);
> }
> return false;

If you do insist on this change, I think we need better names than
test_np and np. It's not clear which node needs to be the ancestor for
this function to return true. With consumer/supplier, it's kinda
obvious that a consumer can't be an ancestor of a supplier.

> > +}
> > +
>
>
> /**
> * of_link_to_phandle - Add device link to supplier
> * @dev: consumer device
> * @sup_np: pointer to the supplier device tree node

Could you suggest something that makes it a bit more clear that this
phandle/node doesn't need to be an actual device (as in, one with
compatible property)? phandle kinda make it clear at least to me. I'd
prefer just saying "phandle to supplier". Thoughts?

Also, what's the guideline on which functions needs doc headers? I
always leaned towards adding them only for non-static functions.
That's why I skipped this one. What's the reasoning for needing one
for this? I'm happy to do this, just want to see that there's some
consistent guideline that's being followed and something I can use for
future patches.

> *
> * TODO: ...
> *
> * Return:
> * * 0 if link successfully created for supplier or of_devlink is false

The "or of_devlink is false" isn't true for this function?

> * * an error if unable to create link
> */
>
> Should have dev_debug() or pr_warn() or something on errors in this
> function -- the caller does not report any issue

I think that'll be too spammy during bootup. This function is expected
to fail often and it's not necessary a catastrophic failure. The
caller can print something if they care to. The current set of callers
don't.

> > +static int of_link_to_phandle(struct device *dev, struct device_node *sup_np)
> > +{
> > + struct platform_device *sup_dev;
> > + u32 dl_flags = DL_FLAG_AUTOPROBE_CONSUMER;
> > + int ret = 0;
> > +
> > + /*
>
> > + * Since we are trying to create device links, we need to find
> > + * the actual device node that owns this supplier phandle.
> > + * Often times it's the same node, but sometimes it can be one
> > + * of the parents. So walk up the parent till you find a
> > + * device.
>
> Change comment to:
>
> * Find the device node that contains the supplier phandle. It may
> * be @sup_np or it may be an ancestor of @sup_np.

Aren't the existing comments giving a better explanation with more context?
But I'll do this.

>
> > + */
>
> See comment in caller of of_link_to_phandle() - do not hide the final
> of_node_put() of sup_np inside of_link_to_phandle(), so need to do
> an of_node_get() here.
>
> of_node_get(sup_np);

Will do. Good point.

>
> > + while (sup_np && !of_find_property(sup_np, "compatible", NULL))
> > + sup_np = of_get_next_parent(sup_np);
> > + if (!sup_np)
>
> > + return 0;
>
> This case should never occur(?), it is an error.
>
> return -ENODEV;

I'm not too sure about all the possible DT combinations to say this?
In that case, this isn't an error. As in, the consumer doesn't need to
wait for this non-existent device to get populated/added. I'd lean
towards leaving it as is and address it later if this is actually a
problem. I want of_link_to_phandle to fail only when a link can be
created, but isn't due to current system state (device isn't added,
creating a cyclic link, etc).

>
> > +
> > + if (!of_link_is_valid(dev->of_node, sup_np)) {
> > + of_node_put(sup_np);
> > + return 0;
>
> Do not use a name that obscures what the function is doing, also
> return an actual issue.
>
> if (of_ancestor_of(sup_np, dev->of_node)) {
> of_node_put(sup_np);
> return -EINVAL;

See my comment about not erroring on cases where a link can't ever be
created? So in your case, you'd want the caller to check the error
value to device which ones to ignore and which ones not to? That seems
a bit more fragile when this function is potentially changed in the
furture.

>
> > + }
> > + sup_dev = of_find_device_by_node(sup_np);
> > + of_node_put(sup_np);
> > + if (!sup_dev)
> > + return -ENODEV;
> > + if (!device_link_add(dev, &sup_dev->dev, dl_flags))
> > + ret = -ENODEV;

For example, in the earlier comment you suggested -ENODEV if there's
no device with "compatible" property that encapsulates the supplier
phandle. But -ENODEV makes more sense for this case where there's
actually no device because it hasn't been added yet. And the caller
needs to be able to distinguish between these two. Are we just going
to arbitrarily pick error values just to make sure they don't overlap?

I don't have a strong opinion one way or another, but I'm trying to
understand what's better in the long run where this function can
evolve to add more checks or handle more cases.

> > + put_device(&sup_dev->dev);
> > + return ret;
> > +}
> > +
>
> /**
> * parse_prop_cells - Property parsing functions for suppliers
> *
> * @np: pointer to a device tree node containing a list
> * @prop_name: Name of property holding a phandle value
> * @phandle_index: For properties holding a table of phandles, this is the
> * index into the table
> * @list_name: property name that contains a list
> * @cells_name: property name that specifies phandles' arguments count
> *
> * This function is useful to parse lists of phandles and their arguments.
> *
> * Return:
> * * Node pointer with refcount incremented, use of_node_put() on it when done.
> * * NULL if not found.
> */
>
> > +static struct device_node *parse_prop_cells(struct device_node *np,
> > + const char *prop, int index,
> > + const char *binding,
> > + const char *cell)
>
> Make names consistent with of_parse_phandle_with_args():
> Change prop to prop_name
> Change index to phandle_index

You call this index even in of_parse_phandle_with_args()

> Change binding to list_name
> Change cell to cells_name

This going to cause a lot more line wraps for barely better names. But
I'll reluctantly do this.

>
> > +{
> > + struct of_phandle_args sup_args;
> > +
>
> > + /* Don't need to check property name for every index. */
> > + if (!index && strcmp(prop, binding))
> > + return NULL;
>
> I read the discussion on whether to check property name only once
> in version 6.
>
> This check is fragile, depending upon the calling code to be properly
> structured. Do the check for all values of index. The reduction of
> overhead from not checking does not justify the fragileness and the
> extra complexity for the code reader to understand why the check can
> be bypassed when
> index is not zero.

This is used only in this file. I understand needing the balance
between code complexity/fragility and efficiency. But I think you push
the line too far away from efficiency. This code is literally never
going to fail because it's a static function called only inside this
file. And the check isn't that hard to understand with that tiny
comment.

> > +
> > + if (of_parse_phandle_with_args(np, binding, cell, index, &sup_args))
> > + return NULL;
> > +
> > + return sup_args.np;
> > +}
> > +
> > +static struct device_node *parse_clocks(struct device_node *np,
> > + const char *prop, int index)
>
> Change prop to prop_name
> Change index to phandle_index
>
> > +{
> > + return parse_prop_cells(np, prop, index, "clocks", "#clock-cells");
> > +}
> > +
> > +static struct device_node *parse_interconnects(struct device_node *np,
> > + const char *prop, int index)
>
> Change prop to prop_name
> Change index to phandle_index
>
> > +{
> > + return parse_prop_cells(np, prop, index, "interconnects",
> > + "#interconnect-cells");
> > +}
> > +
> > +static int strcmp_suffix(const char *str, const char *suffix)
> > +{
> > + unsigned int len, suffix_len;
> > +
> > + len = strlen(str);
> > + suffix_len = strlen(suffix);
> > + if (len <= suffix_len)
> > + return -1;
> > + return strcmp(str + len - suffix_len, suffix);
> > +}
> > +
> > +static struct device_node *parse_regulators(struct device_node *np,
> > + const char *prop, int index)
>
> Change prop to prop_name
> Change index to phandle_index
>

Will do all the renames you list above.

> > +{
> > + if (index || strcmp_suffix(prop, "-supply"))
> > + return NULL;
> > +
> > + return of_parse_phandle(np, prop, 0);
> > +}
> > +
> > +/**
>
> > + * struct supplier_bindings - Information for parsing supplier DT binding
> > + *
> > + * @parse_prop: If the function cannot parse the property, return NULL.
> > + * Otherwise, return the phandle listed in the property
> > + * that corresponds to the index.
>
> There is no documentation of dynamic function parameters in the docbook
> description of a struct. Use this format for now and I will clean up when
> I clean up all of the devicetree docbook info.
>
> Change above comment to:
>
> * struct supplier_bindings - Property parsing functions for suppliers
> *
> * @parse_prop: function name
> * parse_prop() finds the node corresponding to a supplier phandle
> * @parse_prop.np: Pointer to device node holding supplier phandle property
> * @parse_prop.prop_name: Name of property holding a phandle value
> * @parse_prop.index: For properties holding a table of phandles, this is the
> * index into the table
> *
> * Return:
> * * parse_prop() return values are
> * * Node pointer with refcount incremented, use of_node_put() on it when done.
> * * NULL if not found.

Will do. Thanks for writing it.

> > + */
> > +struct supplier_bindings {
> > + struct device_node *(*parse_prop)(struct device_node *np,
> > + const char *name, int index);
>
> Change name to prop_name
> Change index to phandle_index
>
> > +};
> > +
> > +static const struct supplier_bindings bindings[] = {
> > + { .parse_prop = parse_clocks, },
> > + { .parse_prop = parse_interconnects, },
> > + { .parse_prop = parse_regulators, },
>
> > + { },
>
> {},
>
> > +};
> > +
>
> /**
> * of_link_property - TODO:
> * dev:
> * con_np:
> * prop:
> *
> * TODO...
> *
> * Any failed attempt to create a link will NOT result in an immediate return.
> * of_link_property() must create all possible links even when one of more
> * attempts to create a link fail.
>
> Why? isn't one failure enough to prevent probing this device?
> Continuing to scan just results in extra work... which will be
> repeated every time device_link_check_waiting_consumers() is called

Context:
As I said in the cover letter, avoiding unnecessary probes is just one
of the reasons for this patch. The other (arguably more important)
reason for this patch is to make sure suppliers know that they have
consumers that are yet to be probed. That way, suppliers can leave
their resource on AND in the right state if they were left on by the
bootloader. For example, if a clock was left on and at 200 MHz, the
clock provider needs to keep that clock ON and at 200 MHz till all the
consumers are probed.

Answer: Let's say a consumer device Z has suppliers A, B and C. If the
linking fails at A and you return immediately, then B and C could
probe and then figure that they have no more consumers (they don't see
a link to Z) and turn off their resources. And Z could fail
catastrophically.

> *
> * Return:
> * * 0 if TODO:
> * * -ENODEV on error
> */
>
>
> I left some "TODO:" sections to be filled out above.

Will do.

>
>
> > +static bool of_link_property(struct device *dev, struct device_node *con_np,
> > + const char *prop)
>
> Returns 0 or -ENODEV, so bool is incorrect
>
> (Also fixed on 8/8 in patch: "[PATCH 1/2] of/platform: Fix fn definitons for
> of_link_is_valid() and of_link_property()")

Right.

>
> > +{
> > + struct device_node *phandle;
> > + struct supplier_bindings *s = bindings;
> > + unsigned int i = 0;
>
> > + bool done = true, matched = false;
>
> Change to:
>
> bool matched = false;
> int ret = 0;
>
> /* do not stop at first failed link, link all available suppliers */
>
> > +
> > + while (!matched && s->parse_prop) {
> > + while ((phandle = s->parse_prop(con_np, prop, i))) {
> > + matched = true;
> > + i++;
> > + if (of_link_to_phandle(dev, phandle))
>
>
> Remove comment:
>
> > + /*
> > + * Don't stop at the first failure. See
> > + * Documentation for bus_type.add_links for
> > + * more details.
> > + */

Ok

>
> > + done = false;
>
> ret = -ENODEV;

This is nicer. Thanks.

>
> Do not hide of_node_put() inside of_link_to_phandle(), do it here:
>
> of_node_put(phandle);

Ok

>
> > + }
> > + s++;
> > + }
>
> > + return done ? 0 : -ENODEV;
>
> return ret;
>
> > +}
> > +
> > +static bool of_devlink;
> > +core_param(of_devlink, of_devlink, bool, 0);
> > +
>
> /**
> * of_link_to_suppliers - Add device links to suppliers
> * @dev: consumer device
> *
> * Create device links to all available suppliers of @dev.
> * Must NOT stop at the first failed link.
> * If some suppliers are not yet available, this function will be
> * called again when additional suppliers become available.
> *
> * Return:
> * * 0 if links successfully created for all suppliers
> * * an error if one or more suppliers not yet available
> */

Ok

> > +static int of_link_to_suppliers(struct device *dev)
> > +{
> > + struct property *p;
>
> > + bool done = true;
>
> remove done
>
> int ret = 0;
>
> > +
> > + if (!of_devlink)
> > + return 0;
>
> > + if (unlikely(!dev->of_node))
> > + return 0;
>
> Check not needed, for_each_property_of_node() will detect !dev->of_node.
>
> > +
> > + for_each_property_of_node(dev->of_node, p)
> > + if (of_link_property(dev, dev->of_node, p->name))
>
> > + done = false;
>
> ret = -EAGAIN;
>
> > +
> > + return done ? 0 : -ENODEV;
>
> return ret;

Thanks. I think I was too caught up on the rest of the logic
complexity that I missed this obviously ugly code. Will fix.

Thanks,
Saravana

2019-08-19 17:17:43

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings

On 8/15/19 6:50 PM, Saravana Kannan wrote:
> On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <[email protected]> wrote:
>>
>> On 7/23/19 5:10 PM, Saravana Kannan wrote:
>>> Add device-links after the devices are created (but before they are
>>> probed) by looking at common DT bindings like clocks and
>>> interconnects.


< very big snip (lots of comments that deserve answers) >


>>
>> /**
>> * of_link_property - TODO:
>> * dev:
>> * con_np:
>> * prop:
>> *
>> * TODO...
>> *
>> * Any failed attempt to create a link will NOT result in an immediate return.
>> * of_link_property() must create all possible links even when one of more
>> * attempts to create a link fail.
>>
>> Why? isn't one failure enough to prevent probing this device?
>> Continuing to scan just results in extra work... which will be
>> repeated every time device_link_check_waiting_consumers() is called
>
> Context:
> As I said in the cover letter, avoiding unnecessary probes is just one
> of the reasons for this patch. The other (arguably more important)

Agree that it is more important.


> reason for this patch is to make sure suppliers know that they have
> consumers that are yet to be probed. That way, suppliers can leave
> their resource on AND in the right state if they were left on by the
> bootloader. For example, if a clock was left on and at 200 MHz, the
> clock provider needs to keep that clock ON and at 200 MHz till all the
> consumers are probed.
>
> Answer: Let's say a consumer device Z has suppliers A, B and C. If the
> linking fails at A and you return immediately, then B and C could
> probe and then figure that they have no more consumers (they don't see
> a link to Z) and turn off their resources. And Z could fail
> catastrophically.

Then I think that this approach is fatally flawed in the current implementation.

A device can be added by a module that is loaded. In that case the device
was not present at late boot when the suppliers may turn off their resources.
(I am assuming the details since I have not reviewed the patches later in
the series that implement this part.)

Am I missing something?

If I am wrong, then I'll have more comments for your review replies for
patches 2 and 3.

>

< another snip >

> Thanks,
> Saravana
>

-Frank

2019-08-19 20:50:52

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings

On Mon, Aug 19, 2019 at 10:16 AM Frank Rowand <[email protected]> wrote:
>
> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> > On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <[email protected]> wrote:
> >>
> >> On 7/23/19 5:10 PM, Saravana Kannan wrote:
> >>> Add device-links after the devices are created (but before they are
> >>> probed) by looking at common DT bindings like clocks and
> >>> interconnects.
>
>
> < very big snip (lots of comments that deserve answers) >
>
>
> >>
> >> /**
> >> * of_link_property - TODO:
> >> * dev:
> >> * con_np:
> >> * prop:
> >> *
> >> * TODO...
> >> *
> >> * Any failed attempt to create a link will NOT result in an immediate return.
> >> * of_link_property() must create all possible links even when one of more
> >> * attempts to create a link fail.
> >>
> >> Why? isn't one failure enough to prevent probing this device?
> >> Continuing to scan just results in extra work... which will be
> >> repeated every time device_link_check_waiting_consumers() is called
> >
> > Context:
> > As I said in the cover letter, avoiding unnecessary probes is just one
> > of the reasons for this patch. The other (arguably more important)
>
> Agree that it is more important.
>
>
> > reason for this patch is to make sure suppliers know that they have
> > consumers that are yet to be probed. That way, suppliers can leave
> > their resource on AND in the right state if they were left on by the
> > bootloader. For example, if a clock was left on and at 200 MHz, the
> > clock provider needs to keep that clock ON and at 200 MHz till all the
> > consumers are probed.
> >
> > Answer: Let's say a consumer device Z has suppliers A, B and C. If the
> > linking fails at A and you return immediately, then B and C could
> > probe and then figure that they have no more consumers (they don't see
> > a link to Z) and turn off their resources. And Z could fail
> > catastrophically.
>
> Then I think that this approach is fatally flawed in the current implementation.

I'm waiting to hear how it is fatally flawed. But maybe this is just a
misunderstanding of the problem?

In the text below, I'm not sure if you mixing up two different things
or just that your wording it a bit ambiguous. So pardon my nitpick to
err on the side of clarity.

> A device can be added by a module that is loaded.

No, in the example I gave, of_platform_default_populate_init() would
add all 3 of those devices during arch_initcall_sync().

> In that case the device
> was not present at late boot when the suppliers may turn off their resources.

In that case, the _drivers_ for those devices aren't present at late
boot. So that they can't request to keep the resources on for their
consumer devices. Since there are no consumer requests on resources,
the suppliers turn off their resources at late boot (since there isn't
a better location as of today). The sync_state() call back added in a
subsequent patche in this series will provide the better location.

> (I am assuming the details since I have not reviewed the patches later in
> the series that implement this part.)
>
> Am I missing something?

I think you are mixing up devices getting added/populated with drivers
getting loaded as modules?

> If I am wrong, then I'll have more comments for your review replies for
> patches 2 and 3.

I'll wait for more review replies?

Thanks,
Saravana

2019-08-19 21:31:10

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings

On 8/19/19 1:49 PM, Saravana Kannan wrote:
> On Mon, Aug 19, 2019 at 10:16 AM Frank Rowand <[email protected]> wrote:
>>
>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
>>> On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <[email protected]> wrote:
>>>>
>>>> On 7/23/19 5:10 PM, Saravana Kannan wrote:
>>>>> Add device-links after the devices are created (but before they are
>>>>> probed) by looking at common DT bindings like clocks and
>>>>> interconnects.
>>
>>
>> < very big snip (lots of comments that deserve answers) >
>>
>>
>>>>
>>>> /**
>>>> * of_link_property - TODO:
>>>> * dev:
>>>> * con_np:
>>>> * prop:
>>>> *
>>>> * TODO...
>>>> *
>>>> * Any failed attempt to create a link will NOT result in an immediate return.
>>>> * of_link_property() must create all possible links even when one of more
>>>> * attempts to create a link fail.
>>>>
>>>> Why? isn't one failure enough to prevent probing this device?
>>>> Continuing to scan just results in extra work... which will be
>>>> repeated every time device_link_check_waiting_consumers() is called
>>>
>>> Context:
>>> As I said in the cover letter, avoiding unnecessary probes is just one
>>> of the reasons for this patch. The other (arguably more important)
>>
>> Agree that it is more important.
>>
>>
>>> reason for this patch is to make sure suppliers know that they have
>>> consumers that are yet to be probed. That way, suppliers can leave
>>> their resource on AND in the right state if they were left on by the
>>> bootloader. For example, if a clock was left on and at 200 MHz, the
>>> clock provider needs to keep that clock ON and at 200 MHz till all the
>>> consumers are probed.
>>>
>>> Answer: Let's say a consumer device Z has suppliers A, B and C. If the
>>> linking fails at A and you return immediately, then B and C could
>>> probe and then figure that they have no more consumers (they don't see
>>> a link to Z) and turn off their resources. And Z could fail
>>> catastrophically.
>>
>> Then I think that this approach is fatally flawed in the current implementation.
>
> I'm waiting to hear how it is fatally flawed. But maybe this is just a
> misunderstanding of the problem?

Fatally flawed because it does not handle modules that add a consumer
device when the module is loaded.


>
> In the text below, I'm not sure if you mixing up two different things
> or just that your wording it a bit ambiguous. So pardon my nitpick to
> err on the side of clarity.

Please do nitpick. Clarity is good.


>
>> A device can be added by a module that is loaded.
>
> No, in the example I gave, of_platform_default_populate_init() would
> add all 3 of those devices during arch_initcall_sync().

The example you gave does not cover all use cases.

There are modules that add devices when the module is loaded. You can not
ignore systems using such modules.


>
>> In that case the device
>> was not present at late boot when the suppliers may turn off their resources.
>
> In that case, the _drivers_ for those devices aren't present at late
> boot. So that they can't request to keep the resources on for their
> consumer devices. Since there are no consumer requests on resources,
> the suppliers turn off their resources at late boot (since there isn't
> a better location as of today). The sync_state() call back added in a
> subsequent patche in this series will provide the better location.

And the sync_state() call back will not deal with modules that add consumer
devices when the module is loaded, correct?


>
>> (I am assuming the details since I have not reviewed the patches later in
>> the series that implement this part.)
>>
>> Am I missing something?
>
> I think you are mixing up devices getting added/populated with drivers
> getting loaded as modules?

Only some modules add devices when they are loaded. But these modules do
exist.

-Frank

>
>> If I am wrong, then I'll have more comments for your review replies for
>> patches 2 and 3.
>
> I'll wait for more review replies?
>
> Thanks,
> Saravana
>

2019-08-20 00:12:14

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings

On Mon, Aug 19, 2019 at 2:30 PM Frank Rowand <[email protected]> wrote:
>
> On 8/19/19 1:49 PM, Saravana Kannan wrote:
> > On Mon, Aug 19, 2019 at 10:16 AM Frank Rowand <[email protected]> wrote:
> >>
> >> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> >>> On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <[email protected]> wrote:
> >>>>
> >>>> On 7/23/19 5:10 PM, Saravana Kannan wrote:
> >>>>> Add device-links after the devices are created (but before they are
> >>>>> probed) by looking at common DT bindings like clocks and
> >>>>> interconnects.
> >>
> >>
> >> < very big snip (lots of comments that deserve answers) >
> >>
> >>
> >>>>
> >>>> /**
> >>>> * of_link_property - TODO:
> >>>> * dev:
> >>>> * con_np:
> >>>> * prop:
> >>>> *
> >>>> * TODO...
> >>>> *
> >>>> * Any failed attempt to create a link will NOT result in an immediate return.
> >>>> * of_link_property() must create all possible links even when one of more
> >>>> * attempts to create a link fail.
> >>>>
> >>>> Why? isn't one failure enough to prevent probing this device?
> >>>> Continuing to scan just results in extra work... which will be
> >>>> repeated every time device_link_check_waiting_consumers() is called
> >>>
> >>> Context:
> >>> As I said in the cover letter, avoiding unnecessary probes is just one
> >>> of the reasons for this patch. The other (arguably more important)
> >>
> >> Agree that it is more important.
> >>
> >>
> >>> reason for this patch is to make sure suppliers know that they have
> >>> consumers that are yet to be probed. That way, suppliers can leave
> >>> their resource on AND in the right state if they were left on by the
> >>> bootloader. For example, if a clock was left on and at 200 MHz, the
> >>> clock provider needs to keep that clock ON and at 200 MHz till all the
> >>> consumers are probed.
> >>>
> >>> Answer: Let's say a consumer device Z has suppliers A, B and C. If the
> >>> linking fails at A and you return immediately, then B and C could
> >>> probe and then figure that they have no more consumers (they don't see
> >>> a link to Z) and turn off their resources. And Z could fail
> >>> catastrophically.
> >>
> >> Then I think that this approach is fatally flawed in the current implementation.
> >
> > I'm waiting to hear how it is fatally flawed. But maybe this is just a
> > misunderstanding of the problem?
>
> Fatally flawed because it does not handle modules that add a consumer
> device when the module is loaded.

If you are talking about modules adding child devices of the device
they are managing, then that's handled correctly later in the series.

If you are talking about modules adding devices that aren't defined in
DT, then right, I'm not trying to handle that. The module needs to
make sure it keeps the resources needed for new devices it's adding
are in the right state or need to add the right device links.

> > In the text below, I'm not sure if you mixing up two different things
> > or just that your wording it a bit ambiguous. So pardon my nitpick to
> > err on the side of clarity.
>
> Please do nitpick. Clarity is good.
>
>
> >
> >> A device can be added by a module that is loaded.
> >
> > No, in the example I gave, of_platform_default_populate_init() would
> > add all 3 of those devices during arch_initcall_sync().
>
> The example you gave does not cover all use cases.
>
> There are modules that add devices when the module is loaded. You can not
> ignore systems using such modules.

I'll have to agree to disagree on that. While I understand that the
design should be good and I'm happy to work on that, you can't insist
that a patch series shouldn't be allowed because it's only improving
99% of the cases and leaves the other 1% in the status quo. You are
just going to bring the kernel development to a grinding halt.

> >
> >> In that case the device
> >> was not present at late boot when the suppliers may turn off their resources.
> >
> > In that case, the _drivers_ for those devices aren't present at late
> > boot. So that they can't request to keep the resources on for their
> > consumer devices. Since there are no consumer requests on resources,
> > the suppliers turn off their resources at late boot (since there isn't
> > a better location as of today). The sync_state() call back added in a
> > subsequent patche in this series will provide the better location.
>
> And the sync_state() call back will not deal with modules that add consumer
> devices when the module is loaded, correct?

Depends. If it's just more devices from DT, then it'll be fine. If
it's not, then the module needs to take care of the needs of devices
it's adding.

> >
> >> (I am assuming the details since I have not reviewed the patches later in
> >> the series that implement this part.)
> >>
> >> Am I missing something?
> >
> > I think you are mixing up devices getting added/populated with drivers
> > getting loaded as modules?
>
> Only some modules add devices when they are loaded. But these modules do
> exist.

Out of the billions of Android devices, how many do you see this happening in?

Thanks,
Saravana

2019-08-20 04:27:28

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings

On 8/19/19 5:09 PM, Saravana Kannan wrote:
> On Mon, Aug 19, 2019 at 2:30 PM Frank Rowand <[email protected]> wrote:
>>
>> On 8/19/19 1:49 PM, Saravana Kannan wrote:
>>> On Mon, Aug 19, 2019 at 10:16 AM Frank Rowand <[email protected]> wrote:
>>>>
>>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
>>>>> On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <[email protected]> wrote:
>>>>>>
>>>>>> On 7/23/19 5:10 PM, Saravana Kannan wrote:
>>>>>>> Add device-links after the devices are created (but before they are
>>>>>>> probed) by looking at common DT bindings like clocks and
>>>>>>> interconnects.
>>>>
>>>>
>>>> < very big snip (lots of comments that deserve answers) >
>>>>
>>>>
>>>>>>
>>>>>> /**
>>>>>> * of_link_property - TODO:
>>>>>> * dev:
>>>>>> * con_np:
>>>>>> * prop:
>>>>>> *
>>>>>> * TODO...
>>>>>> *
>>>>>> * Any failed attempt to create a link will NOT result in an immediate return.
>>>>>> * of_link_property() must create all possible links even when one of more
>>>>>> * attempts to create a link fail.
>>>>>>
>>>>>> Why? isn't one failure enough to prevent probing this device?
>>>>>> Continuing to scan just results in extra work... which will be
>>>>>> repeated every time device_link_check_waiting_consumers() is called
>>>>>
>>>>> Context:
>>>>> As I said in the cover letter, avoiding unnecessary probes is just one
>>>>> of the reasons for this patch. The other (arguably more important)
>>>>
>>>> Agree that it is more important.
>>>>
>>>>
>>>>> reason for this patch is to make sure suppliers know that they have
>>>>> consumers that are yet to be probed. That way, suppliers can leave
>>>>> their resource on AND in the right state if they were left on by the
>>>>> bootloader. For example, if a clock was left on and at 200 MHz, the
>>>>> clock provider needs to keep that clock ON and at 200 MHz till all the
>>>>> consumers are probed.
>>>>>
>>>>> Answer: Let's say a consumer device Z has suppliers A, B and C. If the
>>>>> linking fails at A and you return immediately, then B and C could
>>>>> probe and then figure that they have no more consumers (they don't see
>>>>> a link to Z) and turn off their resources. And Z could fail
>>>>> catastrophically.
>>>>
>>>> Then I think that this approach is fatally flawed in the current implementation.
>>>
>>> I'm waiting to hear how it is fatally flawed. But maybe this is just a
>>> misunderstanding of the problem?
>>
>> Fatally flawed because it does not handle modules that add a consumer
>> device when the module is loaded.
>
> If you are talking about modules adding child devices of the device
> they are managing, then that's handled correctly later in the series.

They may or they may not. I do not know. I am not going to audit all
current cases of devices being added to check that relationship and I am
not going to monitor all future patches that add devices. Adding devices
is an existing pattern of behavior that the new feature must be able to
handle.

I have not looked at patch 6 yet (the place where modules adding child
devices is handled). I am guessing that patch 6 could be made more
general to remove the parent child relationship restriction.

>
> If you are talking about modules adding devices that aren't defined in
> DT, then right, I'm not trying to handle that. The module needs to
> make sure it keeps the resources needed for new devices it's adding
> are in the right state or need to add the right device links.

I am not talking about devices that are not defined in the devicetree.


>
>>> In the text below, I'm not sure if you mixing up two different things
>>> or just that your wording it a bit ambiguous. So pardon my nitpick to
>>> err on the side of clarity.
>>
>> Please do nitpick. Clarity is good.
>>
>>
>>>
>>>> A device can be added by a module that is loaded.
>>>
>>> No, in the example I gave, of_platform_default_populate_init() would
>>> add all 3 of those devices during arch_initcall_sync().
>>
>> The example you gave does not cover all use cases.
>>
>> There are modules that add devices when the module is loaded. You can not
>> ignore systems using such modules.
>
> I'll have to agree to disagree on that. While I understand that the
> design should be good and I'm happy to work on that, you can't insist
> that a patch series shouldn't be allowed because it's only improving
> 99% of the cases and leaves the other 1% in the status quo. You are
> just going to bring the kernel development to a grinding halt.

No, you do not get to disagree on that. And you are presenting a straw
man argument.

You are proposing a new feature that contributes fragility and complexity
to the house of cards that device instantiation and driver probing already
is.

The feature is clever but it is intertwined into an area that is already
complex and in many cases difficult to work within.

I had hoped that the feature was robust enough and generic enough to
accept. The proposed feature is a hack to paper over a specific problem
that you are facing. I had hoped that the feature would appear generic
enough that I would not have to regard it as an attempt to paper over
the real problem. I have not given up this hope yet but I still am
quite cautious about this approach to addressing your use case.

You have a real bug. I have told you how to fix the real bug. And you
have ignored my suggestion. (To be honest, I do not know for sure that
my suggestion is feasible, but on the surface it appears to be.) Again,
my suggestion is to have the boot loader pass information to the kernel
(via a chosen property) telling the kernel which devices the bootloader
has enabled power to. The power subsystem would use that information
early in boot to do a "get" on the power supplier (I am not using precise
power subsystem terminology, but it should be obvious what I mean).
The consumer device driver would also have to be aware of the information
passed via the chosen property because the power subsystem has done the
"get" on the consumer devices behalf (exactly how the consumer gets
that information is an implementation detail). This approach is
more direct, less subtle, less fragile.


>
>>>
>>>> In that case the device
>>>> was not present at late boot when the suppliers may turn off their resources.
>>>
>>> In that case, the _drivers_ for those devices aren't present at late
>>> boot. So that they can't request to keep the resources on for their
>>> consumer devices. Since there are no consumer requests on resources,
>>> the suppliers turn off their resources at late boot (since there isn't
>>> a better location as of today). The sync_state() call back added in a
>>> subsequent patche in this series will provide the better location.
>>
>> And the sync_state() call back will not deal with modules that add consumer
>> devices when the module is loaded, correct?
>
> Depends. If it's just more devices from DT, then it'll be fine. If
> it's not, then the module needs to take care of the needs of devices
> it's adding.>
>>>
>>>> (I am assuming the details since I have not reviewed the patches later in
>>>> the series that implement this part.)
>>>>
>>>> Am I missing something?
>>>
>>> I think you are mixing up devices getting added/populated with drivers
>>> getting loaded as modules?
>>
>> Only some modules add devices when they are loaded. But these modules do
>> exist.
>
> Out of the billions of Android devices, how many do you see this happening in?

The Linux kernel is not just used by Android devices.

-Frank

>
> Thanks,
> Saravana
>

2019-08-20 22:11:45

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings

On Mon, Aug 19, 2019 at 9:26 PM Frank Rowand <[email protected]> wrote:
>
> On 8/19/19 5:09 PM, Saravana Kannan wrote:
> > On Mon, Aug 19, 2019 at 2:30 PM Frank Rowand <[email protected]> wrote:
> >>
> >> On 8/19/19 1:49 PM, Saravana Kannan wrote:
> >>> On Mon, Aug 19, 2019 at 10:16 AM Frank Rowand <[email protected]> wrote:
> >>>>
> >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> >>>>> On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <[email protected]> wrote:
> >>>>>>
> >>>>>> On 7/23/19 5:10 PM, Saravana Kannan wrote:
> >>>>>>> Add device-links after the devices are created (but before they are
> >>>>>>> probed) by looking at common DT bindings like clocks and
> >>>>>>> interconnects.
> >>>>
> >>>>
> >>>> < very big snip (lots of comments that deserve answers) >
> >>>>
> >>>>
> >>>>>>
> >>>>>> /**
> >>>>>> * of_link_property - TODO:
> >>>>>> * dev:
> >>>>>> * con_np:
> >>>>>> * prop:
> >>>>>> *
> >>>>>> * TODO...
> >>>>>> *
> >>>>>> * Any failed attempt to create a link will NOT result in an immediate return.
> >>>>>> * of_link_property() must create all possible links even when one of more
> >>>>>> * attempts to create a link fail.
> >>>>>>
> >>>>>> Why? isn't one failure enough to prevent probing this device?
> >>>>>> Continuing to scan just results in extra work... which will be
> >>>>>> repeated every time device_link_check_waiting_consumers() is called
> >>>>>
> >>>>> Context:
> >>>>> As I said in the cover letter, avoiding unnecessary probes is just one
> >>>>> of the reasons for this patch. The other (arguably more important)
> >>>>
> >>>> Agree that it is more important.
> >>>>
> >>>>
> >>>>> reason for this patch is to make sure suppliers know that they have
> >>>>> consumers that are yet to be probed. That way, suppliers can leave
> >>>>> their resource on AND in the right state if they were left on by the
> >>>>> bootloader. For example, if a clock was left on and at 200 MHz, the
> >>>>> clock provider needs to keep that clock ON and at 200 MHz till all the
> >>>>> consumers are probed.
> >>>>>
> >>>>> Answer: Let's say a consumer device Z has suppliers A, B and C. If the
> >>>>> linking fails at A and you return immediately, then B and C could
> >>>>> probe and then figure that they have no more consumers (they don't see
> >>>>> a link to Z) and turn off their resources. And Z could fail
> >>>>> catastrophically.
> >>>>
> >>>> Then I think that this approach is fatally flawed in the current implementation.
> >>>
> >>> I'm waiting to hear how it is fatally flawed. But maybe this is just a
> >>> misunderstanding of the problem?
> >>
> >> Fatally flawed because it does not handle modules that add a consumer
> >> device when the module is loaded.
> >
> > If you are talking about modules adding child devices of the device
> > they are managing, then that's handled correctly later in the series.
>
> They may or they may not. I do not know. I am not going to audit all
> current cases of devices being added to check that relationship and I am
> not going to monitor all future patches that add devices. Adding devices
> is an existing pattern of behavior that the new feature must be able to
> handle.
>
> I have not looked at patch 6 yet (the place where modules adding child
> devices is handled). I am guessing that patch 6 could be made more
> general to remove the parent child relationship restriction.

Please do look into it then. I think it already handles all cases.

> >
> > If you are talking about modules adding devices that aren't defined in
> > DT, then right, I'm not trying to handle that. The module needs to
> > make sure it keeps the resources needed for new devices it's adding
> > are in the right state or need to add the right device links.
>
> I am not talking about devices that are not defined in the devicetree.

In that case, I'm sure my patch series handle all the scenarios
correctly. Here's why:
1. For all the top level devices the patches you've reviewed already
show how it's handled correctly.
2. All other devices in the DT are by definition the child devices of
the top level devices and patch 6 handles those cases.

Hopefully this shows to you that all DT cases are handled correctly.

> >>> In the text below, I'm not sure if you mixing up two different things
> >>> or just that your wording it a bit ambiguous. So pardon my nitpick to
> >>> err on the side of clarity.
> >>
> >> Please do nitpick. Clarity is good.
> >>
> >>
> >>>
> >>>> A device can be added by a module that is loaded.
> >>>
> >>> No, in the example I gave, of_platform_default_populate_init() would
> >>> add all 3 of those devices during arch_initcall_sync().
> >>
> >> The example you gave does not cover all use cases.
> >>
> >> There are modules that add devices when the module is loaded. You can not
> >> ignore systems using such modules.
> >
> > I'll have to agree to disagree on that. While I understand that the
> > design should be good and I'm happy to work on that, you can't insist
> > that a patch series shouldn't be allowed because it's only improving
> > 99% of the cases and leaves the other 1% in the status quo. You are
> > just going to bring the kernel development to a grinding halt.
>
> No, you do not get to disagree on that. And you are presenting a straw
> man argument.
>
> You are proposing a new feature that contributes fragility and complexity
> to the house of cards that device instantiation and driver probing already
> is.

Any piece of code is going to "add complexity". It's a question of
benefits vs complexity. Also, I'm mostly reusing existing device links
API. The majority of the complexity is in parsing DT. The driver core
maintainers seem to be fine with adding sync_state() support for
device links (that is independent of DT).

> The feature is clever but it is intertwined into an area that is already
> complex and in many cases difficult to work within.
>
> I had hoped that the feature was robust enough and generic enough to
> accept.

What I'm doing IS the most generic solution instead of doing the same
work multiple times at a framework level. Also, for multi-function
devices, framework level solutions would be worse.

> The proposed feature is a hack to paper over a specific problem
> that you are facing. I had hoped that the feature would appear generic
> enough that I would not have to regard it as an attempt to paper over
> the real problem. I have not given up this hope yet but I still am
> quite cautious about this approach to addressing your use case.
>
> You have a real bug. I have told you how to fix the real bug. And you
> have ignored my suggestion. (To be honest, I do not know for sure that
> my suggestion is feasible, but on the surface it appears to be.)

I'd actually say that your proposal is what's trying to paper over a
generic problem by saying it's specific to one or a few set of
resources. And it looks feasible to you because you haven't dove deep
into this issue.

> Again,
> my suggestion is to have the boot loader pass information to the kernel
> (via a chosen property) telling the kernel which devices the bootloader
> has enabled power to. The power subsystem would use that information
> early in boot to do a "get" on the power supplier (I am not using precise
> power subsystem terminology, but it should be obvious what I mean).
> The consumer device driver would also have to be aware of the information
> passed via the chosen property because the power subsystem has done the
> "get" on the consumer devices behalf (exactly how the consumer gets
> that information is an implementation detail). This approach is
> more direct, less subtle, less fragile.

I'll have to disagree on your claim. You are adding unnecessary
bootloader dependency when the kernel is completely capable of
handling this on its own. You are requiring explicit "gets" by
suppliers and then hoping all the consumers do the corresponding
"puts" to balance it out. Somehow the consumers need to know which
suppliers have parsed which bootloader input. And it's barely
scratching the surface of the problem.

You are assuming this has to do with just power when it can be clocks,
interconnects, etc. Why solve this repeated for each framework when
you can have a generic solution?

Also, while I understand what you mean by "get" it's not going to be
as simple as a reference count to keep the resource on. In reality
you'll need more complex handling. For example, having to keep a
voltage rail at or above X mV because one consumer might fail if the
voltage is < X mV. Or making sure a clock never goes about the
bootloader set frequency before all the consumer drivers are probed to
avoid overclocking one of the consumers. Trying to have this
explicitly coordinated across multiple drivers would be a nightmare.
It gets even more complicated with interconnects.

With my patch series, the consumers don't need to do anything. They
just probe as usual. The suppliers don't need to track or coordinate
with any consumers. For example, regulator suppliers just need to keep
the voltage rail at (or above) the level that the boot loader left it
on at and then apply the aggregated requests from their APIs once they
get the sync_state() callback. And it actually works -- tested for
regulators and clocks (and maybe even interconnects -- I forgot) in a
device I have.

> >>>
> >>>> In that case the device
> >>>> was not present at late boot when the suppliers may turn off their resources.
> >>>
> >>> In that case, the _drivers_ for those devices aren't present at late
> >>> boot. So that they can't request to keep the resources on for their
> >>> consumer devices. Since there are no consumer requests on resources,
> >>> the suppliers turn off their resources at late boot (since there isn't
> >>> a better location as of today). The sync_state() call back added in a
> >>> subsequent patche in this series will provide the better location.
> >>
> >> And the sync_state() call back will not deal with modules that add consumer
> >> devices when the module is loaded, correct?
> >
> > Depends. If it's just more devices from DT, then it'll be fine. If
> > it's not, then the module needs to take care of the needs of devices
> > it's adding.>
> >>>
> >>>> (I am assuming the details since I have not reviewed the patches later in
> >>>> the series that implement this part.)
> >>>>
> >>>> Am I missing something?
> >>>
> >>> I think you are mixing up devices getting added/populated with drivers
> >>> getting loaded as modules?
> >>
> >> Only some modules add devices when they are loaded. But these modules do
> >> exist.
> >
> > Out of the billions of Android devices, how many do you see this happening in?
>
> The Linux kernel is not just used by Android devices.

Ofcourse Linux is used by more than just Android. And Android is just
an ARM64(32) distribution. But how many platforms do you have where a
module adds devices that are not part of DT (because I'm handling the
DT part fine -- see other emails)? How does that count compare to
millions of products that can use this feature? And I'm not breaking
any of the existing platforms that don't use DT either. So saying I
have to fix this for 100% of the use cases for Linux before I can
remove the roadblocks for a common ARM64 kernel that can run on any
ARM64 platform seems like an unreasonable position.

-Saravana

2019-08-21 17:42:50

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings

On 8/20/19 3:09 PM, Saravana Kannan wrote:
> On Mon, Aug 19, 2019 at 9:26 PM Frank Rowand <[email protected]> wrote:
>>


< snip - the stuff I snipped deserves reply, but I want to focus on just
one topic for this reply >

>> You have a real bug. I have told you how to fix the real bug. And you
>> have ignored my suggestion. (To be honest, I do not know for sure that
>> my suggestion is feasible, but on the surface it appears to be.)
>
> I'd actually say that your proposal is what's trying to paper over a
> generic problem by saying it's specific to one or a few set of
> resources. And it looks feasible to you because you haven't dove deep
> into this issue.

Not saying it is specific to one or a few sets of resources. The
proposal suggests handling every single consumer supplier relationship
for which the bootloader has enabled a supplier resource via an
explicit message communicating the enabled resources. And directly
handling those exact resources.

Think about the definition of "paper over" vs "directly address".


>
>> Again,
>> my suggestion is to have the boot loader pass information to the kernel
>> (via a chosen property) telling the kernel which devices the bootloader
>> has enabled power to. The power subsystem would use that information
>> early in boot to do a "get" on the power supplier (I am not using precise
>> power subsystem terminology, but it should be obvious what I mean).
>> The consumer device driver would also have to be aware of the information
>> passed via the chosen property because the power subsystem has done the
>> "get" on the consumer devices behalf (exactly how the consumer gets
>> that information is an implementation detail). This approach is
>> more direct, less subtle, less fragile.
>
> I'll have to disagree on your claim. You are adding unnecessary
> bootloader dependency when the kernel is completely capable of
> handling this on its own. You are requiring explicit "gets" by
> suppliers and then hoping all the consumers do the corresponding
> "puts" to balance it out. Somehow the consumers need to know which
> suppliers have parsed which bootloader input. And it's barely
> scratching the surface of the problem.

OK, let me flesh out a possible implementation just a little bit.

This is focused on devicetree, as is your patch series. For ACPI
a parallel implementation would exist.

The bootloader chosen property could be a list of tuples, each tuple
containing: consumer phandle, supplier phandle. Each tuple could
contain more data if the implementation demands, but I'm trying to
keep it simple to illustrate the concept.

In early-ish boot a core function processes the chosen property. For
each consumer / supplier pair, the supplier compatible could be used
to determine the supplier type. (This might not be enough info to
determine the supplier type - maybe the consumer property that points
to the supplier will also have to be specified in the chosen property
tuple, or maybe a supplier type could be added to the tuple.) Given
the consumer, supplier, and resource type the appropriate "get"
would be done.

Late in boot, and possible repeated after modules are loaded, a core
function would scan the chosen property tuples, and for each
consumer / supplier pair, if both the consumer and the supplier
drivers are bound, it would be ASSUMED that it is ok to do the
appropriate type of "put", and the "put" would be done.


>
> You are assuming this has to do with just power when it can be clocks,
> interconnects, etc. Why solve this repeated for each framework when
> you can have a generic solution?

No such assumption.


>
> Also, while I understand what you mean by "get" it's not going to be
> as simple as a reference count to keep the resource on. In reality
> you'll need more complex handling. For example, having to keep a
> voltage rail at or above X mV because one consumer might fail if the
> voltage is < X mV. Or making sure a clock never goes about the
> bootloader set frequency before all the consumer drivers are probed to
> avoid overclocking one of the consumers. Trying to have this
> explicitly coordinated across multiple drivers would be a nightmare.
> It gets even more complicated with interconnects.
>
> With my patch series, the consumers don't need to do anything. They
> just probe as usual. The suppliers don't need to track or coordinate
> with any consumers. For example, regulator suppliers just need to keep
> the voltage rail at (or above) the level that the boot loader left it
> on at and then apply the aggregated requests from their APIs once they
> get the sync_state() callback. And it actually works -- tested for
> regulators and clocks (and maybe even interconnects -- I forgot) in a
> device I have.
>

And same for the possible implementation I sketched above. The equivalent
of the sync_state() callback would be done by the end of boot (potentially
repeated after each module loads) core function making a similar call.
Hand waving here about what suppliers to call.

Of course this is not the only way to implement my concept, just an
example to suggest that it might be feasible and it might work.

< snip >

-Frank