2020-11-21 02:06:38

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

The current implementation of fw_devlink is very inefficient because it
tries to get away without creating fwnode links in the name of saving
memory usage. Past attempts to optimize runtime at the cost of memory
usage were blocked with request for data showing that the optimization
made significant improvement for real world scenarios.

We have those scenarios now. There have been several reports of boot
time increase in the order of seconds in this thread [1]. Several OEMs
and SoC manufacturers have also privately reported significant
(350-400ms) increase in boot time due to all the parsing done by
fw_devlink.

So this patch series refactors fw_devlink to be more efficient. The key
difference now is the addition of support for fwnode links -- just a few
simple APIs. This also allows most of the code to be moved out of
firmware specific (DT mostly) code into driver core.

This brings the following benefits:
- Instead of parsing the device tree multiple times (complexity was
close to O(N^3) where N in the number of properties) during bootup,
fw_devlink parses each fwnode node/property only once and creates
fwnode links. The rest of the fw_devlink code then just looks at these
fwnode links to do rest of the work.

- Makes it much easier to debug probe issue due to fw_devlink in the
future. fw_devlink=on blocks the probing of devices if they depend on
a device that hasn't been added yet. With this refactor, it'll be very
easy to tell what that device is because we now have a reference to
the fwnode of the device.

- Much easier to add fw_devlink support to ACPI and other firmware
types. A refactor to move the common bits from DT specific code to
driver core was in my TODO list as a prerequisite to adding ACPI
support to fw_devlink. This series gets that done.

Laurent and Grygorii tested the v1 series and they saw boot time
improvment of about 12 seconds and 3 seconds, respectively.

Thanks,
Saravana

[1] - https://lore.kernel.org/linux-pm/CAGETcx-aiW251dhEXT1GNb9bi6YcX8W=jLBrro5CnPuEjGL09g@mail.gmail.com/

v1 -> v2:
Patches 1-6:
- Added the "why" to explain the reverts.
Patch 7:
- Fixed white space comment.
Patch 8:
- Reworded commit text and some function doc.
Patch 11:
- Fixed the build warning this patch would cause by removing a "const".
Patch 12:
- Added/updated documentation.
- Changed flags from u32 to u8.
Patch 13:
- Squashed with Patch 10. Will use v1 patch number for the rest of the diff
descriptions.
Patch 15:
- Removed an unnecessary unlikely()
Patch 17:
- Refactored fw_devlink_create_devlink() to flip the error handling vs
successful paths.
Patch 18:
- Squashed into Patch 17 as requested by Greg.
- Added Tested-by: tags from Laurent and Grygorii.
New Patch 17:
- New patch to delete useless input to add_links()

Saravana Kannan (17):
Revert "driver core: Avoid deferred probe due to
fw_devlink_pause/resume()"
Revert "driver core: Rename dev_links_info.defer_sync to defer_hook"
Revert "driver core: Don't do deferred probe in parallel with
kernel_init thread"
Revert "driver core: Remove check in
driver_deferred_probe_force_trigger()"
Revert "of: platform: Batch fwnode parsing when adding all top level
devices"
Revert "driver core: fw_devlink: Add support for batching fwnode
parsing"
driver core: Add fwnode_init()
driver core: Add fwnode link support
driver core: Allow only unprobed consumers for SYNC_STATE_ONLY device
links
device property: Add fwnode_is_ancestor_of() and
fwnode_get_next_parent_dev()
driver core: Redefine the meaning of fwnode_operations.add_links()
driver core: Add fw_devlink_parse_fwtree()
driver core: Use device's fwnode to check if it is waiting for
suppliers
of: property: Update implementation of add_links() to create fwnode
links
efi: Update implementation of add_links() to create fwnode links
driver core: Refactor fw_devlink feature
driver core: Delete pointless parameter in fwnode_operations.add_links

drivers/acpi/property.c | 2 +-
drivers/acpi/scan.c | 2 +-
drivers/base/core.c | 555 ++++++++++++++++++++------------
drivers/base/property.c | 52 +++
drivers/base/swnode.c | 2 +-
drivers/firmware/efi/efi-init.c | 32 +-
drivers/of/dynamic.c | 1 +
drivers/of/platform.c | 2 -
drivers/of/property.c | 149 +++------
include/linux/device.h | 10 +-
include/linux/fwnode.h | 73 ++---
include/linux/of.h | 2 +-
include/linux/property.h | 3 +
kernel/irq/irqdomain.c | 2 +-
14 files changed, 495 insertions(+), 392 deletions(-)

--
2.29.2.454.gaff20da3a2-goog


2020-11-21 02:06:52

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v2 13/17] driver core: Use device's fwnode to check if it is waiting for suppliers

To check if a device is still waiting for its supplier devices to be
added, we used to check if the devices is in a global
waiting_for_suppliers list. Since the global list will be deleted in
subsequent patches, this patch stops using this check.

Instead, this patch uses a more device specific check. It checks if the
device's fwnode has any fwnode links that haven't been converted to
device links yet.

Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/base/core.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 395dece1c83a..1873cecb0cc4 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -51,6 +51,7 @@ static DEFINE_MUTEX(wfs_lock);
static LIST_HEAD(deferred_sync);
static unsigned int defer_sync_state_count = 1;
static DEFINE_MUTEX(fwnode_link_lock);
+static bool fw_devlink_is_permissive(void);

/**
* fwnode_link_add - Create a link between two fwnode_handles.
@@ -995,13 +996,13 @@ int device_links_check_suppliers(struct device *dev)
* Device waiting for supplier to become available is not allowed to
* probe.
*/
- mutex_lock(&wfs_lock);
- if (!list_empty(&dev->links.needs_suppliers) &&
- dev->links.need_for_probe) {
- mutex_unlock(&wfs_lock);
+ mutex_lock(&fwnode_link_lock);
+ if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
+ !fw_devlink_is_permissive()) {
+ mutex_unlock(&fwnode_link_lock);
return -EPROBE_DEFER;
}
- mutex_unlock(&wfs_lock);
+ mutex_unlock(&fwnode_link_lock);

device_links_write_lock();

@@ -1167,10 +1168,7 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
bool val;

device_lock(dev);
- mutex_lock(&wfs_lock);
- val = !list_empty(&dev->links.needs_suppliers)
- && dev->links.need_for_probe;
- mutex_unlock(&wfs_lock);
+ val = !list_empty(&dev->fwnode->suppliers);
device_unlock(dev);
return sysfs_emit(buf, "%u\n", val);
}
@@ -2202,7 +2200,7 @@ static int device_add_attrs(struct device *dev)
goto err_remove_dev_groups;
}

- if (fw_devlink_flags && !fw_devlink_is_permissive()) {
+ if (fw_devlink_flags && !fw_devlink_is_permissive() && dev->fwnode) {
error = device_create_file(dev, &dev_attr_waiting_for_supplier);
if (error)
goto err_remove_dev_online;
--
2.29.2.454.gaff20da3a2-goog

2020-11-21 02:07:03

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v2 14/17] of: property: Update implementation of add_links() to create fwnode links

The semantics of add_links() has changed from creating device link
between devices to creating fwnode links between fwnodes. So, update the
implementation of add_links() to match the new semantics.

Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/of/property.c | 150 ++++++++++++------------------------------
1 file changed, 41 insertions(+), 109 deletions(-)

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 408a7b5f06a9..620d29fdace8 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1038,33 +1038,9 @@ static bool of_is_ancestor_of(struct device_node *test_ancestor,
}

/**
- * of_get_next_parent_dev - Add device link to supplier from supplier phandle
- * @np: device tree node
- *
- * Given a device tree node (@np), this function finds its closest ancestor
- * device tree node that has a corresponding struct device.
- *
- * The caller of this function is expected to call put_device() on the returned
- * device when they are done.
- */
-static struct device *of_get_next_parent_dev(struct device_node *np)
-{
- struct device *dev = NULL;
-
- of_node_get(np);
- do {
- np = of_get_next_parent(np);
- if (np)
- dev = get_dev_from_fwnode(&np->fwnode);
- } while (np && !dev);
- of_node_put(np);
- return dev;
-}
-
-/**
- * of_link_to_phandle - Add device link to supplier from supplier phandle
- * @dev: consumer device
- * @sup_np: phandle to supplier device tree node
+ * of_link_to_phandle - Add fwnode link to supplier from supplier phandle
+ * @con_np: consumer device tree node
+ * @sup_np: supplier device tree node
*
* Given a phandle to a supplier device tree node (@sup_np), this function
* finds the device that owns the supplier device tree node and creates a
@@ -1074,16 +1050,14 @@ static struct device *of_get_next_parent_dev(struct device_node *np)
* cases, it returns an error.
*
* Returns:
- * - 0 if link successfully created to supplier
- * - -EAGAIN if linking to the supplier should be reattempted
+ * - 0 if fwnode link successfully created to supplier
* - -EINVAL if the supplier link is invalid and should not be created
- * - -ENODEV if there is no device that corresponds to the supplier phandle
+ * - -ENODEV if struct device will never be create for supplier
*/
-static int of_link_to_phandle(struct device *dev, struct device_node *sup_np,
- u32 dl_flags)
+static int of_link_to_phandle(struct device_node *con_np,
+ struct device_node *sup_np)
{
- struct device *sup_dev, *sup_par_dev;
- int ret = 0;
+ struct device *sup_dev;
struct device_node *tmp_np = sup_np;

of_node_get(sup_np);
@@ -1106,7 +1080,8 @@ static int of_link_to_phandle(struct device *dev, struct device_node *sup_np,
}

if (!sup_np) {
- dev_dbg(dev, "Not linking to %pOFP - No device\n", tmp_np);
+ pr_debug("Not linking %pOFP to %pOFP - No device\n",
+ con_np, tmp_np);
return -ENODEV;
}

@@ -1115,53 +1090,30 @@ static int of_link_to_phandle(struct device *dev, struct device_node *sup_np,
* descendant nodes. By definition, a child node can't be a functional
* dependency for the parent node.
*/
- if (of_is_ancestor_of(dev->of_node, sup_np)) {
- dev_dbg(dev, "Not linking to %pOFP - is descendant\n", sup_np);
+ if (of_is_ancestor_of(con_np, sup_np)) {
+ pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
+ con_np, sup_np);
of_node_put(sup_np);
return -EINVAL;
}
+
+ /*
+ * Don't create links to "early devices" that won't have struct devices
+ * created for them.
+ */
sup_dev = get_dev_from_fwnode(&sup_np->fwnode);
if (!sup_dev && of_node_check_flag(sup_np, OF_POPULATED)) {
- /* Early device without struct device. */
- dev_dbg(dev, "Not linking to %pOFP - No struct device\n",
- sup_np);
+ pr_debug("Not linking %pOFP to %pOFP - No struct device\n",
+ con_np, sup_np);
of_node_put(sup_np);
return -ENODEV;
- } else if (!sup_dev) {
- /*
- * DL_FLAG_SYNC_STATE_ONLY doesn't block probing and supports
- * cycles. So cycle detection isn't necessary and shouldn't be
- * done.
- */
- if (dl_flags & DL_FLAG_SYNC_STATE_ONLY) {
- of_node_put(sup_np);
- return -EAGAIN;
- }
-
- sup_par_dev = of_get_next_parent_dev(sup_np);
-
- if (sup_par_dev && device_is_dependent(dev, sup_par_dev)) {
- /* Cyclic dependency detected, don't try to link */
- dev_dbg(dev, "Not linking to %pOFP - cycle detected\n",
- sup_np);
- ret = -EINVAL;
- } else {
- /*
- * Can't check for cycles or no cycles. So let's try
- * again later.
- */
- ret = -EAGAIN;
- }
-
- of_node_put(sup_np);
- put_device(sup_par_dev);
- return ret;
}
- of_node_put(sup_np);
- if (!device_link_add(dev, sup_dev, dl_flags))
- ret = -EINVAL;
put_device(sup_dev);
- return ret;
+
+ fwnode_link_add(of_fwnode_handle(con_np), of_fwnode_handle(sup_np));
+ of_node_put(sup_np);
+
+ return 0;
}

/**
@@ -1361,37 +1313,29 @@ static const struct supplier_bindings of_supplier_bindings[] = {
* that list phandles to suppliers. If @prop_name isn't one, this function
* doesn't do anything.
*
- * If @prop_name is one, this function attempts to create device links from the
- * consumer device @dev to all the devices of the suppliers listed in
- * @prop_name.
+ * If @prop_name is one, this function attempts to create fwnode links from the
+ * consumer device tree node @con_np to all the suppliers device tree nodes
+ * listed in @prop_name.
*
- * Any failed attempt to create a device link will NOT result in an immediate
+ * Any failed attempt to create a fwnode link will NOT result in an immediate
* return. of_link_property() must create links to all the available supplier
- * devices even when attempts to create a link to one or more suppliers fail.
+ * device tree nodes even when attempts to create a link to one or more
+ * suppliers fail.
*/
-static int of_link_property(struct device *dev, struct device_node *con_np,
- const char *prop_name)
+static int of_link_property(struct device_node *con_np, const char *prop_name)
{
struct device_node *phandle;
const struct supplier_bindings *s = of_supplier_bindings;
unsigned int i = 0;
bool matched = false;
int ret = 0;
- u32 dl_flags;
-
- if (dev->of_node == con_np)
- dl_flags = fw_devlink_get_flags();
- else
- dl_flags = DL_FLAG_SYNC_STATE_ONLY;

/* Do not stop at first failed link, link all available suppliers. */
while (!matched && s->parse_prop) {
while ((phandle = s->parse_prop(con_np, prop_name, i))) {
matched = true;
i++;
- if (of_link_to_phandle(dev, phandle, dl_flags)
- == -EAGAIN)
- ret = -EAGAIN;
+ of_link_to_phandle(con_np, phandle);
of_node_put(phandle);
}
s++;
@@ -1399,31 +1343,19 @@ static int of_link_property(struct device *dev, struct device_node *con_np,
return ret;
}

-static int of_link_to_suppliers(struct device *dev,
- struct device_node *con_np)
+static int of_fwnode_add_links(struct fwnode_handle *fwnode,
+ struct device *dev)
{
- struct device_node *child;
struct property *p;
- int ret = 0;
+ struct device_node *con_np = to_of_node(fwnode);

- for_each_property_of_node(con_np, p)
- if (of_link_property(dev, con_np, p->name))
- ret = -ENODEV;
-
- for_each_available_child_of_node(con_np, child)
- if (of_link_to_suppliers(dev, child) && !ret)
- ret = -EAGAIN;
-
- return ret;
-}
+ if (!con_np)
+ return -EINVAL;

-static int of_fwnode_add_links(const struct fwnode_handle *fwnode,
- struct device *dev)
-{
- if (unlikely(!is_of_node(fwnode)))
- return 0;
+ for_each_property_of_node(con_np, p)
+ of_link_property(con_np, p->name);

- return of_link_to_suppliers(dev, to_of_node(fwnode));
+ return 0;
}

const struct fwnode_operations of_fwnode_ops = {
--
2.29.2.454.gaff20da3a2-goog

2020-11-21 02:07:11

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v2 07/17] driver core: Add fwnode_init()

There are multiple locations in the kernel where a struct fwnode_handle
is initialized. Add fwnode_init() so that we have one way of
initializing a fwnode_handle.

Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/acpi/property.c | 2 +-
drivers/acpi/scan.c | 2 +-
drivers/base/swnode.c | 2 +-
drivers/firmware/efi/efi-init.c | 8 ++++----
include/linux/fwnode.h | 6 ++++++
include/linux/of.h | 2 +-
kernel/irq/irqdomain.c | 2 +-
7 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
index d04de10a63e4..24e87b630573 100644
--- a/drivers/acpi/property.c
+++ b/drivers/acpi/property.c
@@ -76,7 +76,7 @@ static bool acpi_nondev_subnode_extract(const union acpi_object *desc,
return false;

dn->name = link->package.elements[0].string.pointer;
- dn->fwnode.ops = &acpi_data_fwnode_ops;
+ fwnode_init(&dn->fwnode, &acpi_data_fwnode_ops);
dn->parent = parent;
INIT_LIST_HEAD(&dn->data.properties);
INIT_LIST_HEAD(&dn->data.subnodes);
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index bc6a79e33220..519963bcc047 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1589,7 +1589,7 @@ void acpi_init_device_object(struct acpi_device *device, acpi_handle handle,
device->device_type = type;
device->handle = handle;
device->parent = acpi_bus_get_parent(handle);
- device->fwnode.ops = &acpi_device_fwnode_ops;
+ fwnode_init(&device->fwnode, &acpi_device_fwnode_ops);
acpi_set_device_status(device, sta);
acpi_device_get_busid(device);
acpi_set_pnp_ids(handle, &device->pnp, type);
diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index 010828fc785b..4a4b2008fbc2 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -653,7 +653,7 @@ swnode_register(const struct software_node *node, struct swnode *parent,
swnode->parent = parent;
swnode->allocated = allocated;
swnode->kobj.kset = swnode_kset;
- swnode->fwnode.ops = &software_node_ops;
+ fwnode_init(&swnode->fwnode, &software_node_ops);

ida_init(&swnode->child_ids);
INIT_LIST_HEAD(&swnode->entry);
diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c
index f55a92ff12c0..b148f1459fb3 100644
--- a/drivers/firmware/efi/efi-init.c
+++ b/drivers/firmware/efi/efi-init.c
@@ -359,9 +359,7 @@ static const struct fwnode_operations efifb_fwnode_ops = {
.add_links = efifb_add_links,
};

-static struct fwnode_handle efifb_fwnode = {
- .ops = &efifb_fwnode_ops,
-};
+static struct fwnode_handle efifb_fwnode;

static int __init register_gop_device(void)
{
@@ -375,8 +373,10 @@ static int __init register_gop_device(void)
if (!pd)
return -ENOMEM;

- if (IS_ENABLED(CONFIG_PCI))
+ if (IS_ENABLED(CONFIG_PCI)) {
+ fwnode_init(&efifb_fwnode, &efifb_fwnode_ops);
pd->dev.fwnode = &efifb_fwnode;
+ }

err = platform_device_add_data(pd, &screen_info, sizeof(screen_info));
if (err)
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index e0abafbb17f8..5589799708b5 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -170,6 +170,12 @@ struct fwnode_operations {
} while (false)
#define get_dev_from_fwnode(fwnode) get_device((fwnode)->dev)

+static inline void fwnode_init(struct fwnode_handle *fwnode,
+ const struct fwnode_operations *ops)
+{
+ fwnode->ops = ops;
+}
+
extern u32 fw_devlink_get_flags(void);

#endif
diff --git a/include/linux/of.h b/include/linux/of.h
index 5d51891cbf1a..27fba2472eee 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -108,7 +108,7 @@ static inline void of_node_init(struct device_node *node)
#if defined(CONFIG_OF_KOBJ)
kobject_init(&node->kobj, &of_node_ktype);
#endif
- node->fwnode.ops = &of_fwnode_ops;
+ fwnode_init(&node->fwnode, &of_fwnode_ops);
}

#if defined(CONFIG_OF_KOBJ)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..06fce7e39033 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -91,7 +91,7 @@ struct fwnode_handle *__irq_domain_alloc_fwnode(unsigned int type, int id,
fwid->type = type;
fwid->name = n;
fwid->pa = pa;
- fwid->fwnode.ops = &irqchip_fwnode_ops;
+ fwnode_init(&fwid->fwnode, &irqchip_fwnode_ops);
return &fwid->fwnode;
}
EXPORT_SYMBOL_GPL(__irq_domain_alloc_fwnode);
--
2.29.2.454.gaff20da3a2-goog

2020-11-21 02:07:27

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v2 09/17] driver core: Allow only unprobed consumers for SYNC_STATE_ONLY device links

SYNC_STATE_ONLY device links only affect the behavior of sync_state()
callbacks. Specifically, they prevent sync_state() only callbacks from
being called on a device if one or more of its consumers haven't probed.

So, creating a SYNC_STATE_ONLY device link from an already probed
consumer is useless. So, don't allow creating such device links.

Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/base/core.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index e2b246a44d1a..215ce9e72790 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -649,6 +649,17 @@ struct device_link *device_link_add(struct device *consumer,
goto out;
}

+ /*
+ * SYNC_STATE_ONLY links are useless once a consumer device has probed.
+ * So, only create it if the consumer hasn't probed yet.
+ */
+ if (flags & DL_FLAG_SYNC_STATE_ONLY &&
+ consumer->links.status != DL_DEV_NO_DRIVER &&
+ consumer->links.status != DL_DEV_PROBING) {
+ link = NULL;
+ goto out;
+ }
+
/*
* DL_FLAG_AUTOREMOVE_SUPPLIER indicates that the link will be needed
* longer than for DL_FLAG_AUTOREMOVE_CONSUMER and setting them both
--
2.29.2.454.gaff20da3a2-goog

2020-11-24 17:29:15

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

On Tue, Nov 24, 2020 at 12:29 AM 'Tomi Valkeinen' via kernel-team
<[email protected]> wrote:
>
> Hi,
>
> On 21/11/2020 04:02, Saravana Kannan wrote:
> > The current implementation of fw_devlink is very inefficient because it
> > tries to get away without creating fwnode links in the name of saving
> > memory usage. Past attempts to optimize runtime at the cost of memory
> > usage were blocked with request for data showing that the optimization
> > made significant improvement for real world scenarios.
> >
> > We have those scenarios now. There have been several reports of boot
> > time increase in the order of seconds in this thread [1]. Several OEMs
> > and SoC manufacturers have also privately reported significant
> > (350-400ms) increase in boot time due to all the parsing done by
> > fw_devlink.
> >
> > So this patch series refactors fw_devlink to be more efficient. The key
> > difference now is the addition of support for fwnode links -- just a few
> > simple APIs. This also allows most of the code to be moved out of
> > firmware specific (DT mostly) code into driver core.
> >
> > This brings the following benefits:
> > - Instead of parsing the device tree multiple times (complexity was
> > close to O(N^3) where N in the number of properties) during bootup,
> > fw_devlink parses each fwnode node/property only once and creates
> > fwnode links. The rest of the fw_devlink code then just looks at these
> > fwnode links to do rest of the work.
> >
> > - Makes it much easier to debug probe issue due to fw_devlink in the
> > future. fw_devlink=on blocks the probing of devices if they depend on
> > a device that hasn't been added yet. With this refactor, it'll be very
> > easy to tell what that device is because we now have a reference to
> > the fwnode of the device.
> >
> > - Much easier to add fw_devlink support to ACPI and other firmware
> > types. A refactor to move the common bits from DT specific code to
> > driver core was in my TODO list as a prerequisite to adding ACPI
> > support to fw_devlink. This series gets that done.
> >
> > Laurent and Grygorii tested the v1 series and they saw boot time
> > improvment of about 12 seconds and 3 seconds, respectively.
>
> Tested v2 on OMAP4 SDP. With my particular config, boot time to starting init went from 18.5 seconds
> to 12.5 seconds.

Thanks for testing Tomi!

-Saravana

2020-11-25 01:57:00

by Tomi Valkeinen

[permalink] [raw]
Subject: Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

Hi,

On 21/11/2020 04:02, Saravana Kannan wrote:
> The current implementation of fw_devlink is very inefficient because it
> tries to get away without creating fwnode links in the name of saving
> memory usage. Past attempts to optimize runtime at the cost of memory
> usage were blocked with request for data showing that the optimization
> made significant improvement for real world scenarios.
>
> We have those scenarios now. There have been several reports of boot
> time increase in the order of seconds in this thread [1]. Several OEMs
> and SoC manufacturers have also privately reported significant
> (350-400ms) increase in boot time due to all the parsing done by
> fw_devlink.
>
> So this patch series refactors fw_devlink to be more efficient. The key
> difference now is the addition of support for fwnode links -- just a few
> simple APIs. This also allows most of the code to be moved out of
> firmware specific (DT mostly) code into driver core.
>
> This brings the following benefits:
> - Instead of parsing the device tree multiple times (complexity was
> close to O(N^3) where N in the number of properties) during bootup,
> fw_devlink parses each fwnode node/property only once and creates
> fwnode links. The rest of the fw_devlink code then just looks at these
> fwnode links to do rest of the work.
>
> - Makes it much easier to debug probe issue due to fw_devlink in the
> future. fw_devlink=on blocks the probing of devices if they depend on
> a device that hasn't been added yet. With this refactor, it'll be very
> easy to tell what that device is because we now have a reference to
> the fwnode of the device.
>
> - Much easier to add fw_devlink support to ACPI and other firmware
> types. A refactor to move the common bits from DT specific code to
> driver core was in my TODO list as a prerequisite to adding ACPI
> support to fw_devlink. This series gets that done.
>
> Laurent and Grygorii tested the v1 series and they saw boot time
> improvment of about 12 seconds and 3 seconds, respectively.

Tested v2 on OMAP4 SDP. With my particular config, boot time to starting init went from 18.5 seconds
to 12.5 seconds.

Tomi

--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

2020-12-03 19:10:38

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

On Tue, Nov 24, 2020 at 12:29 AM 'Tomi Valkeinen' via kernel-team
<[email protected]> wrote:
>
> Hi,
>
> On 21/11/2020 04:02, Saravana Kannan wrote:
> > The current implementation of fw_devlink is very inefficient because it
> > tries to get away without creating fwnode links in the name of saving
> > memory usage. Past attempts to optimize runtime at the cost of memory
> > usage were blocked with request for data showing that the optimization
> > made significant improvement for real world scenarios.
> >
> > We have those scenarios now. There have been several reports of boot
> > time increase in the order of seconds in this thread [1]. Several OEMs
> > and SoC manufacturers have also privately reported significant
> > (350-400ms) increase in boot time due to all the parsing done by
> > fw_devlink.
> >
> > So this patch series refactors fw_devlink to be more efficient. The key
> > difference now is the addition of support for fwnode links -- just a few
> > simple APIs. This also allows most of the code to be moved out of
> > firmware specific (DT mostly) code into driver core.
> >
> > This brings the following benefits:
> > - Instead of parsing the device tree multiple times (complexity was
> > close to O(N^3) where N in the number of properties) during bootup,
> > fw_devlink parses each fwnode node/property only once and creates
> > fwnode links. The rest of the fw_devlink code then just looks at these
> > fwnode links to do rest of the work.
> >
> > - Makes it much easier to debug probe issue due to fw_devlink in the
> > future. fw_devlink=on blocks the probing of devices if they depend on
> > a device that hasn't been added yet. With this refactor, it'll be very
> > easy to tell what that device is because we now have a reference to
> > the fwnode of the device.
> >
> > - Much easier to add fw_devlink support to ACPI and other firmware
> > types. A refactor to move the common bits from DT specific code to
> > driver core was in my TODO list as a prerequisite to adding ACPI
> > support to fw_devlink. This series gets that done.
> >
> > Laurent and Grygorii tested the v1 series and they saw boot time
> > improvment of about 12 seconds and 3 seconds, respectively.
>
> Tested v2 on OMAP4 SDP. With my particular config, boot time to starting init went from 18.5 seconds
> to 12.5 seconds.
>
> Tomi

Rafael,

Friendly reminder for a review.

-Saravana

2020-12-06 07:34:55

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2 07/17] driver core: Add fwnode_init()

On Fri, Nov 20, 2020 at 06:02:22PM -0800, Saravana Kannan wrote:
> There are multiple locations in the kernel where a struct fwnode_handle
> is initialized. Add fwnode_init() so that we have one way of
> initializing a fwnode_handle.
>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/acpi/property.c | 2 +-
> drivers/acpi/scan.c | 2 +-
> drivers/base/swnode.c | 2 +-
> drivers/firmware/efi/efi-init.c | 8 ++++----
> include/linux/fwnode.h | 6 ++++++
> include/linux/of.h | 2 +-
> kernel/irq/irqdomain.c | 2 +-
> 7 files changed, 15 insertions(+), 9 deletions(-)

In this series, I didn't find any extension of fwnode_init() to be it more
than simple assignment. This change looks to me like unnecessary churn and
obfuscation rather than improvement.

"...ops = &...;" is pretty standard in the kernel to initialize ops
structures.

Thanks

2020-12-07 19:29:31

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 07/17] driver core: Add fwnode_init()

On Sat, Dec 5, 2020 at 11:26 PM Leon Romanovsky <[email protected]> wrote:
>
> On Fri, Nov 20, 2020 at 06:02:22PM -0800, Saravana Kannan wrote:
> > There are multiple locations in the kernel where a struct fwnode_handle
> > is initialized. Add fwnode_init() so that we have one way of
> > initializing a fwnode_handle.
> >
> > Signed-off-by: Saravana Kannan <[email protected]>
> > ---
> > drivers/acpi/property.c | 2 +-
> > drivers/acpi/scan.c | 2 +-
> > drivers/base/swnode.c | 2 +-
> > drivers/firmware/efi/efi-init.c | 8 ++++----
> > include/linux/fwnode.h | 6 ++++++
> > include/linux/of.h | 2 +-
> > kernel/irq/irqdomain.c | 2 +-
> > 7 files changed, 15 insertions(+), 9 deletions(-)
>
> In this series, I didn't find any extension of fwnode_init() to be it more
> than simple assignment. This change looks to me like unnecessary churn and
> obfuscation rather than improvement.
>
> "...ops = &...;" is pretty standard in the kernel to initialize ops
> structures.

Subsequent patches make fwnode_init() do more stuff.

-Saravana

2020-12-07 19:56:45

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2 07/17] driver core: Add fwnode_init()

On Mon, Dec 07, 2020 at 11:25:15AM -0800, Saravana Kannan wrote:
> On Sat, Dec 5, 2020 at 11:26 PM Leon Romanovsky <[email protected]> wrote:
> >
> > On Fri, Nov 20, 2020 at 06:02:22PM -0800, Saravana Kannan wrote:
> > > There are multiple locations in the kernel where a struct fwnode_handle
> > > is initialized. Add fwnode_init() so that we have one way of
> > > initializing a fwnode_handle.
> > >
> > > Signed-off-by: Saravana Kannan <[email protected]>
> > > ---
> > > drivers/acpi/property.c | 2 +-
> > > drivers/acpi/scan.c | 2 +-
> > > drivers/base/swnode.c | 2 +-
> > > drivers/firmware/efi/efi-init.c | 8 ++++----
> > > include/linux/fwnode.h | 6 ++++++
> > > include/linux/of.h | 2 +-
> > > kernel/irq/irqdomain.c | 2 +-
> > > 7 files changed, 15 insertions(+), 9 deletions(-)
> >
> > In this series, I didn't find any extension of fwnode_init() to be it more
> > than simple assignment. This change looks to me like unnecessary churn and
> > obfuscation rather than improvement.
> >
> > "...ops = &...;" is pretty standard in the kernel to initialize ops
> > structures.
>
> Subsequent patches make fwnode_init() do more stuff.

But not in this series, right?

Thanks

>
> -Saravana

2020-12-07 20:41:58

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 07/17] driver core: Add fwnode_init()

On Mon, Dec 7, 2020 at 11:54 AM Leon Romanovsky <[email protected]> wrote:
>
> On Mon, Dec 07, 2020 at 11:25:15AM -0800, Saravana Kannan wrote:
> > On Sat, Dec 5, 2020 at 11:26 PM Leon Romanovsky <[email protected]> wrote:
> > >
> > > On Fri, Nov 20, 2020 at 06:02:22PM -0800, Saravana Kannan wrote:
> > > > There are multiple locations in the kernel where a struct fwnode_handle
> > > > is initialized. Add fwnode_init() so that we have one way of
> > > > initializing a fwnode_handle.
> > > >
> > > > Signed-off-by: Saravana Kannan <[email protected]>
> > > > ---
> > > > drivers/acpi/property.c | 2 +-
> > > > drivers/acpi/scan.c | 2 +-
> > > > drivers/base/swnode.c | 2 +-
> > > > drivers/firmware/efi/efi-init.c | 8 ++++----
> > > > include/linux/fwnode.h | 6 ++++++
> > > > include/linux/of.h | 2 +-
> > > > kernel/irq/irqdomain.c | 2 +-
> > > > 7 files changed, 15 insertions(+), 9 deletions(-)
> > >
> > > In this series, I didn't find any extension of fwnode_init() to be it more
> > > than simple assignment. This change looks to me like unnecessary churn and
> > > obfuscation rather than improvement.
> > >
> > > "...ops = &...;" is pretty standard in the kernel to initialize ops
> > > structures.
> >
> > Subsequent patches make fwnode_init() do more stuff.
>
> But not in this series, right?

In this series. The very next patch - Patch 8/17 :)

-Saravana

2020-12-07 22:22:50

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH v2 07/17] driver core: Add fwnode_init()

On Fri, 20 Nov 2020 18:02:22 -0800, Saravana Kannan wrote:
> There are multiple locations in the kernel where a struct fwnode_handle
> is initialized. Add fwnode_init() so that we have one way of
> initializing a fwnode_handle.
>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/acpi/property.c | 2 +-
> drivers/acpi/scan.c | 2 +-
> drivers/base/swnode.c | 2 +-
> drivers/firmware/efi/efi-init.c | 8 ++++----
> include/linux/fwnode.h | 6 ++++++
> include/linux/of.h | 2 +-
> kernel/irq/irqdomain.c | 2 +-
> 7 files changed, 15 insertions(+), 9 deletions(-)
>

Acked-by: Rob Herring <[email protected]>

2020-12-07 22:40:01

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH v2 14/17] of: property: Update implementation of add_links() to create fwnode links

On Fri, 20 Nov 2020 18:02:29 -0800, Saravana Kannan wrote:
> The semantics of add_links() has changed from creating device link
> between devices to creating fwnode links between fwnodes. So, update the
> implementation of add_links() to match the new semantics.
>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/of/property.c | 150 ++++++++++++------------------------------
> 1 file changed, 41 insertions(+), 109 deletions(-)
>

Acked-by: Rob Herring <[email protected]>

2020-12-08 06:38:56

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2 07/17] driver core: Add fwnode_init()

On Mon, Dec 07, 2020 at 12:36:43PM -0800, Saravana Kannan wrote:
> On Mon, Dec 7, 2020 at 11:54 AM Leon Romanovsky <[email protected]> wrote:
> >
> > On Mon, Dec 07, 2020 at 11:25:15AM -0800, Saravana Kannan wrote:
> > > On Sat, Dec 5, 2020 at 11:26 PM Leon Romanovsky <[email protected]> wrote:
> > > >
> > > > On Fri, Nov 20, 2020 at 06:02:22PM -0800, Saravana Kannan wrote:
> > > > > There are multiple locations in the kernel where a struct fwnode_handle
> > > > > is initialized. Add fwnode_init() so that we have one way of
> > > > > initializing a fwnode_handle.
> > > > >
> > > > > Signed-off-by: Saravana Kannan <[email protected]>
> > > > > ---
> > > > > drivers/acpi/property.c | 2 +-
> > > > > drivers/acpi/scan.c | 2 +-
> > > > > drivers/base/swnode.c | 2 +-
> > > > > drivers/firmware/efi/efi-init.c | 8 ++++----
> > > > > include/linux/fwnode.h | 6 ++++++
> > > > > include/linux/of.h | 2 +-
> > > > > kernel/irq/irqdomain.c | 2 +-
> > > > > 7 files changed, 15 insertions(+), 9 deletions(-)
> > > >
> > > > In this series, I didn't find any extension of fwnode_init() to be it more
> > > > than simple assignment. This change looks to me like unnecessary churn and
> > > > obfuscation rather than improvement.
> > > >
> > > > "...ops = &...;" is pretty standard in the kernel to initialize ops
> > > > structures.
> > >
> > > Subsequent patches make fwnode_init() do more stuff.
> >
> > But not in this series, right?
>
> In this series. The very next patch - Patch 8/17 :)

Thanks, sorry for the noise.

>
> -Saravana

2020-12-09 20:20:02

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

On Fri, Nov 20, 2020 at 06:02:15PM -0800, Saravana Kannan wrote:
> The current implementation of fw_devlink is very inefficient because it
> tries to get away without creating fwnode links in the name of saving
> memory usage. Past attempts to optimize runtime at the cost of memory
> usage were blocked with request for data showing that the optimization
> made significant improvement for real world scenarios.
>
> We have those scenarios now. There have been several reports of boot
> time increase in the order of seconds in this thread [1]. Several OEMs
> and SoC manufacturers have also privately reported significant
> (350-400ms) increase in boot time due to all the parsing done by
> fw_devlink.
>
> So this patch series refactors fw_devlink to be more efficient. The key
> difference now is the addition of support for fwnode links -- just a few
> simple APIs. This also allows most of the code to be moved out of
> firmware specific (DT mostly) code into driver core.
>
> This brings the following benefits:
> - Instead of parsing the device tree multiple times (complexity was
> close to O(N^3) where N in the number of properties) during bootup,
> fw_devlink parses each fwnode node/property only once and creates
> fwnode links. The rest of the fw_devlink code then just looks at these
> fwnode links to do rest of the work.
>
> - Makes it much easier to debug probe issue due to fw_devlink in the
> future. fw_devlink=on blocks the probing of devices if they depend on
> a device that hasn't been added yet. With this refactor, it'll be very
> easy to tell what that device is because we now have a reference to
> the fwnode of the device.
>
> - Much easier to add fw_devlink support to ACPI and other firmware
> types. A refactor to move the common bits from DT specific code to
> driver core was in my TODO list as a prerequisite to adding ACPI
> support to fw_devlink. This series gets that done.
>
> Laurent and Grygorii tested the v1 series and they saw boot time
> improvment of about 12 seconds and 3 seconds, respectively.

Now queued up to my tree. Note, I had to hand-apply patches 13 and 16
due to some reason (for 13, I have no idea, for 16 it was due to a
previous patch applied to my tree that I cc:ed you on.)

Verifying I got it all correct would be great :)

thanks,

greg k-h

2020-12-09 20:49:21

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

On Wed, Dec 9, 2020 at 10:15 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Fri, Nov 20, 2020 at 06:02:15PM -0800, Saravana Kannan wrote:
> > The current implementation of fw_devlink is very inefficient because it
> > tries to get away without creating fwnode links in the name of saving
> > memory usage. Past attempts to optimize runtime at the cost of memory
> > usage were blocked with request for data showing that the optimization
> > made significant improvement for real world scenarios.
> >
> > We have those scenarios now. There have been several reports of boot
> > time increase in the order of seconds in this thread [1]. Several OEMs
> > and SoC manufacturers have also privately reported significant
> > (350-400ms) increase in boot time due to all the parsing done by
> > fw_devlink.
> >
> > So this patch series refactors fw_devlink to be more efficient. The key
> > difference now is the addition of support for fwnode links -- just a few
> > simple APIs. This also allows most of the code to be moved out of
> > firmware specific (DT mostly) code into driver core.
> >
> > This brings the following benefits:
> > - Instead of parsing the device tree multiple times (complexity was
> > close to O(N^3) where N in the number of properties) during bootup,
> > fw_devlink parses each fwnode node/property only once and creates
> > fwnode links. The rest of the fw_devlink code then just looks at these
> > fwnode links to do rest of the work.
> >
> > - Makes it much easier to debug probe issue due to fw_devlink in the
> > future. fw_devlink=on blocks the probing of devices if they depend on
> > a device that hasn't been added yet. With this refactor, it'll be very
> > easy to tell what that device is because we now have a reference to
> > the fwnode of the device.
> >
> > - Much easier to add fw_devlink support to ACPI and other firmware
> > types. A refactor to move the common bits from DT specific code to
> > driver core was in my TODO list as a prerequisite to adding ACPI
> > support to fw_devlink. This series gets that done.
> >
> > Laurent and Grygorii tested the v1 series and they saw boot time
> > improvment of about 12 seconds and 3 seconds, respectively.
>
> Now queued up to my tree. Note, I had to hand-apply patches 13 and 16
> due to some reason (for 13, I have no idea, for 16 it was due to a
> previous patch applied to my tree that I cc:ed you on.)
>
> Verifying I got it all correct would be great :)

A quick diff of drivers/base/core.c between driver-core-testing and my
local tree doesn't show any major diff (only some unrelated comment
fixes). So, it looks fine.

The patch 13 conflict is probably due to having to rebase the v2
series on top of this:
https://lore.kernel.org/lkml/[email protected]/

And looks like Patch 16 was handled fine.

Thanks for applying the series.

-Saravana

2020-12-10 09:27:50

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v2 00/17] Refactor fw_devlink to significantly improve boot time

On Wed, Dec 09, 2020 at 12:24:32PM -0800, Saravana Kannan wrote:
> On Wed, Dec 9, 2020 at 10:15 AM Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Fri, Nov 20, 2020 at 06:02:15PM -0800, Saravana Kannan wrote:
> > > The current implementation of fw_devlink is very inefficient because it
> > > tries to get away without creating fwnode links in the name of saving
> > > memory usage. Past attempts to optimize runtime at the cost of memory
> > > usage were blocked with request for data showing that the optimization
> > > made significant improvement for real world scenarios.
> > >
> > > We have those scenarios now. There have been several reports of boot
> > > time increase in the order of seconds in this thread [1]. Several OEMs
> > > and SoC manufacturers have also privately reported significant
> > > (350-400ms) increase in boot time due to all the parsing done by
> > > fw_devlink.
> > >
> > > So this patch series refactors fw_devlink to be more efficient. The key
> > > difference now is the addition of support for fwnode links -- just a few
> > > simple APIs. This also allows most of the code to be moved out of
> > > firmware specific (DT mostly) code into driver core.
> > >
> > > This brings the following benefits:
> > > - Instead of parsing the device tree multiple times (complexity was
> > > close to O(N^3) where N in the number of properties) during bootup,
> > > fw_devlink parses each fwnode node/property only once and creates
> > > fwnode links. The rest of the fw_devlink code then just looks at these
> > > fwnode links to do rest of the work.
> > >
> > > - Makes it much easier to debug probe issue due to fw_devlink in the
> > > future. fw_devlink=on blocks the probing of devices if they depend on
> > > a device that hasn't been added yet. With this refactor, it'll be very
> > > easy to tell what that device is because we now have a reference to
> > > the fwnode of the device.
> > >
> > > - Much easier to add fw_devlink support to ACPI and other firmware
> > > types. A refactor to move the common bits from DT specific code to
> > > driver core was in my TODO list as a prerequisite to adding ACPI
> > > support to fw_devlink. This series gets that done.
> > >
> > > Laurent and Grygorii tested the v1 series and they saw boot time
> > > improvment of about 12 seconds and 3 seconds, respectively.
> >
> > Now queued up to my tree. Note, I had to hand-apply patches 13 and 16
> > due to some reason (for 13, I have no idea, for 16 it was due to a
> > previous patch applied to my tree that I cc:ed you on.)
> >
> > Verifying I got it all correct would be great :)
>
> A quick diff of drivers/base/core.c between driver-core-testing and my
> local tree doesn't show any major diff (only some unrelated comment
> fixes). So, it looks fine.
>
> The patch 13 conflict is probably due to having to rebase the v2
> series on top of this:
> https://lore.kernel.org/lkml/[email protected]/
>
> And looks like Patch 16 was handled fine.

Great, thanks for verifying!

greg k-h

2022-06-27 12:36:10

by Abel Vesa

[permalink] [raw]
Subject: Re: [PATCH v2 13/17] driver core: Use device's fwnode to check if it is waiting for suppliers

On 20-11-20 18:02:28, Saravana Kannan wrote:
> To check if a device is still waiting for its supplier devices to be
> added, we used to check if the devices is in a global
> waiting_for_suppliers list. Since the global list will be deleted in
> subsequent patches, this patch stops using this check.
>
> Instead, this patch uses a more device specific check. It checks if the
> device's fwnode has any fwnode links that haven't been converted to
> device links yet.
>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/base/core.c | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 395dece1c83a..1873cecb0cc4 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -51,6 +51,7 @@ static DEFINE_MUTEX(wfs_lock);
> static LIST_HEAD(deferred_sync);
> static unsigned int defer_sync_state_count = 1;
> static DEFINE_MUTEX(fwnode_link_lock);
> +static bool fw_devlink_is_permissive(void);
>
> /**
> * fwnode_link_add - Create a link between two fwnode_handles.
> @@ -995,13 +996,13 @@ int device_links_check_suppliers(struct device *dev)
> * Device waiting for supplier to become available is not allowed to
> * probe.
> */
> - mutex_lock(&wfs_lock);
> - if (!list_empty(&dev->links.needs_suppliers) &&
> - dev->links.need_for_probe) {
> - mutex_unlock(&wfs_lock);
> + mutex_lock(&fwnode_link_lock);
> + if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
> + !fw_devlink_is_permissive()) {
> + mutex_unlock(&fwnode_link_lock);

Hi Saravana,

First of, sorry for going back to this.

There is a scenario where this check will not work and probably should
work. It goes like this:

A clock controller is not allowed to probe because it uses a clock from a child device of a
consumer, like so:

dispcc: clock-controller@af00000 {
clocks = <&dsi0_phy 0>;
};

mdss: mdss@ae00000 {
clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;

dsi0_phy: dsi-phy@ae94400 {
clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
};
};

This is a real scenario actually, but I stripped it down to the essentials.

So, the dsi0_phy will be "device_add'ed" (through of_platform_populate) by the mdss probe.
The mdss will probe defer waiting for the DISP_CC_MDSS_MDP_CLK, while
the dispcc will probe defer waiting for the dsi0_phy (supplier).

Basically, this 'supplier availability check' does not work when a supplier might
be populated by a consumer of the device that is currently trying to probe.


Abel


> return -EPROBE_DEFER;
> }
> - mutex_unlock(&wfs_lock);
> + mutex_unlock(&fwnode_link_lock);
>
> device_links_write_lock();
>
> @@ -1167,10 +1168,7 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
> bool val;
>
> device_lock(dev);
> - mutex_lock(&wfs_lock);
> - val = !list_empty(&dev->links.needs_suppliers)
> - && dev->links.need_for_probe;
> - mutex_unlock(&wfs_lock);
> + val = !list_empty(&dev->fwnode->suppliers);
> device_unlock(dev);
> return sysfs_emit(buf, "%u\n", val);
> }
> @@ -2202,7 +2200,7 @@ static int device_add_attrs(struct device *dev)
> goto err_remove_dev_groups;
> }
>
> - if (fw_devlink_flags && !fw_devlink_is_permissive()) {
> + if (fw_devlink_flags && !fw_devlink_is_permissive() && dev->fwnode) {
> error = device_create_file(dev, &dev_attr_waiting_for_supplier);
> if (error)
> goto err_remove_dev_online;
> --
> 2.29.2.454.gaff20da3a2-goog
>
>

2022-06-27 22:43:47

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 13/17] driver core: Use device's fwnode to check if it is waiting for suppliers

On Mon, Jun 27, 2022 at 4:42 AM Abel Vesa <[email protected]> wrote:
>
> On 20-11-20 18:02:28, Saravana Kannan wrote:
> > To check if a device is still waiting for its supplier devices to be
> > added, we used to check if the devices is in a global
> > waiting_for_suppliers list. Since the global list will be deleted in
> > subsequent patches, this patch stops using this check.
> >
> > Instead, this patch uses a more device specific check. It checks if the
> > device's fwnode has any fwnode links that haven't been converted to
> > device links yet.
> >
> > Signed-off-by: Saravana Kannan <[email protected]>
> > ---
> > drivers/base/core.c | 18 ++++++++----------
> > 1 file changed, 8 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 395dece1c83a..1873cecb0cc4 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -51,6 +51,7 @@ static DEFINE_MUTEX(wfs_lock);
> > static LIST_HEAD(deferred_sync);
> > static unsigned int defer_sync_state_count = 1;
> > static DEFINE_MUTEX(fwnode_link_lock);
> > +static bool fw_devlink_is_permissive(void);
> >
> > /**
> > * fwnode_link_add - Create a link between two fwnode_handles.
> > @@ -995,13 +996,13 @@ int device_links_check_suppliers(struct device *dev)
> > * Device waiting for supplier to become available is not allowed to
> > * probe.
> > */
> > - mutex_lock(&wfs_lock);
> > - if (!list_empty(&dev->links.needs_suppliers) &&
> > - dev->links.need_for_probe) {
> > - mutex_unlock(&wfs_lock);
> > + mutex_lock(&fwnode_link_lock);
> > + if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
> > + !fw_devlink_is_permissive()) {
> > + mutex_unlock(&fwnode_link_lock);
>
> Hi Saravana,
>
> First of, sorry for going back to this.

No worries at all. If there's an issue with fw_devlink, I want to have it fixed.

> There is a scenario where this check will not work and probably should
> work. It goes like this:
>
> A clock controller is not allowed to probe because it uses a clock from a child device of a
> consumer, like so:
>
> dispcc: clock-controller@af00000 {
> clocks = <&dsi0_phy 0>;
> };
>
> mdss: mdss@ae00000 {
> clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;
>
> dsi0_phy: dsi-phy@ae94400 {
> clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> };
> };
>
> This is a real scenario actually, but I stripped it down to the essentials.

I'm well aware of this scenario and explicitly wrote code to address this :)

See this comment in fw_devlink_create_devlink()

/*
* If we can't find the supplier device from its fwnode, it might be
* due to a cyclic dependency between fwnodes. Some of these cycles can
* be broken by applying logic. Check for these types of cycles and
* break them so that devices in the cycle probe properly.
*
* If the supplier's parent is dependent on the consumer, then the
* consumer and supplier have a cyclic dependency. Since fw_devlink
* can't tell which of the inferred dependencies are incorrect, don't
* enforce probe ordering between any of the devices in this cyclic
* dependency. Do this by relaxing all the fw_devlink device links in
* this cycle and by treating the fwnode link between the consumer and
* the supplier as an invalid dependency.
*/

Applying this comment to your example, dispcc is the "consumer",
dsi0_phy is the "supplier" and mdss is the "supplier's parent".

And because we can't guarantee the order of addition of these top
level devices is why I also have this piece of recursive call inside
__fw_devlink_link_to_suppliers():

/*
* If a device link was successfully created to a supplier, we
* now need to try and link the supplier to all its suppliers.
*
* This is needed to detect and delete false dependencies in
* fwnode links that haven't been converted to a device link
* yet. See comments in fw_devlink_create_devlink() for more
* details on the false dependency.
*
* Without deleting these false dependencies, some devices will
* never probe because they'll keep waiting for their false
* dependency fwnode links to be converted to device links.
*/
sup_dev = get_dev_from_fwnode(sup);
__fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
put_device(sup_dev);

So when mdss gets added, we'll link it to dispcc and then check if
dispcc has any suppliers it needs to link to. And that's when the
logic will catch the cycle and fix it.

Can you tell me why this wouldn't unblock the probing of dispcc? Are
you actually hitting this on a device? If so, can you please check why
this logic isn't sufficient to catch and undo the cycle?

Thanks,
Saravana

> So, the dsi0_phy will be "device_add'ed" (through of_platform_populate) by the mdss probe.
> The mdss will probe defer waiting for the DISP_CC_MDSS_MDP_CLK, while
> the dispcc will probe defer waiting for the dsi0_phy (supplier).
>
> Basically, this 'supplier availability check' does not work when a supplier might
> be populated by a consumer of the device that is currently trying to probe.
>
>
> Abel
>
>
> > return -EPROBE_DEFER;
> > }
> > - mutex_unlock(&wfs_lock);
> > + mutex_unlock(&fwnode_link_lock);
> >
> > device_links_write_lock();
> >
> > @@ -1167,10 +1168,7 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
> > bool val;
> >
> > device_lock(dev);
> > - mutex_lock(&wfs_lock);
> > - val = !list_empty(&dev->links.needs_suppliers)
> > - && dev->links.need_for_probe;
> > - mutex_unlock(&wfs_lock);
> > + val = !list_empty(&dev->fwnode->suppliers);
> > device_unlock(dev);
> > return sysfs_emit(buf, "%u\n", val);
> > }
> > @@ -2202,7 +2200,7 @@ static int device_add_attrs(struct device *dev)
> > goto err_remove_dev_groups;
> > }
> >
> > - if (fw_devlink_flags && !fw_devlink_is_permissive()) {
> > + if (fw_devlink_flags && !fw_devlink_is_permissive() && dev->fwnode) {
> > error = device_create_file(dev, &dev_attr_waiting_for_supplier);
> > if (error)
> > goto err_remove_dev_online;
> > --
> > 2.29.2.454.gaff20da3a2-goog
> >
> >

2022-06-28 15:50:49

by Abel Vesa

[permalink] [raw]
Subject: Re: [PATCH v2 13/17] driver core: Use device's fwnode to check if it is waiting for suppliers

On 22-06-27 15:30:25, Saravana Kannan wrote:
> On Mon, Jun 27, 2022 at 4:42 AM Abel Vesa <[email protected]> wrote:
> >
> > On 20-11-20 18:02:28, Saravana Kannan wrote:
> > > To check if a device is still waiting for its supplier devices to be
> > > added, we used to check if the devices is in a global
> > > waiting_for_suppliers list. Since the global list will be deleted in
> > > subsequent patches, this patch stops using this check.
> > >
> > > Instead, this patch uses a more device specific check. It checks if the
> > > device's fwnode has any fwnode links that haven't been converted to
> > > device links yet.
> > >
> > > Signed-off-by: Saravana Kannan <[email protected]>
> > > ---
> > > drivers/base/core.c | 18 ++++++++----------
> > > 1 file changed, 8 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index 395dece1c83a..1873cecb0cc4 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -51,6 +51,7 @@ static DEFINE_MUTEX(wfs_lock);
> > > static LIST_HEAD(deferred_sync);
> > > static unsigned int defer_sync_state_count = 1;
> > > static DEFINE_MUTEX(fwnode_link_lock);
> > > +static bool fw_devlink_is_permissive(void);
> > >
> > > /**
> > > * fwnode_link_add - Create a link between two fwnode_handles.
> > > @@ -995,13 +996,13 @@ int device_links_check_suppliers(struct device *dev)
> > > * Device waiting for supplier to become available is not allowed to
> > > * probe.
> > > */
> > > - mutex_lock(&wfs_lock);
> > > - if (!list_empty(&dev->links.needs_suppliers) &&
> > > - dev->links.need_for_probe) {
> > > - mutex_unlock(&wfs_lock);
> > > + mutex_lock(&fwnode_link_lock);
> > > + if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
> > > + !fw_devlink_is_permissive()) {
> > > + mutex_unlock(&fwnode_link_lock);
> >
> > Hi Saravana,
> >
> > First of, sorry for going back to this.
>
> No worries at all. If there's an issue with fw_devlink, I want to have it fixed.
>
> > There is a scenario where this check will not work and probably should
> > work. It goes like this:
> >
> > A clock controller is not allowed to probe because it uses a clock from a child device of a
> > consumer, like so:
> >
> > dispcc: clock-controller@af00000 {
> > clocks = <&dsi0_phy 0>;
> > };
> >
> > mdss: mdss@ae00000 {
> > clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;
> >
> > dsi0_phy: dsi-phy@ae94400 {
> > clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> > };
> > };
> >
> > This is a real scenario actually, but I stripped it down to the essentials.
>
> I'm well aware of this scenario and explicitly wrote code to address this :)
>

Actually, the problem seems to be when you have two dsi phys.
Like so:

dispcc: clock-controller@af00000 {
clocks = <&dsi0_phy 0>;
clocks = <&dsi1_phy 0>;
};

mdss: mdss@ae00000 {
clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;

dsi0_phy: dsi-phy@ae94400 {
clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
};

dsi1_phy: dsi-phy@ae64400 {
clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
};
};

And from what I've seen happening so far is that the device_is_dependent
check for the parent of the supplier (if it also a consumer) seems to return
false on second pass of the same link due to the DL_FLAG_SYNC_STATE_ONLY
being set this time around.

> See this comment in fw_devlink_create_devlink()
>
> /*
> * If we can't find the supplier device from its fwnode, it might be
> * due to a cyclic dependency between fwnodes. Some of these cycles can
> * be broken by applying logic. Check for these types of cycles and
> * break them so that devices in the cycle probe properly.
> *
> * If the supplier's parent is dependent on the consumer, then the
> * consumer and supplier have a cyclic dependency. Since fw_devlink
> * can't tell which of the inferred dependencies are incorrect, don't
> * enforce probe ordering between any of the devices in this cyclic
> * dependency. Do this by relaxing all the fw_devlink device links in
> * this cycle and by treating the fwnode link between the consumer and
> * the supplier as an invalid dependency.
> */
>

So when this thing you mentioned above is happening for the second dsi
phy (order doesn't matter), since the dsi phy itself cannot be found,
the device_is_dependent is run for the same link: dispcc -> mdss
(supplier -> consumer), but again, since it has the
DL_FLAG_SYNC_STATE_ONLY this time around, it will skip that specific
link.

> Applying this comment to your example, dispcc is the "consumer",
> dsi0_phy is the "supplier" and mdss is the "supplier's parent".
>
> And because we can't guarantee the order of addition of these top
> level devices is why I also have this piece of recursive call inside
> __fw_devlink_link_to_suppliers():
>
> /*
> * If a device link was successfully created to a supplier, we
> * now need to try and link the supplier to all its suppliers.
> *
> * This is needed to detect and delete false dependencies in
> * fwnode links that haven't been converted to a device link
> * yet. See comments in fw_devlink_create_devlink() for more
> * details on the false dependency.
> *
> * Without deleting these false dependencies, some devices will
> * never probe because they'll keep waiting for their false
> * dependency fwnode links to be converted to device links.
> */
> sup_dev = get_dev_from_fwnode(sup);
> __fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
> put_device(sup_dev);
>
> So when mdss gets added, we'll link it to dispcc and then check if
> dispcc has any suppliers it needs to link to. And that's when the
> logic will catch the cycle and fix it.
>
> Can you tell me why this wouldn't unblock the probing of dispcc? Are
> you actually hitting this on a device? If so, can you please check why
> this logic isn't sufficient to catch and undo the cycle?
>

This is happening on Qualcomm SDM845 with Linus's tree.

> Thanks,
> Saravana
>
> > So, the dsi0_phy will be "device_add'ed" (through of_platform_populate) by the mdss probe.
> > The mdss will probe defer waiting for the DISP_CC_MDSS_MDP_CLK, while
> > the dispcc will probe defer waiting for the dsi0_phy (supplier).
> >
> > Basically, this 'supplier availability check' does not work when a supplier might
> > be populated by a consumer of the device that is currently trying to probe.
> >
> >
> > Abel
> >
> >
> > > return -EPROBE_DEFER;
> > > }
> > > - mutex_unlock(&wfs_lock);
> > > + mutex_unlock(&fwnode_link_lock);
> > >
> > > device_links_write_lock();
> > >
> > > @@ -1167,10 +1168,7 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
> > > bool val;
> > >
> > > device_lock(dev);
> > > - mutex_lock(&wfs_lock);
> > > - val = !list_empty(&dev->links.needs_suppliers)
> > > - && dev->links.need_for_probe;
> > > - mutex_unlock(&wfs_lock);
> > > + val = !list_empty(&dev->fwnode->suppliers);
> > > device_unlock(dev);
> > > return sysfs_emit(buf, "%u\n", val);
> > > }
> > > @@ -2202,7 +2200,7 @@ static int device_add_attrs(struct device *dev)
> > > goto err_remove_dev_groups;
> > > }
> > >
> > > - if (fw_devlink_flags && !fw_devlink_is_permissive()) {
> > > + if (fw_devlink_flags && !fw_devlink_is_permissive() && dev->fwnode) {
> > > error = device_create_file(dev, &dev_attr_waiting_for_supplier);
> > > if (error)
> > > goto err_remove_dev_online;
> > > --
> > > 2.29.2.454.gaff20da3a2-goog
> > >
> > >
>

2022-06-28 16:05:18

by Abel Vesa

[permalink] [raw]
Subject: Re: [PATCH v2 13/17] driver core: Use device's fwnode to check if it is waiting for suppliers

On 22-06-28 18:24:29, Abel Vesa wrote:
> On 22-06-27 15:30:25, Saravana Kannan wrote:
> > On Mon, Jun 27, 2022 at 4:42 AM Abel Vesa <[email protected]> wrote:
> > >

Oups, forget this reply since it not to the right message-id.

Will do it properly right now.


> > > On 20-11-20 18:02:28, Saravana Kannan wrote:
> > > > To check if a device is still waiting for its supplier devices to be
> > > > added, we used to check if the devices is in a global
> > > > waiting_for_suppliers list. Since the global list will be deleted in
> > > > subsequent patches, this patch stops using this check.
> > > >
> > > > Instead, this patch uses a more device specific check. It checks if the
> > > > device's fwnode has any fwnode links that haven't been converted to
> > > > device links yet.
> > > >
> > > > Signed-off-by: Saravana Kannan <[email protected]>
> > > > ---
> > > > drivers/base/core.c | 18 ++++++++----------
> > > > 1 file changed, 8 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > > index 395dece1c83a..1873cecb0cc4 100644
> > > > --- a/drivers/base/core.c
> > > > +++ b/drivers/base/core.c
> > > > @@ -51,6 +51,7 @@ static DEFINE_MUTEX(wfs_lock);
> > > > static LIST_HEAD(deferred_sync);
> > > > static unsigned int defer_sync_state_count = 1;
> > > > static DEFINE_MUTEX(fwnode_link_lock);
> > > > +static bool fw_devlink_is_permissive(void);
> > > >
> > > > /**
> > > > * fwnode_link_add - Create a link between two fwnode_handles.
> > > > @@ -995,13 +996,13 @@ int device_links_check_suppliers(struct device *dev)
> > > > * Device waiting for supplier to become available is not allowed to
> > > > * probe.
> > > > */
> > > > - mutex_lock(&wfs_lock);
> > > > - if (!list_empty(&dev->links.needs_suppliers) &&
> > > > - dev->links.need_for_probe) {
> > > > - mutex_unlock(&wfs_lock);
> > > > + mutex_lock(&fwnode_link_lock);
> > > > + if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
> > > > + !fw_devlink_is_permissive()) {
> > > > + mutex_unlock(&fwnode_link_lock);
> > >
> > > Hi Saravana,
> > >
> > > First of, sorry for going back to this.
> >
> > No worries at all. If there's an issue with fw_devlink, I want to have it fixed.
> >
> > > There is a scenario where this check will not work and probably should
> > > work. It goes like this:
> > >
> > > A clock controller is not allowed to probe because it uses a clock from a child device of a
> > > consumer, like so:
> > >
> > > dispcc: clock-controller@af00000 {
> > > clocks = <&dsi0_phy 0>;
> > > };
> > >
> > > mdss: mdss@ae00000 {
> > > clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;
> > >
> > > dsi0_phy: dsi-phy@ae94400 {
> > > clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> > > };
> > > };
> > >
> > > This is a real scenario actually, but I stripped it down to the essentials.
> >
> > I'm well aware of this scenario and explicitly wrote code to address this :)
> >
>
> Actually, the problem seems to be when you have two dsi phys.
> Like so:
>
> dispcc: clock-controller@af00000 {
> clocks = <&dsi0_phy 0>;
> clocks = <&dsi1_phy 0>;
> };
>
> mdss: mdss@ae00000 {
> clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;
>
> dsi0_phy: dsi-phy@ae94400 {
> clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> };
>
> dsi1_phy: dsi-phy@ae64400 {
> clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> };
> };
>
> And from what I've seen happening so far is that the device_is_dependent
> check for the parent of the supplier (if it also a consumer) seems to return
> false on second pass of the same link due to the DL_FLAG_SYNC_STATE_ONLY
> being set this time around.
>
> > See this comment in fw_devlink_create_devlink()
> >
> > /*
> > * If we can't find the supplier device from its fwnode, it might be
> > * due to a cyclic dependency between fwnodes. Some of these cycles can
> > * be broken by applying logic. Check for these types of cycles and
> > * break them so that devices in the cycle probe properly.
> > *
> > * If the supplier's parent is dependent on the consumer, then the
> > * consumer and supplier have a cyclic dependency. Since fw_devlink
> > * can't tell which of the inferred dependencies are incorrect, don't
> > * enforce probe ordering between any of the devices in this cyclic
> > * dependency. Do this by relaxing all the fw_devlink device links in
> > * this cycle and by treating the fwnode link between the consumer and
> > * the supplier as an invalid dependency.
> > */
> >
>
> So when this thing you mentioned above is happening for the second dsi
> phy (order doesn't matter), since the dsi phy itself cannot be found,
> the device_is_dependent is run for the same link: dispcc -> mdss
> (supplier -> consumer), but again, since it has the
> DL_FLAG_SYNC_STATE_ONLY this time around, it will skip that specific
> link.
>
> > Applying this comment to your example, dispcc is the "consumer",
> > dsi0_phy is the "supplier" and mdss is the "supplier's parent".
> >
> > And because we can't guarantee the order of addition of these top
> > level devices is why I also have this piece of recursive call inside
> > __fw_devlink_link_to_suppliers():
> >
> > /*
> > * If a device link was successfully created to a supplier, we
> > * now need to try and link the supplier to all its suppliers.
> > *
> > * This is needed to detect and delete false dependencies in
> > * fwnode links that haven't been converted to a device link
> > * yet. See comments in fw_devlink_create_devlink() for more
> > * details on the false dependency.
> > *
> > * Without deleting these false dependencies, some devices will
> > * never probe because they'll keep waiting for their false
> > * dependency fwnode links to be converted to device links.
> > */
> > sup_dev = get_dev_from_fwnode(sup);
> > __fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
> > put_device(sup_dev);
> >
> > So when mdss gets added, we'll link it to dispcc and then check if
> > dispcc has any suppliers it needs to link to. And that's when the
> > logic will catch the cycle and fix it.
> >
> > Can you tell me why this wouldn't unblock the probing of dispcc? Are
> > you actually hitting this on a device? If so, can you please check why
> > this logic isn't sufficient to catch and undo the cycle?
> >
>
> This is happening on Qualcomm SDM845 with Linus's tree.
>
> > Thanks,
> > Saravana
> >
> > > So, the dsi0_phy will be "device_add'ed" (through of_platform_populate) by the mdss probe.
> > > The mdss will probe defer waiting for the DISP_CC_MDSS_MDP_CLK, while
> > > the dispcc will probe defer waiting for the dsi0_phy (supplier).
> > >
> > > Basically, this 'supplier availability check' does not work when a supplier might
> > > be populated by a consumer of the device that is currently trying to probe.
> > >
> > >
> > > Abel
> > >
> > >
> > > > return -EPROBE_DEFER;
> > > > }
> > > > - mutex_unlock(&wfs_lock);
> > > > + mutex_unlock(&fwnode_link_lock);
> > > >
> > > > device_links_write_lock();
> > > >
> > > > @@ -1167,10 +1168,7 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
> > > > bool val;
> > > >
> > > > device_lock(dev);
> > > > - mutex_lock(&wfs_lock);
> > > > - val = !list_empty(&dev->links.needs_suppliers)
> > > > - && dev->links.need_for_probe;
> > > > - mutex_unlock(&wfs_lock);
> > > > + val = !list_empty(&dev->fwnode->suppliers);
> > > > device_unlock(dev);
> > > > return sysfs_emit(buf, "%u\n", val);
> > > > }
> > > > @@ -2202,7 +2200,7 @@ static int device_add_attrs(struct device *dev)
> > > > goto err_remove_dev_groups;
> > > > }
> > > >
> > > > - if (fw_devlink_flags && !fw_devlink_is_permissive()) {
> > > > + if (fw_devlink_flags && !fw_devlink_is_permissive() && dev->fwnode) {
> > > > error = device_create_file(dev, &dev_attr_waiting_for_supplier);
> > > > if (error)
> > > > goto err_remove_dev_online;
> > > > --
> > > > 2.29.2.454.gaff20da3a2-goog
> > > >
> > > >
> >

2022-06-28 16:30:33

by Abel Vesa

[permalink] [raw]
Subject: Re: [PATCH v2 13/17] driver core: Use device's fwnode to check if it is waiting for suppliers

On 22-06-27 15:30:25, Saravana Kannan wrote:
> On Mon, Jun 27, 2022 at 4:42 AM Abel Vesa <[email protected]> wrote:
> >
> > On 20-11-20 18:02:28, Saravana Kannan wrote:
> > > To check if a device is still waiting for its supplier devices to be
> > > added, we used to check if the devices is in a global
> > > waiting_for_suppliers list. Since the global list will be deleted in
> > > subsequent patches, this patch stops using this check.
> > >
> > > Instead, this patch uses a more device specific check. It checks if the
> > > device's fwnode has any fwnode links that haven't been converted to
> > > device links yet.
> > >
> > > Signed-off-by: Saravana Kannan <[email protected]>
> > > ---
> > > drivers/base/core.c | 18 ++++++++----------
> > > 1 file changed, 8 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index 395dece1c83a..1873cecb0cc4 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -51,6 +51,7 @@ static DEFINE_MUTEX(wfs_lock);
> > > static LIST_HEAD(deferred_sync);
> > > static unsigned int defer_sync_state_count = 1;
> > > static DEFINE_MUTEX(fwnode_link_lock);
> > > +static bool fw_devlink_is_permissive(void);
> > >
> > > /**
> > > * fwnode_link_add - Create a link between two fwnode_handles.
> > > @@ -995,13 +996,13 @@ int device_links_check_suppliers(struct device *dev)
> > > * Device waiting for supplier to become available is not allowed to
> > > * probe.
> > > */
> > > - mutex_lock(&wfs_lock);
> > > - if (!list_empty(&dev->links.needs_suppliers) &&
> > > - dev->links.need_for_probe) {
> > > - mutex_unlock(&wfs_lock);
> > > + mutex_lock(&fwnode_link_lock);
> > > + if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
> > > + !fw_devlink_is_permissive()) {
> > > + mutex_unlock(&fwnode_link_lock);
> >
> > Hi Saravana,
> >
> > First of, sorry for going back to this.
>
> No worries at all. If there's an issue with fw_devlink, I want to have it fixed.
>
> > There is a scenario where this check will not work and probably should
> > work. It goes like this:
> >
> > A clock controller is not allowed to probe because it uses a clock from a child device of a
> > consumer, like so:
> >
> > dispcc: clock-controller@af00000 {
> > clocks = <&dsi0_phy 0>;
> > };
> >
> > mdss: mdss@ae00000 {
> > clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;
> >
> > dsi0_phy: dsi-phy@ae94400 {
> > clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> > };
> > };
> >
> > This is a real scenario actually, but I stripped it down to the essentials.
>
> I'm well aware of this scenario and explicitly wrote code to address this :)
>

Actually, the problem seems to be when you have two dsi phys.
Like so:

dispcc: clock-controller@af00000 {
clocks = <&dsi0_phy 0>;
clocks = <&dsi1_phy 0>;
};

mdss: mdss@ae00000 {
clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;

dsi0_phy: dsi-phy@ae94400 {
clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
};

dsi1_phy: dsi-phy@ae64400 {
clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
};
};

And from what I've seen happening so far is that the device_is_dependent
check for the parent of the supplier (if it also a consumer) seems to return
false on second pass of the same link due to the DL_FLAG_SYNC_STATE_ONLY
being set this time around.

> See this comment in fw_devlink_create_devlink()
>
> /*
> * If we can't find the supplier device from its fwnode, it might be
> * due to a cyclic dependency between fwnodes. Some of these cycles can
> * be broken by applying logic. Check for these types of cycles and
> * break them so that devices in the cycle probe properly.
> *
> * If the supplier's parent is dependent on the consumer, then the
> * consumer and supplier have a cyclic dependency. Since fw_devlink
> * can't tell which of the inferred dependencies are incorrect, don't
> * enforce probe ordering between any of the devices in this cyclic
> * dependency. Do this by relaxing all the fw_devlink device links in
> * this cycle and by treating the fwnode link between the consumer and
> * the supplier as an invalid dependency.
> */
>

So when this thing you mentioned above is happening for the second dsi
phy (order doesn't matter), since the dsi phy itself cannot be found,
the device_is_dependent is run for the same link: dispcc -> mdss
(supplier -> consumer), but again, since it has the
DL_FLAG_SYNC_STATE_ONLY this time around, it will skip that specific
link.

> Applying this comment to your example, dispcc is the "consumer",
> dsi0_phy is the "supplier" and mdss is the "supplier's parent".
>
> And because we can't guarantee the order of addition of these top
> level devices is why I also have this piece of recursive call inside
> __fw_devlink_link_to_suppliers():
>
> /*
> * If a device link was successfully created to a supplier, we
> * now need to try and link the supplier to all its suppliers.
> *
> * This is needed to detect and delete false dependencies in
> * fwnode links that haven't been converted to a device link
> * yet. See comments in fw_devlink_create_devlink() for more
> * details on the false dependency.
> *
> * Without deleting these false dependencies, some devices will
> * never probe because they'll keep waiting for their false
> * dependency fwnode links to be converted to device links.
> */
> sup_dev = get_dev_from_fwnode(sup);
> __fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
> put_device(sup_dev);
>
> So when mdss gets added, we'll link it to dispcc and then check if
> dispcc has any suppliers it needs to link to. And that's when the
> logic will catch the cycle and fix it.
>
> Can you tell me why this wouldn't unblock the probing of dispcc? Are
> you actually hitting this on a device? If so, can you please check why
> this logic isn't sufficient to catch and undo the cycle?
>

This is happening on Qualcomm SDM845 with Linus's tree.

> Thanks,
> Saravana
>
> > So, the dsi0_phy will be "device_add'ed" (through of_platform_populate) by the mdss probe.
> > The mdss will probe defer waiting for the DISP_CC_MDSS_MDP_CLK, while
> > the dispcc will probe defer waiting for the dsi0_phy (supplier).
> >
> > Basically, this 'supplier availability check' does not work when a supplier might
> > be populated by a consumer of the device that is currently trying to probe.
> >
> >
> > Abel
> >
> >
> > > return -EPROBE_DEFER;
> > > }
> > > - mutex_unlock(&wfs_lock);
> > > + mutex_unlock(&fwnode_link_lock);
> > >
> > > device_links_write_lock();
> > >
> > > @@ -1167,10 +1168,7 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
> > > bool val;
> > >
> > > device_lock(dev);
> > > - mutex_lock(&wfs_lock);
> > > - val = !list_empty(&dev->links.needs_suppliers)
> > > - && dev->links.need_for_probe;
> > > - mutex_unlock(&wfs_lock);
> > > + val = !list_empty(&dev->fwnode->suppliers);
> > > device_unlock(dev);
> > > return sysfs_emit(buf, "%u\n", val);
> > > }
> > > @@ -2202,7 +2200,7 @@ static int device_add_attrs(struct device *dev)
> > > goto err_remove_dev_groups;
> > > }
> > >
> > > - if (fw_devlink_flags && !fw_devlink_is_permissive()) {
> > > + if (fw_devlink_flags && !fw_devlink_is_permissive() && dev->fwnode) {
> > > error = device_create_file(dev, &dev_attr_waiting_for_supplier);
> > > if (error)
> > > goto err_remove_dev_online;
> > > --
> > > 2.29.2.454.gaff20da3a2-goog
> > >
> > >
>

2022-06-28 18:22:02

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 13/17] driver core: Use device's fwnode to check if it is waiting for suppliers

On Tue, Jun 28, 2022 at 8:55 AM Abel Vesa <[email protected]> wrote:
>
> On 22-06-27 15:30:25, Saravana Kannan wrote:
> > On Mon, Jun 27, 2022 at 4:42 AM Abel Vesa <[email protected]> wrote:
> > >
> > > On 20-11-20 18:02:28, Saravana Kannan wrote:
> > > > To check if a device is still waiting for its supplier devices to be
> > > > added, we used to check if the devices is in a global
> > > > waiting_for_suppliers list. Since the global list will be deleted in
> > > > subsequent patches, this patch stops using this check.
> > > >
> > > > Instead, this patch uses a more device specific check. It checks if the
> > > > device's fwnode has any fwnode links that haven't been converted to
> > > > device links yet.
> > > >
> > > > Signed-off-by: Saravana Kannan <[email protected]>
> > > > ---
> > > > drivers/base/core.c | 18 ++++++++----------
> > > > 1 file changed, 8 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > > index 395dece1c83a..1873cecb0cc4 100644
> > > > --- a/drivers/base/core.c
> > > > +++ b/drivers/base/core.c
> > > > @@ -51,6 +51,7 @@ static DEFINE_MUTEX(wfs_lock);
> > > > static LIST_HEAD(deferred_sync);
> > > > static unsigned int defer_sync_state_count = 1;
> > > > static DEFINE_MUTEX(fwnode_link_lock);
> > > > +static bool fw_devlink_is_permissive(void);
> > > >
> > > > /**
> > > > * fwnode_link_add - Create a link between two fwnode_handles.
> > > > @@ -995,13 +996,13 @@ int device_links_check_suppliers(struct device *dev)
> > > > * Device waiting for supplier to become available is not allowed to
> > > > * probe.
> > > > */
> > > > - mutex_lock(&wfs_lock);
> > > > - if (!list_empty(&dev->links.needs_suppliers) &&
> > > > - dev->links.need_for_probe) {
> > > > - mutex_unlock(&wfs_lock);
> > > > + mutex_lock(&fwnode_link_lock);
> > > > + if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
> > > > + !fw_devlink_is_permissive()) {
> > > > + mutex_unlock(&fwnode_link_lock);
> > >
> > > Hi Saravana,
> > >
> > > First of, sorry for going back to this.
> >
> > No worries at all. If there's an issue with fw_devlink, I want to have it fixed.
> >
> > > There is a scenario where this check will not work and probably should
> > > work. It goes like this:
> > >
> > > A clock controller is not allowed to probe because it uses a clock from a child device of a
> > > consumer, like so:
> > >
> > > dispcc: clock-controller@af00000 {
> > > clocks = <&dsi0_phy 0>;
> > > };
> > >
> > > mdss: mdss@ae00000 {
> > > clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;
> > >
> > > dsi0_phy: dsi-phy@ae94400 {
> > > clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> > > };
> > > };
> > >
> > > This is a real scenario actually, but I stripped it down to the essentials.
> >
> > I'm well aware of this scenario and explicitly wrote code to address this :)
> >
>
> Actually, the problem seems to be when you have two dsi phys.
> Like so:
>
> dispcc: clock-controller@af00000 {
> clocks = <&dsi0_phy 0>;
> clocks = <&dsi1_phy 0>;
> };
>
> mdss: mdss@ae00000 {
> clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;
>
> dsi0_phy: dsi-phy@ae94400 {
> clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> };
>
> dsi1_phy: dsi-phy@ae64400 {
> clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> };
> };
>
> And from what I've seen happening so far is that the device_is_dependent
> check for the parent of the supplier (if it also a consumer) seems to return
> false on second pass of the same link due to the DL_FLAG_SYNC_STATE_ONLY
> being set this time around.
>
> > See this comment in fw_devlink_create_devlink()
> >
> > /*
> > * If we can't find the supplier device from its fwnode, it might be
> > * due to a cyclic dependency between fwnodes. Some of these cycles can
> > * be broken by applying logic. Check for these types of cycles and
> > * break them so that devices in the cycle probe properly.
> > *
> > * If the supplier's parent is dependent on the consumer, then the
> > * consumer and supplier have a cyclic dependency. Since fw_devlink
> > * can't tell which of the inferred dependencies are incorrect, don't
> > * enforce probe ordering between any of the devices in this cyclic
> > * dependency. Do this by relaxing all the fw_devlink device links in
> > * this cycle and by treating the fwnode link between the consumer and
> > * the supplier as an invalid dependency.
> > */
> >
>
> So when this thing you mentioned above is happening for the second dsi
> phy (order doesn't matter), since the dsi phy itself cannot be found,
> the device_is_dependent is run for the same link: dispcc -> mdss
> (supplier -> consumer), but again, since it has the
> DL_FLAG_SYNC_STATE_ONLY this time around, it will skip that specific
> link.

Ugh... I knew there was this gap, but didn't expect it to be a real world issue.

There are different ways of addressing this and they all fall
somewhere within a spectrum of:
"stop enforcing very specific edges of the dependency graph when you
find a cycles"
To
"just don't enforce any dependency for devices in a cycle and let the
drivers figure out when to -EPROBE_DEFER".

And each of those are of varying complexity. Ideally I'd prefer to
relax specific edges, but I need to balance it out with the code
complexity. Let me soak this for a few weeks to decide on what option
to take.

Thanks for the report.

-Saravana

>
> > Applying this comment to your example, dispcc is the "consumer",
> > dsi0_phy is the "supplier" and mdss is the "supplier's parent".
> >
> > And because we can't guarantee the order of addition of these top
> > level devices is why I also have this piece of recursive call inside
> > __fw_devlink_link_to_suppliers():
> >
> > /*
> > * If a device link was successfully created to a supplier, we
> > * now need to try and link the supplier to all its suppliers.
> > *
> > * This is needed to detect and delete false dependencies in
> > * fwnode links that haven't been converted to a device link
> > * yet. See comments in fw_devlink_create_devlink() for more
> > * details on the false dependency.
> > *
> > * Without deleting these false dependencies, some devices will
> > * never probe because they'll keep waiting for their false
> > * dependency fwnode links to be converted to device links.
> > */
> > sup_dev = get_dev_from_fwnode(sup);
> > __fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
> > put_device(sup_dev);
> >
> > So when mdss gets added, we'll link it to dispcc and then check if
> > dispcc has any suppliers it needs to link to. And that's when the
> > logic will catch the cycle and fix it.
> >
> > Can you tell me why this wouldn't unblock the probing of dispcc? Are
> > you actually hitting this on a device? If so, can you please check why
> > this logic isn't sufficient to catch and undo the cycle?
> >
>
> This is happening on Qualcomm SDM845 with Linus's tree.
>
> > Thanks,
> > Saravana
> >
> > > So, the dsi0_phy will be "device_add'ed" (through of_platform_populate) by the mdss probe.
> > > The mdss will probe defer waiting for the DISP_CC_MDSS_MDP_CLK, while
> > > the dispcc will probe defer waiting for the dsi0_phy (supplier).
> > >
> > > Basically, this 'supplier availability check' does not work when a supplier might
> > > be populated by a consumer of the device that is currently trying to probe.
> > >
> > >
> > > Abel
> > >
> > >
> > > > return -EPROBE_DEFER;
> > > > }
> > > > - mutex_unlock(&wfs_lock);
> > > > + mutex_unlock(&fwnode_link_lock);
> > > >
> > > > device_links_write_lock();
> > > >
> > > > @@ -1167,10 +1168,7 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
> > > > bool val;
> > > >
> > > > device_lock(dev);
> > > > - mutex_lock(&wfs_lock);
> > > > - val = !list_empty(&dev->links.needs_suppliers)
> > > > - && dev->links.need_for_probe;
> > > > - mutex_unlock(&wfs_lock);
> > > > + val = !list_empty(&dev->fwnode->suppliers);
> > > > device_unlock(dev);
> > > > return sysfs_emit(buf, "%u\n", val);
> > > > }
> > > > @@ -2202,7 +2200,7 @@ static int device_add_attrs(struct device *dev)
> > > > goto err_remove_dev_groups;
> > > > }
> > > >
> > > > - if (fw_devlink_flags && !fw_devlink_is_permissive()) {
> > > > + if (fw_devlink_flags && !fw_devlink_is_permissive() && dev->fwnode) {
> > > > error = device_create_file(dev, &dev_attr_waiting_for_supplier);
> > > > if (error)
> > > > goto err_remove_dev_online;
> > > > --
> > > > 2.29.2.454.gaff20da3a2-goog
> > > >
> > > >
> >

2023-01-05 15:03:43

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: [PATCH v2 13/17] driver core: Use device's fwnode to check if it is waiting for suppliers

Hi,

On 28/06/2022 21:09, Saravana Kannan wrote:
> On Tue, Jun 28, 2022 at 8:55 AM Abel Vesa <[email protected]> wrote:
>>
>> On 22-06-27 15:30:25, Saravana Kannan wrote:
>>> On Mon, Jun 27, 2022 at 4:42 AM Abel Vesa <[email protected]> wrote:
>>>>
>>>> On 20-11-20 18:02:28, Saravana Kannan wrote:
>>>>> To check if a device is still waiting for its supplier devices to be
>>>>> added, we used to check if the devices is in a global
>>>>> waiting_for_suppliers list. Since the global list will be deleted in
>>>>> subsequent patches, this patch stops using this check.
>>>>>
>>>>> Instead, this patch uses a more device specific check. It checks if the
>>>>> device's fwnode has any fwnode links that haven't been converted to
>>>>> device links yet.
>>>>>
>>>>> Signed-off-by: Saravana Kannan <[email protected]>
>>>>> ---
>>>>> drivers/base/core.c | 18 ++++++++----------
>>>>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>>> index 395dece1c83a..1873cecb0cc4 100644
>>>>> --- a/drivers/base/core.c
>>>>> +++ b/drivers/base/core.c
>>>>> @@ -51,6 +51,7 @@ static DEFINE_MUTEX(wfs_lock);
>>>>> static LIST_HEAD(deferred_sync);
>>>>> static unsigned int defer_sync_state_count = 1;
>>>>> static DEFINE_MUTEX(fwnode_link_lock);
>>>>> +static bool fw_devlink_is_permissive(void);
>>>>>
>>>>> /**
>>>>> * fwnode_link_add - Create a link between two fwnode_handles.
>>>>> @@ -995,13 +996,13 @@ int device_links_check_suppliers(struct device *dev)
>>>>> * Device waiting for supplier to become available is not allowed to
>>>>> * probe.
>>>>> */
>>>>> - mutex_lock(&wfs_lock);
>>>>> - if (!list_empty(&dev->links.needs_suppliers) &&
>>>>> - dev->links.need_for_probe) {
>>>>> - mutex_unlock(&wfs_lock);
>>>>> + mutex_lock(&fwnode_link_lock);
>>>>> + if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
>>>>> + !fw_devlink_is_permissive()) {
>>>>> + mutex_unlock(&fwnode_link_lock);
>>>>
>>>> Hi Saravana,
>>>>
>>>> First of, sorry for going back to this.
>>>
>>> No worries at all. If there's an issue with fw_devlink, I want to have it fixed.
>>>
>>>> There is a scenario where this check will not work and probably should
>>>> work. It goes like this:
>>>>
>>>> A clock controller is not allowed to probe because it uses a clock from a child device of a
>>>> consumer, like so:
>>>>
>>>> dispcc: clock-controller@af00000 {
>>>> clocks = <&dsi0_phy 0>;
>>>> };
>>>>
>>>> mdss: mdss@ae00000 {
>>>> clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;
>>>>
>>>> dsi0_phy: dsi-phy@ae94400 {
>>>> clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
>>>> };
>>>> };
>>>>
>>>> This is a real scenario actually, but I stripped it down to the essentials.
>>>
>>> I'm well aware of this scenario and explicitly wrote code to address this :)
>>>
>>
>> Actually, the problem seems to be when you have two dsi phys.
>> Like so:
>>
>> dispcc: clock-controller@af00000 {
>> clocks = <&dsi0_phy 0>;
>> clocks = <&dsi1_phy 0>;
>> };
>>
>> mdss: mdss@ae00000 {
>> clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;
>>
>> dsi0_phy: dsi-phy@ae94400 {
>> clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
>> };
>>
>> dsi1_phy: dsi-phy@ae64400 {
>> clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
>> };
>> };
>>
>> And from what I've seen happening so far is that the device_is_dependent
>> check for the parent of the supplier (if it also a consumer) seems to return
>> false on second pass of the same link due to the DL_FLAG_SYNC_STATE_ONLY
>> being set this time around.
>>
>>> See this comment in fw_devlink_create_devlink()
>>>
>>> /*
>>> * If we can't find the supplier device from its fwnode, it might be
>>> * due to a cyclic dependency between fwnodes. Some of these cycles can
>>> * be broken by applying logic. Check for these types of cycles and
>>> * break them so that devices in the cycle probe properly.
>>> *
>>> * If the supplier's parent is dependent on the consumer, then the
>>> * consumer and supplier have a cyclic dependency. Since fw_devlink
>>> * can't tell which of the inferred dependencies are incorrect, don't
>>> * enforce probe ordering between any of the devices in this cyclic
>>> * dependency. Do this by relaxing all the fw_devlink device links in
>>> * this cycle and by treating the fwnode link between the consumer and
>>> * the supplier as an invalid dependency.
>>> */
>>>
>>
>> So when this thing you mentioned above is happening for the second dsi
>> phy (order doesn't matter), since the dsi phy itself cannot be found,
>> the device_is_dependent is run for the same link: dispcc -> mdss
>> (supplier -> consumer), but again, since it has the
>> DL_FLAG_SYNC_STATE_ONLY this time around, it will skip that specific
>> link.
>
> Ugh... I knew there was this gap, but didn't expect it to be a real world issue.
>
> There are different ways of addressing this and they all fall
> somewhere within a spectrum of:
> "stop enforcing very specific edges of the dependency graph when you
> find a cycles"
> To
> "just don't enforce any dependency for devices in a cycle and let the
> drivers figure out when to -EPROBE_DEFER".
>
> And each of those are of varying complexity. Ideally I'd prefer to
> relax specific edges, but I need to balance it out with the code
> complexity. Let me soak this for a few weeks to decide on what option
> to take.

I wanted to check if there is any progress on this topic? It appears
that few weeks turned into few months already and the issue is still
present. If not, can we please re-consider applying [1] while Saravana
is working on a better fix?

[1]
https://lore.kernel.org/all/[email protected]/

>
> Thanks for the report.
>
> -Saravana
>
>>
>>> Applying this comment to your example, dispcc is the "consumer",
>>> dsi0_phy is the "supplier" and mdss is the "supplier's parent".
>>>
>>> And because we can't guarantee the order of addition of these top
>>> level devices is why I also have this piece of recursive call inside
>>> __fw_devlink_link_to_suppliers():
>>>
>>> /*
>>> * If a device link was successfully created to a supplier, we
>>> * now need to try and link the supplier to all its suppliers.
>>> *
>>> * This is needed to detect and delete false dependencies in
>>> * fwnode links that haven't been converted to a device link
>>> * yet. See comments in fw_devlink_create_devlink() for more
>>> * details on the false dependency.
>>> *
>>> * Without deleting these false dependencies, some devices will
>>> * never probe because they'll keep waiting for their false
>>> * dependency fwnode links to be converted to device links.
>>> */
>>> sup_dev = get_dev_from_fwnode(sup);
>>> __fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
>>> put_device(sup_dev);
>>>
>>> So when mdss gets added, we'll link it to dispcc and then check if
>>> dispcc has any suppliers it needs to link to. And that's when the
>>> logic will catch the cycle and fix it.
>>>
>>> Can you tell me why this wouldn't unblock the probing of dispcc? Are
>>> you actually hitting this on a device? If so, can you please check why
>>> this logic isn't sufficient to catch and undo the cycle?
>>>
>>
>> This is happening on Qualcomm SDM845 with Linus's tree.
>>
>>> Thanks,
>>> Saravana
>>>
>>>> So, the dsi0_phy will be "device_add'ed" (through of_platform_populate) by the mdss probe.
>>>> The mdss will probe defer waiting for the DISP_CC_MDSS_MDP_CLK, while
>>>> the dispcc will probe defer waiting for the dsi0_phy (supplier).
>>>>
>>>> Basically, this 'supplier availability check' does not work when a supplier might
>>>> be populated by a consumer of the device that is currently trying to probe.
>>>>
>>>>
>>>> Abel
>>>>
>>>>
>>>>> return -EPROBE_DEFER;
>>>>> }
>>>>> - mutex_unlock(&wfs_lock);
>>>>> + mutex_unlock(&fwnode_link_lock);
>>>>>
>>>>> device_links_write_lock();
>>>>>
>>>>> @@ -1167,10 +1168,7 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
>>>>> bool val;
>>>>>
>>>>> device_lock(dev);
>>>>> - mutex_lock(&wfs_lock);
>>>>> - val = !list_empty(&dev->links.needs_suppliers)
>>>>> - && dev->links.need_for_probe;
>>>>> - mutex_unlock(&wfs_lock);
>>>>> + val = !list_empty(&dev->fwnode->suppliers);
>>>>> device_unlock(dev);
>>>>> return sysfs_emit(buf, "%u\n", val);
>>>>> }
>>>>> @@ -2202,7 +2200,7 @@ static int device_add_attrs(struct device *dev)
>>>>> goto err_remove_dev_groups;
>>>>> }
>>>>>
>>>>> - if (fw_devlink_flags && !fw_devlink_is_permissive()) {
>>>>> + if (fw_devlink_flags && !fw_devlink_is_permissive() && dev->fwnode) {
>>>>> error = device_create_file(dev, &dev_attr_waiting_for_supplier);
>>>>> if (error)
>>>>> goto err_remove_dev_online;
>>>>> --
>>>>> 2.29.2.454.gaff20da3a2-goog
>>>>>
>>>>>
>>>

--
With best wishes
Dmitry