2023-02-07 01:42:19

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 00/12] fw_devlink improvements

Naresh, Tony, Abel, Geert, Dmitry, Maxim(s), Miquel, Luca, Doug, Martin,
Jean-Philippe,

Can I get your Tested-by's for this v3 series please?

Vladimir,

Ccing you because DSA's and fw_devlink have known/existing problems
(still in my TODOs to fix). But I want to make sure this series doesn't
cause additional problems for DSA.

All,

This patch series improves fw_devlink in the following ways:

1. It no longer cares about a fwnode having a "compatible" property. It
figures this out more dynamically. The only expectation is that
fwnodes that are converted to devices actually get probed by a driver
for the dependencies to be enforced correctly.

2. Finer grained dependency tracking. fw_devlink will now create device
links from the consumer to the actual resource's device (if it has one,
Eg: gpio_device) instead of the parent supplier device. This improves
things like async suspend/resume ordering, potentially remove the need
for frameworks to create device links, more parallelized async probing,
and better sync_state() tracking.

3. Handle hardware/software quirks where a child firmware node gets
populated as a device before its parent firmware node AND actually
supplies a non-optional resource to the parent firmware node's
device.

4. Way more robust at cycle handling (see patch for the insane cases).

5. Stops depending on OF_POPULATED to figure out some corner cases.

6. Simplifies the work that needs to be done by the firmware specific
code.

The v3 series has gone through my usual testing on my end and looks good
to me.

Thanks,
Saravana

[1] - https://lore.kernel.org/lkml/[email protected]/
[2] - https://lore.kernel.org/lkml/CAGETcx-JUV1nj8wBJrTPfyvM7=Mre5j_vkVmZojeiumUGG6QZQ@mail.gmail.com/

v1 -> v2:
- Fixed Patch 1 to handle a corner case discussed in [2].
- New patch 10 to handle "fsl,imx8mq-gpc" being initialized by 2 drivers.
- New patch 11 to add fw_devlink support for SCMI devices.

v2 -> v3:
- Addressed most of Andy's comments in v2
- Added Colin and Sudeep's Tested-by for the series (except the imx and
renesas patches)
- Added Sudeep's Acked-by for the scmi patch.
- Added Geert's Reviewed-by for the renesas patch.
- Fixed gpiolib crash reported by Naresh.
- Patch 6: Fix __fwnode_links_move_consumers() to preserve fwnode link flags.
- New Patch 12 to fix nvmem-cells issue reported by Maxim(s)/Miquel.
- Deleted some stale function doc in Patch 8

Cc: Abel Vesa <[email protected]>
Cc: Alexander Stein <[email protected]>
Cc: Tony Lindgren <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Doug Anderson <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Dmitry Baryshkov <[email protected]>
Cc: Maxim Kiselev <[email protected]>
Cc: Maxim Kochetkov <[email protected]>
Cc: Miquel Raynal <[email protected]>
Cc: Luca Weiss <[email protected]>
Cc: Colin Foster <[email protected]>
Cc: Martin Kepplinger <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Cc: Vladimir Oltean <[email protected]>

Saravana Kannan (12):
driver core: fw_devlink: Don't purge child fwnode's consumer links
driver core: fw_devlink: Improve check for fwnode with no
device/driver
soc: renesas: Move away from using OF_POPULATED for fw_devlink
gpiolib: Clear the gpio_device's fwnode initialized flag before adding
driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links
driver core: fw_devlink: Allow marking a fwnode link as being part of
a cycle
driver core: fw_devlink: Consolidate device link flag computation
driver core: fw_devlink: Make cycle detection more robust
of: property: Simplify of_link_to_phandle()
irqchip/irq-imx-gpcv2: Mark fwnode device as not initialized
firmware: arm_scmi: Set fwnode for the scmi_device
mtd: mtdpart: Don't create platform device that'll never probe

drivers/base/core.c | 449 +++++++++++++++++++++-----------
drivers/firmware/arm_scmi/bus.c | 3 +-
drivers/gpio/gpiolib.c | 7 +
drivers/irqchip/irq-imx-gpcv2.c | 1 +
drivers/mtd/mtdpart.c | 10 +
drivers/of/property.c | 84 +-----
drivers/soc/imx/gpcv2.c | 2 +-
drivers/soc/renesas/rcar-sysc.c | 2 +-
include/linux/device.h | 1 +
include/linux/fwnode.h | 12 +-
10 files changed, 344 insertions(+), 227 deletions(-)

--
2.39.1.519.gcb327c4b5f-goog



2023-02-07 01:42:25

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 01/12] driver core: fw_devlink: Don't purge child fwnode's consumer links

When a device X is bound successfully to a driver, if it has a child
firmware node Y that doesn't have a struct device created by then, we
delete fwnode links where the child firmware node Y is the supplier. We
did this to avoid blocking the consumers of the child firmware node Y
from deferring probe indefinitely.

While that a step in the right direction, it's better to make the
consumers of the child firmware node Y to be consumers of the device X
because device X is probably implementing whatever functionality is
represented by child firmware node Y. By doing this, we capture the
device dependencies more accurately and ensure better
probe/suspend/resume ordering.

Signed-off-by: Saravana Kannan <[email protected]>
Tested-by: Colin Foster <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
---
drivers/base/core.c | 97 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 79 insertions(+), 18 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index a3e14143ec0c..001e1914858d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -54,11 +54,12 @@ static LIST_HEAD(deferred_sync);
static unsigned int defer_sync_state_count = 1;
static DEFINE_MUTEX(fwnode_link_lock);
static bool fw_devlink_is_permissive(void);
+static void __fw_devlink_link_to_consumers(struct device *dev);
static bool fw_devlink_drv_reg_done;
static bool fw_devlink_best_effort;

/**
- * fwnode_link_add - Create a link between two fwnode_handles.
+ * __fwnode_link_add - Create a link between two fwnode_handles.
* @con: Consumer end of the link.
* @sup: Supplier end of the link.
*
@@ -74,22 +75,18 @@ static bool fw_devlink_best_effort;
* Attempts to create duplicate links between the same pair of fwnode handles
* are ignored and there is no reference counting.
*/
-int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup)
+static int __fwnode_link_add(struct fwnode_handle *con,
+ struct fwnode_handle *sup)
{
struct fwnode_link *link;
- int ret = 0;
-
- mutex_lock(&fwnode_link_lock);

list_for_each_entry(link, &sup->consumers, s_hook)
if (link->consumer == con)
- goto out;
+ return 0;

link = kzalloc(sizeof(*link), GFP_KERNEL);
- if (!link) {
- ret = -ENOMEM;
- goto out;
- }
+ if (!link)
+ return -ENOMEM;

link->supplier = sup;
INIT_LIST_HEAD(&link->s_hook);
@@ -100,9 +97,17 @@ int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup)
list_add(&link->c_hook, &con->suppliers);
pr_debug("%pfwP Linked as a fwnode consumer to %pfwP\n",
con, sup);
-out:
- mutex_unlock(&fwnode_link_lock);

+ return 0;
+}
+
+int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup)
+{
+ int ret;
+
+ mutex_lock(&fwnode_link_lock);
+ ret = __fwnode_link_add(con, sup);
+ mutex_unlock(&fwnode_link_lock);
return ret;
}

@@ -181,6 +186,51 @@ void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode)
}
EXPORT_SYMBOL_GPL(fw_devlink_purge_absent_suppliers);

+/**
+ * __fwnode_links_move_consumers - Move consumer from @from to @to fwnode_handle
+ * @from: move consumers away from this fwnode
+ * @to: move consumers to this fwnode
+ *
+ * Move all consumer links from @from fwnode to @to fwnode.
+ */
+static void __fwnode_links_move_consumers(struct fwnode_handle *from,
+ struct fwnode_handle *to)
+{
+ struct fwnode_link *link, *tmp;
+
+ list_for_each_entry_safe(link, tmp, &from->consumers, s_hook) {
+ __fwnode_link_add(link->consumer, to);
+ __fwnode_link_del(link);
+ }
+}
+
+/**
+ * __fw_devlink_pickup_dangling_consumers - Pick up dangling consumers
+ * @fwnode: fwnode from which to pick up dangling consumers
+ * @new_sup: fwnode of new supplier
+ *
+ * If the @fwnode has a corresponding struct device and the device supports
+ * probing (that is, added to a bus), then we want to let fw_devlink create
+ * MANAGED device links to this device, so leave @fwnode and its descendant's
+ * fwnode links alone.
+ *
+ * Otherwise, move its consumers to the new supplier @new_sup.
+ */
+static void __fw_devlink_pickup_dangling_consumers(struct fwnode_handle *fwnode,
+ struct fwnode_handle *new_sup)
+{
+ struct fwnode_handle *child;
+
+ if (fwnode->dev && fwnode->dev->bus)
+ return;
+
+ fwnode->flags |= FWNODE_FLAG_NOT_DEVICE;
+ __fwnode_links_move_consumers(fwnode, new_sup);
+
+ fwnode_for_each_available_child_node(fwnode, child)
+ __fw_devlink_pickup_dangling_consumers(child, new_sup);
+}
+
#ifdef CONFIG_SRCU
static DEFINE_MUTEX(device_links_lock);
DEFINE_STATIC_SRCU(device_links_srcu);
@@ -1267,16 +1317,23 @@ void device_links_driver_bound(struct device *dev)
* them. So, fw_devlink no longer needs to create device links to any
* of the device's suppliers.
*
- * Also, if a child firmware node of this bound device is not added as
- * a device by now, assume it is never going to be added and make sure
- * other devices don't defer probe indefinitely by waiting for such a
- * child device.
+ * Also, if a child firmware node of this bound device is not added as a
+ * device by now, assume it is never going to be added. Make this bound
+ * device the fallback supplier to the dangling consumers of the child
+ * firmware node because this bound device is probably implementing the
+ * child firmware node functionality and we don't want the dangling
+ * consumers to defer probe indefinitely waiting for a device for the
+ * child firmware node.
*/
if (dev->fwnode && dev->fwnode->dev == dev) {
struct fwnode_handle *child;
fwnode_links_purge_suppliers(dev->fwnode);
+ mutex_lock(&fwnode_link_lock);
fwnode_for_each_available_child_node(dev->fwnode, child)
- fw_devlink_purge_absent_suppliers(child);
+ __fw_devlink_pickup_dangling_consumers(child,
+ dev->fwnode);
+ __fw_devlink_link_to_consumers(dev);
+ mutex_unlock(&fwnode_link_lock);
}
device_remove_file(dev, &dev_attr_waiting_for_supplier);

@@ -1855,7 +1912,11 @@ static int fw_devlink_create_devlink(struct device *con,
fwnode_is_ancestor_of(sup_handle, con->fwnode))
return -EINVAL;

- sup_dev = get_dev_from_fwnode(sup_handle);
+ if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
+ sup_dev = fwnode_get_next_parent_dev(sup_handle);
+ else
+ sup_dev = get_dev_from_fwnode(sup_handle);
+
if (sup_dev) {
/*
* If it's one of those drivers that don't actually bind to
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:42:31

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 02/12] driver core: fw_devlink: Improve check for fwnode with no device/driver

fw_devlink shouldn't defer the probe of a device to wait on a supplier
that'll never have a struct device or will never be probed by a driver.
We currently check if a supplier falls into this category, but don't
check its ancestors. We need to check the ancestors too because if the
ancestor will never probe, then the supplier will never probe either.

Signed-off-by: Saravana Kannan <[email protected]>
Tested-by: Colin Foster <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
---
drivers/base/core.c | 40 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 001e1914858d..368bfd96b511 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1867,6 +1867,35 @@ static int fw_devlink_relax_cycle(struct device *con, void *sup)
return ret;
}

+static bool fwnode_init_without_drv(struct fwnode_handle *fwnode)
+{
+ struct device *dev;
+ bool ret;
+
+ if (!(fwnode->flags & FWNODE_FLAG_INITIALIZED))
+ return false;
+
+ dev = get_dev_from_fwnode(fwnode);
+ ret = !dev || dev->links.status == DL_DEV_NO_DRIVER;
+ put_device(dev);
+
+ return ret;
+}
+
+static bool fwnode_ancestor_init_without_drv(struct fwnode_handle *fwnode)
+{
+ struct fwnode_handle *parent;
+
+ fwnode_for_each_parent_node(fwnode, parent) {
+ if (fwnode_init_without_drv(parent)) {
+ fwnode_handle_put(parent);
+ return true;
+ }
+ }
+
+ return false;
+}
+
/**
* fw_devlink_create_devlink - Create a device link from a consumer to fwnode
* @con: consumer device for the device link
@@ -1948,9 +1977,16 @@ static int fw_devlink_create_devlink(struct device *con,
goto out;
}

- /* Supplier that's already initialized without a struct device. */
- if (sup_handle->flags & FWNODE_FLAG_INITIALIZED)
+ /*
+ * Supplier or supplier's ancestor already initialized without a struct
+ * device or being probed by a driver.
+ */
+ if (fwnode_init_without_drv(sup_handle) ||
+ fwnode_ancestor_init_without_drv(sup_handle)) {
+ dev_dbg(con, "Not linking %pfwP - Might never probe\n",
+ sup_handle);
return -EINVAL;
+ }

/*
* DL_FLAG_SYNC_STATE_ONLY doesn't block probing and supports
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:42:33

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 03/12] soc: renesas: Move away from using OF_POPULATED for fw_devlink

The OF_POPULATED flag was set to let fw_devlink know that the device
tree node will not have a struct device created for it. This information
is used by fw_devlink to avoid deferring the probe of consumers of this
device tree node.

Let's use fwnode_dev_initialized() instead because it achieves the same
effect without using OF specific flags. This allows more generic code to
be written in driver core.

Signed-off-by: Saravana Kannan <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
---
drivers/soc/renesas/rcar-sysc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/renesas/rcar-sysc.c b/drivers/soc/renesas/rcar-sysc.c
index b0a80de34c98..eed47696e825 100644
--- a/drivers/soc/renesas/rcar-sysc.c
+++ b/drivers/soc/renesas/rcar-sysc.c
@@ -437,7 +437,7 @@ static int __init rcar_sysc_pd_init(void)

error = of_genpd_add_provider_onecell(np, &domains->onecell_data);
if (!error)
- of_node_set_flag(np, OF_POPULATED);
+ fwnode_dev_initialized(of_fwnode_handle(np), true);

out_put:
of_node_put(np);
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:42:49

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 04/12] gpiolib: Clear the gpio_device's fwnode initialized flag before adding

Registering an irqdomain sets the flag for the fwnode. But having the
flag set when a device is added is interpreted by fw_devlink to mean the
device has already been initialized and will never probe. This prevents
fw_devlink from creating device links with the gpio_device as a
supplier. So, clear the flag before adding the device.

Signed-off-by: Saravana Kannan <[email protected]>
Acked-by: Bartosz Golaszewski <[email protected]>
Tested-by: Colin Foster <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
---
drivers/gpio/gpiolib.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 939c776b9488..bdb9493857eb 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -578,6 +578,13 @@ static int gpiochip_setup_dev(struct gpio_device *gdev)
{
int ret;

+ /*
+ * If fwnode doesn't belong to another device, it's safe to clear its
+ * initialized flag.
+ */
+ if (gdev->dev.fwnode && !gdev->dev.fwnode->dev)
+ fwnode_dev_initialized(gdev->dev.fwnode, false);
+
ret = gcdev_register(gdev, gpio_devt);
if (ret)
return ret;
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:43:04

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 05/12] driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links

fw_devlink uses DL_FLAG_SYNC_STATE_ONLY device link flag for two
purposes:

1. To allow a parent device to proxy its child device's dependency on a
supplier so that the supplier doesn't get its sync_state() callback
before the child device/consumer can be added and probed. In this
usage scenario, we need to ignore cycles for ensure correctness of
sync_state() callbacks.

2. When there are dependency cycles in firmware, we don't know which of
those dependencies are valid. So, we have to ignore them all wrt
probe ordering while still making sure the sync_state() callbacks
come correctly.

However, when detecting dependency cycles, there can be multiple
dependency cycles between two devices that we need to detect. For
example:

A -> B -> A and A -> C -> B -> A.

To detect multiple cycles correct, we need to be able to differentiate
DL_FLAG_SYNC_STATE_ONLY device links used for (1) vs (2) above.

To allow this differentiation, add a DL_FLAG_CYCLE that can be use to
mark use case (2). We can then use the DL_FLAG_CYCLE to decide which
DL_FLAG_SYNC_STATE_ONLY device links to follow when looking for
dependency cycles.

Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies")
Signed-off-by: Saravana Kannan <[email protected]>
Tested-by: Colin Foster <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
---
drivers/base/core.c | 28 ++++++++++++++++++----------
include/linux/device.h | 1 +
2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 368bfd96b511..071c454844d6 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -322,6 +322,12 @@ static bool device_is_ancestor(struct device *dev, struct device *target)
return false;
}

+static inline bool device_link_flag_is_sync_state_only(u32 flags)
+{
+ return (flags & ~(DL_FLAG_INFERRED | DL_FLAG_CYCLE)) ==
+ (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED);
+}
+
/**
* device_is_dependent - Check if one device depends on another one
* @dev: Device to check dependencies for.
@@ -348,8 +354,7 @@ int device_is_dependent(struct device *dev, void *target)
return ret;

list_for_each_entry(link, &dev->links.consumers, s_node) {
- if ((link->flags & ~DL_FLAG_INFERRED) ==
- (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED))
+ if (device_link_flag_is_sync_state_only(link->flags))
continue;

if (link->consumer == target)
@@ -422,8 +427,7 @@ static int device_reorder_to_tail(struct device *dev, void *not_used)

device_for_each_child(dev, NULL, device_reorder_to_tail);
list_for_each_entry(link, &dev->links.consumers, s_node) {
- if ((link->flags & ~DL_FLAG_INFERRED) ==
- (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED))
+ if (device_link_flag_is_sync_state_only(link->flags))
continue;
device_reorder_to_tail(link->consumer, NULL);
}
@@ -684,7 +688,8 @@ postcore_initcall(devlink_class_init);
DL_FLAG_AUTOREMOVE_SUPPLIER | \
DL_FLAG_AUTOPROBE_CONSUMER | \
DL_FLAG_SYNC_STATE_ONLY | \
- DL_FLAG_INFERRED)
+ DL_FLAG_INFERRED | \
+ DL_FLAG_CYCLE)

#define DL_ADD_VALID_FLAGS (DL_MANAGED_LINK_FLAGS | DL_FLAG_STATELESS | \
DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
@@ -753,8 +758,6 @@ struct device_link *device_link_add(struct device *consumer,
if (!consumer || !supplier || consumer == supplier ||
flags & ~DL_ADD_VALID_FLAGS ||
(flags & DL_FLAG_STATELESS && flags & DL_MANAGED_LINK_FLAGS) ||
- (flags & DL_FLAG_SYNC_STATE_ONLY &&
- (flags & ~DL_FLAG_INFERRED) != DL_FLAG_SYNC_STATE_ONLY) ||
(flags & DL_FLAG_AUTOPROBE_CONSUMER &&
flags & (DL_FLAG_AUTOREMOVE_CONSUMER |
DL_FLAG_AUTOREMOVE_SUPPLIER)))
@@ -770,6 +773,10 @@ struct device_link *device_link_add(struct device *consumer,
if (!(flags & DL_FLAG_STATELESS))
flags |= DL_FLAG_MANAGED;

+ if (flags & DL_FLAG_SYNC_STATE_ONLY &&
+ !device_link_flag_is_sync_state_only(flags))
+ return NULL;
+
device_links_write_lock();
device_pm_lock();

@@ -1729,7 +1736,7 @@ static void fw_devlink_relax_link(struct device_link *link)
if (!(link->flags & DL_FLAG_INFERRED))
return;

- if (link->flags == (DL_FLAG_MANAGED | FW_DEVLINK_FLAGS_PERMISSIVE))
+ if (device_link_flag_is_sync_state_only(link->flags))
return;

pm_runtime_drop_link(link);
@@ -1853,8 +1860,8 @@ static int fw_devlink_relax_cycle(struct device *con, void *sup)
return ret;

list_for_each_entry(link, &con->links.consumers, s_node) {
- if ((link->flags & ~DL_FLAG_INFERRED) ==
- (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED))
+ if (!(link->flags & DL_FLAG_CYCLE) &&
+ device_link_flag_is_sync_state_only(link->flags))
continue;

if (!fw_devlink_relax_cycle(link->consumer, sup))
@@ -1863,6 +1870,7 @@ static int fw_devlink_relax_cycle(struct device *con, void *sup)
ret = 1;

fw_devlink_relax_link(link);
+ link->flags |= DL_FLAG_CYCLE;
}
return ret;
}
diff --git a/include/linux/device.h b/include/linux/device.h
index 44e3acae7b36..f4d20655d2d7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -328,6 +328,7 @@ enum device_link_state {
#define DL_FLAG_MANAGED BIT(6)
#define DL_FLAG_SYNC_STATE_ONLY BIT(7)
#define DL_FLAG_INFERRED BIT(8)
+#define DL_FLAG_CYCLE BIT(9)

/**
* enum dl_dev_state - Device driver presence tracking information.
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:43:08

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 06/12] driver core: fw_devlink: Allow marking a fwnode link as being part of a cycle

To improve detection and handling of dependency cycles, we need to be
able to mark fwnode links as being part of cycles. fwnode links marked
as being part of a cycle should not block their consumers from probing.

Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies")
Signed-off-by: Saravana Kannan <[email protected]>
Tested-by: Colin Foster <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
---
drivers/base/core.c | 50 +++++++++++++++++++++++++++++++++---------
include/linux/fwnode.h | 11 +++++++++-
2 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 071c454844d6..4869b6fdfeaf 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -76,13 +76,15 @@ static bool fw_devlink_best_effort;
* are ignored and there is no reference counting.
*/
static int __fwnode_link_add(struct fwnode_handle *con,
- struct fwnode_handle *sup)
+ struct fwnode_handle *sup, u8 flags)
{
struct fwnode_link *link;

list_for_each_entry(link, &sup->consumers, s_hook)
- if (link->consumer == con)
+ if (link->consumer == con) {
+ link->flags |= flags;
return 0;
+ }

link = kzalloc(sizeof(*link), GFP_KERNEL);
if (!link)
@@ -92,6 +94,7 @@ static int __fwnode_link_add(struct fwnode_handle *con,
INIT_LIST_HEAD(&link->s_hook);
link->consumer = con;
INIT_LIST_HEAD(&link->c_hook);
+ link->flags = flags;

list_add(&link->s_hook, &sup->consumers);
list_add(&link->c_hook, &con->suppliers);
@@ -106,7 +109,7 @@ int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup)
int ret;

mutex_lock(&fwnode_link_lock);
- ret = __fwnode_link_add(con, sup);
+ ret = __fwnode_link_add(con, sup, 0);
mutex_unlock(&fwnode_link_lock);
return ret;
}
@@ -126,6 +129,19 @@ static void __fwnode_link_del(struct fwnode_link *link)
kfree(link);
}

+/**
+ * __fwnode_link_cycle - Mark a fwnode link as being part of a cycle.
+ * @link: the fwnode_link to be marked
+ *
+ * The fwnode_link_lock needs to be held when this function is called.
+ */
+static void __fwnode_link_cycle(struct fwnode_link *link)
+{
+ pr_debug("%pfwf: Relaxing link with %pfwf\n",
+ link->consumer, link->supplier);
+ link->flags |= FWLINK_FLAG_CYCLE;
+}
+
/**
* fwnode_links_purge_suppliers - Delete all supplier links of fwnode_handle.
* @fwnode: fwnode whose supplier links need to be deleted
@@ -199,7 +215,7 @@ static void __fwnode_links_move_consumers(struct fwnode_handle *from,
struct fwnode_link *link, *tmp;

list_for_each_entry_safe(link, tmp, &from->consumers, s_hook) {
- __fwnode_link_add(link->consumer, to);
+ __fwnode_link_add(link->consumer, to, link->flags);
__fwnode_link_del(link);
}
}
@@ -1041,6 +1057,21 @@ static bool dev_is_best_effort(struct device *dev)
(dev->fwnode && (dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT));
}

+static struct fwnode_handle *fwnode_links_check_suppliers(
+ struct fwnode_handle *fwnode)
+{
+ struct fwnode_link *link;
+
+ if (!fwnode || fw_devlink_is_permissive())
+ return NULL;
+
+ list_for_each_entry(link, &fwnode->suppliers, c_hook)
+ if (!(link->flags & FWLINK_FLAG_CYCLE))
+ return link->supplier;
+
+ return NULL;
+}
+
/**
* device_links_check_suppliers - Check presence of supplier drivers.
* @dev: Consumer device.
@@ -1068,11 +1099,8 @@ int device_links_check_suppliers(struct device *dev)
* probe.
*/
mutex_lock(&fwnode_link_lock);
- if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
- !fw_devlink_is_permissive()) {
- sup_fw = list_first_entry(&dev->fwnode->suppliers,
- struct fwnode_link,
- c_hook)->supplier;
+ sup_fw = fwnode_links_check_suppliers(dev->fwnode);
+ if (sup_fw) {
if (!dev_is_best_effort(dev)) {
fwnode_ret = -EPROBE_DEFER;
dev_err_probe(dev, -EPROBE_DEFER,
@@ -1261,7 +1289,9 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
bool val;

device_lock(dev);
- val = !list_empty(&dev->fwnode->suppliers);
+ mutex_lock(&fwnode_link_lock);
+ val = !!fwnode_links_check_suppliers(dev->fwnode);
+ mutex_unlock(&fwnode_link_lock);
device_unlock(dev);
return sysfs_emit(buf, "%u\n", val);
}
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 89b9bdfca925..fdf2ee0285b7 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -18,7 +18,7 @@ struct fwnode_operations;
struct device;

/*
- * fwnode link flags
+ * fwnode flags
*
* LINKS_ADDED: The fwnode has already be parsed to add fwnode links.
* NOT_DEVICE: The fwnode will never be populated as a struct device.
@@ -36,6 +36,7 @@ struct device;
#define FWNODE_FLAG_INITIALIZED BIT(2)
#define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD BIT(3)
#define FWNODE_FLAG_BEST_EFFORT BIT(4)
+#define FWNODE_FLAG_VISITED BIT(5)

struct fwnode_handle {
struct fwnode_handle *secondary;
@@ -46,11 +47,19 @@ struct fwnode_handle {
u8 flags;
};

+/*
+ * fwnode link flags
+ *
+ * CYCLE: The fwnode link is part of a cycle. Don't defer probe.
+ */
+#define FWLINK_FLAG_CYCLE BIT(0)
+
struct fwnode_link {
struct fwnode_handle *supplier;
struct list_head s_hook;
struct fwnode_handle *consumer;
struct list_head c_hook;
+ u8 flags;
};

/**
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:43:22

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 07/12] driver core: fw_devlink: Consolidate device link flag computation

Consolidate the code that computes the flags to be used when creating a
device link from a fwnode link.

Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies")
Signed-off-by: Saravana Kannan <[email protected]>
Tested-by: Colin Foster <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
---
drivers/base/core.c | 28 +++++++++++++++-------------
include/linux/fwnode.h | 1 -
2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4869b6fdfeaf..e4b60436a4a4 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1727,8 +1727,11 @@ static int __init fw_devlink_strict_setup(char *arg)
}
early_param("fw_devlink.strict", fw_devlink_strict_setup);

-u32 fw_devlink_get_flags(void)
+static inline u32 fw_devlink_get_flags(u8 fwlink_flags)
{
+ if (fwlink_flags & FWLINK_FLAG_CYCLE)
+ return FW_DEVLINK_FLAGS_PERMISSIVE | DL_FLAG_CYCLE;
+
return fw_devlink_flags;
}

@@ -1938,7 +1941,7 @@ static bool fwnode_ancestor_init_without_drv(struct fwnode_handle *fwnode)
* fw_devlink_create_devlink - Create a device link from a consumer to fwnode
* @con: consumer device for the device link
* @sup_handle: fwnode handle of supplier
- * @flags: devlink flags
+ * @link: fwnode link that's being converted to a device link
*
* This function will try to create a device link between the consumer device
* @con and the supplier device represented by @sup_handle.
@@ -1955,10 +1958,17 @@ static bool fwnode_ancestor_init_without_drv(struct fwnode_handle *fwnode)
* possible to do that in the future
*/
static int fw_devlink_create_devlink(struct device *con,
- struct fwnode_handle *sup_handle, u32 flags)
+ struct fwnode_handle *sup_handle,
+ struct fwnode_link *link)
{
struct device *sup_dev;
int ret = 0;
+ u32 flags;
+
+ if (con->fwnode == link->consumer)
+ flags = fw_devlink_get_flags(link->flags);
+ else
+ flags = FW_DEVLINK_FLAGS_PERMISSIVE;

/*
* In some cases, a device P might also be a supplier to its child node
@@ -2091,7 +2101,6 @@ static void __fw_devlink_link_to_consumers(struct device *dev)
struct fwnode_link *link, *tmp;

list_for_each_entry_safe(link, tmp, &fwnode->consumers, s_hook) {
- u32 dl_flags = fw_devlink_get_flags();
struct device *con_dev;
bool own_link = true;
int ret;
@@ -2121,14 +2130,13 @@ static void __fw_devlink_link_to_consumers(struct device *dev)
con_dev = NULL;
} else {
own_link = false;
- dl_flags = FW_DEVLINK_FLAGS_PERMISSIVE;
}
}

if (!con_dev)
continue;

- ret = fw_devlink_create_devlink(con_dev, fwnode, dl_flags);
+ ret = fw_devlink_create_devlink(con_dev, fwnode, link);
put_device(con_dev);
if (!own_link || ret == -EAGAIN)
continue;
@@ -2169,19 +2177,13 @@ static void __fw_devlink_link_to_suppliers(struct device *dev,
bool own_link = (dev->fwnode == fwnode);
struct fwnode_link *link, *tmp;
struct fwnode_handle *child = NULL;
- u32 dl_flags;
-
- if (own_link)
- dl_flags = fw_devlink_get_flags();
- else
- dl_flags = FW_DEVLINK_FLAGS_PERMISSIVE;

list_for_each_entry_safe(link, tmp, &fwnode->suppliers, c_hook) {
int ret;
struct device *sup_dev;
struct fwnode_handle *sup = link->supplier;

- ret = fw_devlink_create_devlink(dev, sup, dl_flags);
+ ret = fw_devlink_create_devlink(dev, sup, link);
if (!own_link || ret == -EAGAIN)
continue;

diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index fdf2ee0285b7..5700451b300f 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -207,7 +207,6 @@ static inline void fwnode_dev_initialized(struct fwnode_handle *fwnode,
fwnode->flags &= ~FWNODE_FLAG_INITIALIZED;
}

-extern u32 fw_devlink_get_flags(void);
extern bool fw_devlink_is_strict(void);
int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup);
void fwnode_links_purge(struct fwnode_handle *fwnode);
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:43:33

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 08/12] driver core: fw_devlink: Make cycle detection more robust

fw_devlink could only detect a single and simple cycle because it relied
mainly on device link cycle detection code that only checked for cycles
between devices. The expectation was that the firmware wouldn't have
complicated cycles and multiple cycles between devices. That expectation
has been proven to be wrong.

For example, fw_devlink could handle:

+-+ +-+
|A+------> |B+
+-+ +++
^ |
| |
+----------+

But it couldn't handle even something as "simple" as:

+---------------------+
| |
v |
+-+ +-+ +++
|A+------> |B+------> |C|
+-+ +++ +-+
^ |
| |
+----------+

But firmware has even more complicated cycles like:

+---------------------+
| |
v |
+-+ +---+ +++
+--+A+------>| B +-----> |C|<--+
| +-+ ++--+ +++ |
| ^ | ^ | |
| | | | | |
| +---------+ +---------+ |
| |
+------------------------------+

And this is without including parent child dependencies or nodes in the
cycle that are just firmware nodes that'll never have a struct device
created for them.

The proper way to treat these devices it to not force any probe ordering
between them, while still enforce dependencies between node in the
cycles (A, B and C) and their consumers.

So this patch goes all out and just deals with all types of cycles. It
does this by:

1. Following dependencies across device links, parent-child and fwnode
links.
2. When it find cycles, it mark the device links and fwnode links as
such instead of just deleting them or making the indistinguishable
from proxy SYNC_STATE_ONLY device links.

This way, when new nodes get added, we can immediately find and mark any
new cycles whether the new node is a device or firmware node.

Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies")
Signed-off-by: Saravana Kannan <[email protected]>
Tested-by: Colin Foster <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
---
drivers/base/core.c | 248 +++++++++++++++++++++++---------------------
1 file changed, 129 insertions(+), 119 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index e4b60436a4a4..aaf79ea9647c 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1866,47 +1866,6 @@ static void fw_devlink_unblock_consumers(struct device *dev)
device_links_write_unlock();
}

-/**
- * fw_devlink_relax_cycle - Convert cyclic links to SYNC_STATE_ONLY links
- * @con: Device to check dependencies for.
- * @sup: Device to check against.
- *
- * Check if @sup depends on @con or any device dependent on it (its child or
- * its consumer etc). When such a cyclic dependency is found, convert all
- * device links created solely by fw_devlink into SYNC_STATE_ONLY device links.
- * This is the equivalent of doing fw_devlink=permissive just between the
- * devices in the cycle. We need to do this because, at this point, fw_devlink
- * can't tell which of these dependencies is not a real dependency.
- *
- * Return 1 if a cycle is found. Otherwise, return 0.
- */
-static int fw_devlink_relax_cycle(struct device *con, void *sup)
-{
- struct device_link *link;
- int ret;
-
- if (con == sup)
- return 1;
-
- ret = device_for_each_child(con, sup, fw_devlink_relax_cycle);
- if (ret)
- return ret;
-
- list_for_each_entry(link, &con->links.consumers, s_node) {
- if (!(link->flags & DL_FLAG_CYCLE) &&
- device_link_flag_is_sync_state_only(link->flags))
- continue;
-
- if (!fw_devlink_relax_cycle(link->consumer, sup))
- continue;
-
- ret = 1;
-
- fw_devlink_relax_link(link);
- link->flags |= DL_FLAG_CYCLE;
- }
- return ret;
-}

static bool fwnode_init_without_drv(struct fwnode_handle *fwnode)
{
@@ -1937,6 +1896,111 @@ static bool fwnode_ancestor_init_without_drv(struct fwnode_handle *fwnode)
return false;
}

+/**
+ * __fw_devlink_relax_cycles - Relax and mark dependency cycles.
+ * @con: Potential consumer device.
+ * @sup_handle: Potential supplier's fwnode.
+ *
+ * Needs to be called with fwnode_lock and device link lock held.
+ *
+ * Check if @sup_handle or any of its ancestors or suppliers direct/indirectly
+ * depend on @con. This function can detect multiple cyles between @sup_handle
+ * and @con. When such dependency cycles are found, convert all device links
+ * created solely by fw_devlink into SYNC_STATE_ONLY device links. Also, mark
+ * all fwnode links in the cycle with FWLINK_FLAG_CYCLE so that when they are
+ * converted into a device link in the future, they are created as
+ * SYNC_STATE_ONLY device links. This is the equivalent of doing
+ * fw_devlink=permissive just between the devices in the cycle. We need to do
+ * this because, at this point, fw_devlink can't tell which of these
+ * dependencies is not a real dependency.
+ *
+ * Return true if one or more cycles were found. Otherwise, return false.
+ */
+static bool __fw_devlink_relax_cycles(struct device *con,
+ struct fwnode_handle *sup_handle)
+{
+ struct device *sup_dev = NULL, *par_dev = NULL;
+ struct fwnode_link *link;
+ struct device_link *dev_link;
+ bool ret = false;
+
+ if (!sup_handle)
+ return false;
+
+ /*
+ * We aren't trying to find all cycles. Just a cycle between con and
+ * sup_handle.
+ */
+ if (sup_handle->flags & FWNODE_FLAG_VISITED)
+ return false;
+
+ sup_handle->flags |= FWNODE_FLAG_VISITED;
+
+ sup_dev = get_dev_from_fwnode(sup_handle);
+
+ /* Termination condition. */
+ if (sup_dev == con) {
+ ret = true;
+ goto out;
+ }
+
+ /*
+ * If sup_dev is bound to a driver and @con hasn't started binding to a
+ * driver, sup_dev can't be a consumer of @con. So, no need to check
+ * further.
+ */
+ if (sup_dev && sup_dev->links.status == DL_DEV_DRIVER_BOUND &&
+ con->links.status == DL_DEV_NO_DRIVER) {
+ ret = false;
+ goto out;
+ }
+
+ list_for_each_entry(link, &sup_handle->suppliers, c_hook) {
+ if (__fw_devlink_relax_cycles(con, link->supplier)) {
+ __fwnode_link_cycle(link);
+ ret = true;
+ }
+ }
+
+ /*
+ * Give priority to device parent over fwnode parent to account for any
+ * quirks in how fwnodes are converted to devices.
+ */
+ if (sup_dev)
+ par_dev = get_device(sup_dev->parent);
+ else
+ par_dev = fwnode_get_next_parent_dev(sup_handle);
+
+ if (par_dev && __fw_devlink_relax_cycles(con, par_dev->fwnode))
+ ret = true;
+
+ if (!sup_dev)
+ goto out;
+
+ list_for_each_entry(dev_link, &sup_dev->links.suppliers, c_node) {
+ /*
+ * Ignore a SYNC_STATE_ONLY flag only if it wasn't marked as
+ * such due to a cycle.
+ */
+ if (device_link_flag_is_sync_state_only(dev_link->flags) &&
+ !(dev_link->flags & DL_FLAG_CYCLE))
+ continue;
+
+ if (__fw_devlink_relax_cycles(con,
+ dev_link->supplier->fwnode)) {
+ fw_devlink_relax_link(dev_link);
+ dev_link->flags |= DL_FLAG_CYCLE;
+ ret = true;
+ }
+ }
+
+out:
+ sup_handle->flags &= ~FWNODE_FLAG_VISITED;
+ put_device(sup_dev);
+ put_device(par_dev);
+ return ret;
+}
+
/**
* fw_devlink_create_devlink - Create a device link from a consumer to fwnode
* @con: consumer device for the device link
@@ -1989,6 +2053,21 @@ static int fw_devlink_create_devlink(struct device *con,
fwnode_is_ancestor_of(sup_handle, con->fwnode))
return -EINVAL;

+ /*
+ * SYNC_STATE_ONLY device links don't block probing and supports cycles.
+ * So cycle detection isn't necessary and shouldn't be done.
+ */
+ if (!(flags & DL_FLAG_SYNC_STATE_ONLY)) {
+ device_links_write_lock();
+ if (__fw_devlink_relax_cycles(con, sup_handle)) {
+ __fwnode_link_cycle(link);
+ flags = fw_devlink_get_flags(link->flags);
+ dev_info(con, "Fixed dependency cycle(s) with %pfwf\n",
+ sup_handle);
+ }
+ device_links_write_unlock();
+ }
+
if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
sup_dev = fwnode_get_next_parent_dev(sup_handle);
else
@@ -2002,23 +2081,16 @@ static int fw_devlink_create_devlink(struct device *con,
*/
if (sup_dev->links.status == DL_DEV_NO_DRIVER &&
sup_handle->flags & FWNODE_FLAG_INITIALIZED) {
+ dev_dbg(con,
+ "Not linking %pfwf - dev might never probe\n",
+ sup_handle);
ret = -EINVAL;
goto out;
}

- /*
- * If this fails, it is due to cycles in device links. Just
- * give up on this link and treat it as invalid.
- */
- if (!device_link_add(con, sup_dev, flags) &&
- !(flags & DL_FLAG_SYNC_STATE_ONLY)) {
- dev_info(con, "Fixing up cyclic dependency with %s\n",
- dev_name(sup_dev));
- device_links_write_lock();
- fw_devlink_relax_cycle(con, sup_dev);
- device_links_write_unlock();
- device_link_add(con, sup_dev,
- FW_DEVLINK_FLAGS_PERMISSIVE);
+ if (!device_link_add(con, sup_dev, flags)) {
+ dev_err(con, "Failed to create device link with %s\n",
+ dev_name(sup_dev));
ret = -EINVAL;
}

@@ -2031,49 +2103,12 @@ static int fw_devlink_create_devlink(struct device *con,
*/
if (fwnode_init_without_drv(sup_handle) ||
fwnode_ancestor_init_without_drv(sup_handle)) {
- dev_dbg(con, "Not linking %pfwP - Might never probe\n",
+ dev_dbg(con, "Not linking %pfwf - might never become dev\n",
sup_handle);
return -EINVAL;
}

- /*
- * DL_FLAG_SYNC_STATE_ONLY doesn't block probing and supports
- * cycles. So cycle detection isn't necessary and shouldn't be
- * done.
- */
- if (flags & DL_FLAG_SYNC_STATE_ONLY)
- return -EAGAIN;
-
- /*
- * If we can't find the supplier device from its fwnode, it might be
- * due to a cyclic dependency between fwnodes. Some of these cycles can
- * be broken by applying logic. Check for these types of cycles and
- * break them so that devices in the cycle probe properly.
- *
- * If the supplier's parent is dependent on the consumer, then the
- * consumer and supplier have a cyclic dependency. Since fw_devlink
- * can't tell which of the inferred dependencies are incorrect, don't
- * enforce probe ordering between any of the devices in this cyclic
- * dependency. Do this by relaxing all the fw_devlink device links in
- * this cycle and by treating the fwnode link between the consumer and
- * the supplier as an invalid dependency.
- */
- sup_dev = fwnode_get_next_parent_dev(sup_handle);
- if (sup_dev && device_is_dependent(con, sup_dev)) {
- dev_info(con, "Fixing up cyclic dependency with %pfwP (%s)\n",
- sup_handle, dev_name(sup_dev));
- device_links_write_lock();
- fw_devlink_relax_cycle(con, sup_dev);
- device_links_write_unlock();
- ret = -EINVAL;
- } else {
- /*
- * Can't check for cycles or no cycles. So let's try
- * again later.
- */
- ret = -EAGAIN;
- }
-
+ ret = -EAGAIN;
out:
put_device(sup_dev);
return ret;
@@ -2156,10 +2191,7 @@ static void __fw_devlink_link_to_consumers(struct device *dev)
*
* The function creates normal (non-SYNC_STATE_ONLY) device links between @dev
* and the real suppliers of @dev. Once these device links are created, the
- * fwnode links are deleted. When such device links are successfully created,
- * this function is called recursively on those supplier devices. This is
- * needed to detect and break some invalid cycles in fwnode links. See
- * fw_devlink_create_devlink() for more details.
+ * fwnode links are deleted.
*
* In addition, it also looks at all the suppliers of the entire fwnode tree
* because some of the child devices of @dev that have not been added yet
@@ -2180,7 +2212,6 @@ static void __fw_devlink_link_to_suppliers(struct device *dev,

list_for_each_entry_safe(link, tmp, &fwnode->suppliers, c_hook) {
int ret;
- struct device *sup_dev;
struct fwnode_handle *sup = link->supplier;

ret = fw_devlink_create_devlink(dev, sup, link);
@@ -2188,27 +2219,6 @@ static void __fw_devlink_link_to_suppliers(struct device *dev,
continue;

__fwnode_link_del(link);
-
- /* If no device link was created, nothing more to do. */
- if (ret)
- continue;
-
- /*
- * If a device link was successfully created to a supplier, we
- * now need to try and link the supplier to all its suppliers.
- *
- * This is needed to detect and delete false dependencies in
- * fwnode links that haven't been converted to a device link
- * yet. See comments in fw_devlink_create_devlink() for more
- * details on the false dependency.
- *
- * Without deleting these false dependencies, some devices will
- * never probe because they'll keep waiting for their false
- * dependency fwnode links to be converted to device links.
- */
- sup_dev = get_dev_from_fwnode(sup);
- __fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
- put_device(sup_dev);
}

/*
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:43:51

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

The driver core now:
- Has the parent device of a supplier pick up the consumers if the
supplier never has a device created for it.
- Ignores a supplier if the supplier has no parent device and will never
be probed by a driver

And already prevents creating a device link with the consumer as a
supplier of a parent.

So, we no longer need to find the "compatible" node of the supplier or
do any other checks in of_link_to_phandle(). We simply need to make sure
that the supplier is available in DT.

Signed-off-by: Saravana Kannan <[email protected]>
Tested-by: Colin Foster <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
---
drivers/of/property.c | 84 +++++++------------------------------------
1 file changed, 13 insertions(+), 71 deletions(-)

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 134cfc980b70..c651aad6f34b 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1062,20 +1062,6 @@ of_fwnode_device_get_match_data(const struct fwnode_handle *fwnode,
return of_device_get_match_data(dev);
}

-static bool of_is_ancestor_of(struct device_node *test_ancestor,
- struct device_node *child)
-{
- of_node_get(child);
- while (child) {
- if (child == test_ancestor) {
- of_node_put(child);
- return true;
- }
- child = of_get_next_parent(child);
- }
- return false;
-}
-
static struct device_node *of_get_compat_node(struct device_node *np)
{
of_node_get(np);
@@ -1106,71 +1092,27 @@ static struct device_node *of_get_compat_node_parent(struct device_node *np)
return node;
}

-/**
- * of_link_to_phandle - Add fwnode link to supplier from supplier phandle
- * @con_np: consumer device tree node
- * @sup_np: supplier device tree node
- *
- * Given a phandle to a supplier device tree node (@sup_np), this function
- * finds the device that owns the supplier device tree node and creates a
- * device link from @dev consumer device to the supplier device. This function
- * doesn't create device links for invalid scenarios such as trying to create a
- * link with a parent device as the consumer of its child device. In such
- * cases, it returns an error.
- *
- * Returns:
- * - 0 if fwnode link successfully created to supplier
- * - -EINVAL if the supplier link is invalid and should not be created
- * - -ENODEV if struct device will never be create for supplier
- */
-static int of_link_to_phandle(struct device_node *con_np,
+static void of_link_to_phandle(struct device_node *con_np,
struct device_node *sup_np)
{
- struct device *sup_dev;
- struct device_node *tmp_np = sup_np;
+ struct device_node *tmp_np = of_node_get(sup_np);

- /*
- * Find the device node that contains the supplier phandle. It may be
- * @sup_np or it may be an ancestor of @sup_np.
- */
- sup_np = of_get_compat_node(sup_np);
- if (!sup_np) {
- pr_debug("Not linking %pOFP to %pOFP - No device\n",
- con_np, tmp_np);
- return -ENODEV;
- }
+ /* Check that sup_np and its ancestors are available. */
+ while (tmp_np) {
+ if (of_fwnode_handle(tmp_np)->dev) {
+ of_node_put(tmp_np);
+ break;
+ }

- /*
- * Don't allow linking a device node as a consumer of one of its
- * descendant nodes. By definition, a child node can't be a functional
- * dependency for the parent node.
- */
- if (of_is_ancestor_of(con_np, sup_np)) {
- pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
- con_np, sup_np);
- of_node_put(sup_np);
- return -EINVAL;
- }
+ if (!of_device_is_available(tmp_np)) {
+ of_node_put(tmp_np);
+ return;
+ }

- /*
- * Don't create links to "early devices" that won't have struct devices
- * created for them.
- */
- sup_dev = get_dev_from_fwnode(&sup_np->fwnode);
- if (!sup_dev &&
- (of_node_check_flag(sup_np, OF_POPULATED) ||
- sup_np->fwnode.flags & FWNODE_FLAG_NOT_DEVICE)) {
- pr_debug("Not linking %pOFP to %pOFP - No struct device\n",
- con_np, sup_np);
- of_node_put(sup_np);
- return -ENODEV;
+ tmp_np = of_get_next_parent(tmp_np);
}
- put_device(sup_dev);

fwnode_link_add(of_fwnode_handle(con_np), of_fwnode_handle(sup_np));
- of_node_put(sup_np);
-
- return 0;
}

/**
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:44:05

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 10/12] irqchip/irq-imx-gpcv2: Mark fwnode device as not initialized

Since this device is only partially initialized by the irqchip driver,
we need to mark the fwnode device as not initialized. This is to let
fw_devlink know that the device will be completely initialized at a
later point. That way, fw_devlink will continue to defer the probe of
the power domain consumers till the power domain driver successfully
binds to the struct device and completes the initialization of the
device.

Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/irqchip/irq-imx-gpcv2.c | 1 +
drivers/soc/imx/gpcv2.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
index b9c22f764b4d..8a0e82067924 100644
--- a/drivers/irqchip/irq-imx-gpcv2.c
+++ b/drivers/irqchip/irq-imx-gpcv2.c
@@ -283,6 +283,7 @@ static int __init imx_gpcv2_irqchip_init(struct device_node *node,
* later the GPC power domain driver will not be skipped.
*/
of_node_clear_flag(node, OF_POPULATED);
+ fwnode_dev_initialized(domain->fwnode, false);
return 0;
}

diff --git a/drivers/soc/imx/gpcv2.c b/drivers/soc/imx/gpcv2.c
index 7a47d14fde44..4b3300b090a8 100644
--- a/drivers/soc/imx/gpcv2.c
+++ b/drivers/soc/imx/gpcv2.c
@@ -1518,7 +1518,7 @@ static int imx_gpcv2_probe(struct platform_device *pdev)
domain->genpd.power_off = imx_pgc_power_down;

pd_pdev->dev.parent = dev;
- pd_pdev->dev.of_node = np;
+ device_set_node(&pd_pdev->dev, of_fwnode_handle(np));

ret = platform_device_add(pd_pdev);
if (ret) {
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:44:09

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 11/12] firmware: arm_scmi: Set fwnode for the scmi_device

This allows fw_devlink to track and enforce supplier-consumer
dependencies for scmi_device.

Signed-off-by: Saravana Kannan <[email protected]>
Acked-by: Sudeep Holla <[email protected]>
Tested-by: Colin Foster <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
---
drivers/firmware/arm_scmi/bus.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/arm_scmi/bus.c b/drivers/firmware/arm_scmi/bus.c
index 35bb70724d44..cc2eba067575 100644
--- a/drivers/firmware/arm_scmi/bus.c
+++ b/drivers/firmware/arm_scmi/bus.c
@@ -12,6 +12,7 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/device.h>
+#include <linux/of.h>

#include "common.h"

@@ -191,7 +192,7 @@ scmi_device_create(struct device_node *np, struct device *parent, int protocol,
scmi_dev->id = id;
scmi_dev->protocol_id = protocol;
scmi_dev->dev.parent = parent;
- scmi_dev->dev.of_node = np;
+ device_set_node(&scmi_dev->dev, of_fwnode_handle(np));
scmi_dev->dev.bus = &scmi_bus_type;
scmi_dev->dev.release = scmi_device_release;
dev_set_name(&scmi_dev->dev, "scmi_dev.%d", id);
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 01:44:11

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v3 12/12] mtd: mtdpart: Don't create platform device that'll never probe

These "nvmem-cells" platform devices never get probed because there's no
platform driver for it and it's never used anywhere else. So it's a
waste of memory. These devices also cause fw_devlink to block nvmem
consumers of "nvmem-cells" partition from probing because the supplier
device never probes.

So stop creating platform devices for nvmem-cells partitions to avoid
wasting memory and to avoid blocking probing of consumers.

Reported-by: Maxim Kiselev <[email protected]>
Fixes: bcdf0315a61a ("mtd: call of_platform_populate() for MTD partitions")
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/mtd/mtdpart.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/mtd/mtdpart.c b/drivers/mtd/mtdpart.c
index d442fa94c872..85f5ee6f06fc 100644
--- a/drivers/mtd/mtdpart.c
+++ b/drivers/mtd/mtdpart.c
@@ -577,6 +577,7 @@ static int mtd_part_of_parse(struct mtd_info *master,
{
struct mtd_part_parser *parser;
struct device_node *np;
+ struct device_node *child;
struct property *prop;
struct device *dev;
const char *compat;
@@ -594,6 +595,15 @@ static int mtd_part_of_parse(struct mtd_info *master,
else
np = of_get_child_by_name(np, "partitions");

+ /*
+ * Don't create devices that are added to a bus but will never get
+ * probed. That'll cause fw_devlink to block probing of consumers of
+ * this partition until the partition device is probed.
+ */
+ for_each_child_of_node(np, child)
+ if (of_device_is_compatible(child, "nvmem-cells"))
+ of_node_set_flag(child, OF_POPULATED);
+
of_property_for_each_string(np, "compatible", prop, compat) {
parser = mtd_part_get_compatible_parser(compat);
if (!parser)
--
2.39.1.519.gcb327c4b5f-goog


2023-02-07 07:52:18

by Maksim Kiselev

[permalink] [raw]
Subject: Re: [PATCH v3 12/12] mtd: mtdpart: Don't create platform device that'll never probe

вт, 7 февр. 2023 г. в 04:42, Saravana Kannan <[email protected]>:
>
> These "nvmem-cells" platform devices never get probed because there's no
> platform driver for it and it's never used anywhere else. So it's a
> waste of memory. These devices also cause fw_devlink to block nvmem
> consumers of "nvmem-cells" partition from probing because the supplier
> device never probes.
>
> So stop creating platform devices for nvmem-cells partitions to avoid
> wasting memory and to avoid blocking probing of consumers.
>
> Reported-by: Maxim Kiselev <[email protected]>
> Fixes: bcdf0315a61a ("mtd: call of_platform_populate() for MTD partitions")
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/mtd/mtdpart.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/mtd/mtdpart.c b/drivers/mtd/mtdpart.c
> index d442fa94c872..85f5ee6f06fc 100644
> --- a/drivers/mtd/mtdpart.c
> +++ b/drivers/mtd/mtdpart.c
> @@ -577,6 +577,7 @@ static int mtd_part_of_parse(struct mtd_info *master,
> {
> struct mtd_part_parser *parser;
> struct device_node *np;
> + struct device_node *child;
> struct property *prop;
> struct device *dev;
> const char *compat;
> @@ -594,6 +595,15 @@ static int mtd_part_of_parse(struct mtd_info *master,
> else
> np = of_get_child_by_name(np, "partitions");
>
> + /*
> + * Don't create devices that are added to a bus but will never get
> + * probed. That'll cause fw_devlink to block probing of consumers of
> + * this partition until the partition device is probed.
> + */
> + for_each_child_of_node(np, child)
> + if (of_device_is_compatible(child, "nvmem-cells"))
> + of_node_set_flag(child, OF_POPULATED);
> +
> of_property_for_each_string(np, "compatible", prop, compat) {
> parser = mtd_part_get_compatible_parser(compat);
> if (!parser)
> --
> 2.39.1.519.gcb327c4b5f-goog
>

Hi, Saravana!

Now it works pretty well. Thank you so much for your efforts.

> Reported-by: Maxim Kiselev <[email protected]>
> Fixes: bcdf0315a61a ("mtd: call of_platform_populate() for MTD partitions")
> Signed-off-by: Saravana Kannan <[email protected]>

Tested-by: Maksim Kiselev <[email protected]>

2023-02-07 07:56:41

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 03/12] soc: renesas: Move away from using OF_POPULATED for fw_devlink

On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> The OF_POPULATED flag was set to let fw_devlink know that the device
> tree node will not have a struct device created for it. This information
> is used by fw_devlink to avoid deferring the probe of consumers of this
> device tree node.
>
> Let's use fwnode_dev_initialized() instead because it achieves the same
> effect without using OF specific flags. This allows more generic code to
> be written in driver core.
>
> Signed-off-by: Saravana Kannan <[email protected]>
> Reviewed-by: Geert Uytterhoeven <[email protected]>

You've missed my earlier:
Acked-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-07 09:24:14

by Luca Weiss

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

On Tue Feb 7, 2023 at 2:41 AM CET, Saravana Kannan wrote:
> Naresh, Tony, Abel, Geert, Dmitry, Maxim(s), Miquel, Luca, Doug, Martin,
> Jean-Philippe,
>
> Can I get your Tested-by's for this v3 series please?

Hi Saravana,

Seems to be alright on the same platform where it broke previously.

Tested-by: Luca Weiss <[email protected]> # qcom/sm7225-fairphone-fp4

Regards
Luca

>
> Vladimir,
>
> Ccing you because DSA's and fw_devlink have known/existing problems
> (still in my TODOs to fix). But I want to make sure this series doesn't
> cause additional problems for DSA.
>
> All,
>
> This patch series improves fw_devlink in the following ways:
>
> 1. It no longer cares about a fwnode having a "compatible" property. It
> figures this out more dynamically. The only expectation is that
> fwnodes that are converted to devices actually get probed by a driver
> for the dependencies to be enforced correctly.
>
> 2. Finer grained dependency tracking. fw_devlink will now create device
> links from the consumer to the actual resource's device (if it has one,
> Eg: gpio_device) instead of the parent supplier device. This improves
> things like async suspend/resume ordering, potentially remove the need
> for frameworks to create device links, more parallelized async probing,
> and better sync_state() tracking.
>
> 3. Handle hardware/software quirks where a child firmware node gets
> populated as a device before its parent firmware node AND actually
> supplies a non-optional resource to the parent firmware node's
> device.
>
> 4. Way more robust at cycle handling (see patch for the insane cases).
>
> 5. Stops depending on OF_POPULATED to figure out some corner cases.
>
> 6. Simplifies the work that needs to be done by the firmware specific
> code.
>
> The v3 series has gone through my usual testing on my end and looks good
> to me.
>
> Thanks,
> Saravana
>
> [1] - https://lore.kernel.org/lkml/[email protected]/
> [2] - https://lore.kernel.org/lkml/CAGETcx-JUV1nj8wBJrTPfyvM7=Mre5j_vkVmZojeiumUGG6QZQ@mail.gmail.com/
>
> v1 -> v2:
> - Fixed Patch 1 to handle a corner case discussed in [2].
> - New patch 10 to handle "fsl,imx8mq-gpc" being initialized by 2 drivers.
> - New patch 11 to add fw_devlink support for SCMI devices.
>
> v2 -> v3:
> - Addressed most of Andy's comments in v2
> - Added Colin and Sudeep's Tested-by for the series (except the imx and
> renesas patches)
> - Added Sudeep's Acked-by for the scmi patch.
> - Added Geert's Reviewed-by for the renesas patch.
> - Fixed gpiolib crash reported by Naresh.
> - Patch 6: Fix __fwnode_links_move_consumers() to preserve fwnode link flags.
> - New Patch 12 to fix nvmem-cells issue reported by Maxim(s)/Miquel.
> - Deleted some stale function doc in Patch 8
>
> Cc: Abel Vesa <[email protected]>
> Cc: Alexander Stein <[email protected]>
> Cc: Tony Lindgren <[email protected]>
> Cc: Sudeep Holla <[email protected]>
> Cc: Geert Uytterhoeven <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Doug Anderson <[email protected]>
> Cc: Guenter Roeck <[email protected]>
> Cc: Dmitry Baryshkov <[email protected]>
> Cc: Maxim Kiselev <[email protected]>
> Cc: Maxim Kochetkov <[email protected]>
> Cc: Miquel Raynal <[email protected]>
> Cc: Luca Weiss <[email protected]>
> Cc: Colin Foster <[email protected]>
> Cc: Martin Kepplinger <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Cc: Vladimir Oltean <[email protected]>
>
> Saravana Kannan (12):
> driver core: fw_devlink: Don't purge child fwnode's consumer links
> driver core: fw_devlink: Improve check for fwnode with no
> device/driver
> soc: renesas: Move away from using OF_POPULATED for fw_devlink
> gpiolib: Clear the gpio_device's fwnode initialized flag before adding
> driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links
> driver core: fw_devlink: Allow marking a fwnode link as being part of
> a cycle
> driver core: fw_devlink: Consolidate device link flag computation
> driver core: fw_devlink: Make cycle detection more robust
> of: property: Simplify of_link_to_phandle()
> irqchip/irq-imx-gpcv2: Mark fwnode device as not initialized
> firmware: arm_scmi: Set fwnode for the scmi_device
> mtd: mtdpart: Don't create platform device that'll never probe
>
> drivers/base/core.c | 449 +++++++++++++++++++++-----------
> drivers/firmware/arm_scmi/bus.c | 3 +-
> drivers/gpio/gpiolib.c | 7 +
> drivers/irqchip/irq-imx-gpcv2.c | 1 +
> drivers/mtd/mtdpart.c | 10 +
> drivers/of/property.c | 84 +-----
> drivers/soc/imx/gpcv2.c | 2 +-
> drivers/soc/renesas/rcar-sysc.c | 2 +-
> include/linux/device.h | 1 +
> include/linux/fwnode.h | 12 +-
> 10 files changed, 344 insertions(+), 227 deletions(-)
>
> --
> 2.39.1.519.gcb327c4b5f-goog


2023-02-07 10:21:24

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v3 04/12] gpiolib: Clear the gpio_device's fwnode initialized flag before adding

On Mon, Feb 06, 2023 at 05:41:56PM -0800, Saravana Kannan wrote:
> Registering an irqdomain sets the flag for the fwnode. But having the
> flag set when a device is added is interpreted by fw_devlink to mean the
> device has already been initialized and will never probe. This prevents
> fw_devlink from creating device links with the gpio_device as a
> supplier. So, clear the flag before adding the device.

...

> + if (gdev->dev.fwnode && !gdev->dev.fwnode->dev)
> + fwnode_dev_initialized(gdev->dev.fwnode, false);

Please, do not dereference fwnode from struct device. We have dev_fwnode()
for that.

--
With Best Regards,
Andy Shevchenko



2023-02-07 10:28:54

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v3 04/12] gpiolib: Clear the gpio_device's fwnode initialized flag before adding

On Tue, Feb 07, 2023 at 12:20:59PM +0200, Andy Shevchenko wrote:
> On Mon, Feb 06, 2023 at 05:41:56PM -0800, Saravana Kannan wrote:
> > Registering an irqdomain sets the flag for the fwnode. But having the
> > flag set when a device is added is interpreted by fw_devlink to mean the
> > device has already been initialized and will never probe. This prevents
> > fw_devlink from creating device links with the gpio_device as a
> > supplier. So, clear the flag before adding the device.

...

> > + if (gdev->dev.fwnode && !gdev->dev.fwnode->dev)
> > + fwnode_dev_initialized(gdev->dev.fwnode, false);
>
> Please, do not dereference fwnode from struct device. We have dev_fwnode()
> for that.

Note, with that fixed you may add my
Reviewed-by: Andy Shevchenko <[email protected]>

--
With Best Regards,
Andy Shevchenko



2023-02-07 15:34:52

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

Hi,

On Mon, Feb 6, 2023 at 5:42 PM Saravana Kannan <[email protected]> wrote:
>
> Naresh, Tony, Abel, Geert, Dmitry, Maxim(s), Miquel, Luca, Doug, Martin,
> Jean-Philippe,
>
> Can I get your Tested-by's for this v3 series please?
>
> Vladimir,
>
> Ccing you because DSA's and fw_devlink have known/existing problems
> (still in my TODOs to fix). But I want to make sure this series doesn't
> cause additional problems for DSA.
>
> All,
>
> This patch series improves fw_devlink in the following ways:
>
> 1. It no longer cares about a fwnode having a "compatible" property. It
> figures this out more dynamically. The only expectation is that
> fwnodes that are converted to devices actually get probed by a driver
> for the dependencies to be enforced correctly.
>
> 2. Finer grained dependency tracking. fw_devlink will now create device
> links from the consumer to the actual resource's device (if it has one,
> Eg: gpio_device) instead of the parent supplier device. This improves
> things like async suspend/resume ordering, potentially remove the need
> for frameworks to create device links, more parallelized async probing,
> and better sync_state() tracking.
>
> 3. Handle hardware/software quirks where a child firmware node gets
> populated as a device before its parent firmware node AND actually
> supplies a non-optional resource to the parent firmware node's
> device.
>
> 4. Way more robust at cycle handling (see patch for the insane cases).
>
> 5. Stops depending on OF_POPULATED to figure out some corner cases.
>
> 6. Simplifies the work that needs to be done by the firmware specific
> code.
>
> The v3 series has gone through my usual testing on my end and looks good
> to me.
>
> Thanks,
> Saravana
>
> [1] - https://lore.kernel.org/lkml/[email protected]/
> [2] - https://lore.kernel.org/lkml/CAGETcx-JUV1nj8wBJrTPfyvM7=Mre5j_vkVmZojeiumUGG6QZQ@mail.gmail.com/
>
> v1 -> v2:
> - Fixed Patch 1 to handle a corner case discussed in [2].
> - New patch 10 to handle "fsl,imx8mq-gpc" being initialized by 2 drivers.
> - New patch 11 to add fw_devlink support for SCMI devices.
>
> v2 -> v3:
> - Addressed most of Andy's comments in v2
> - Added Colin and Sudeep's Tested-by for the series (except the imx and
> renesas patches)
> - Added Sudeep's Acked-by for the scmi patch.
> - Added Geert's Reviewed-by for the renesas patch.
> - Fixed gpiolib crash reported by Naresh.
> - Patch 6: Fix __fwnode_links_move_consumers() to preserve fwnode link flags.
> - New Patch 12 to fix nvmem-cells issue reported by Maxim(s)/Miquel.
> - Deleted some stale function doc in Patch 8
>
> Cc: Abel Vesa <[email protected]>
> Cc: Alexander Stein <[email protected]>
> Cc: Tony Lindgren <[email protected]>
> Cc: Sudeep Holla <[email protected]>
> Cc: Geert Uytterhoeven <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Doug Anderson <[email protected]>
> Cc: Guenter Roeck <[email protected]>
> Cc: Dmitry Baryshkov <[email protected]>
> Cc: Maxim Kiselev <[email protected]>
> Cc: Maxim Kochetkov <[email protected]>
> Cc: Miquel Raynal <[email protected]>
> Cc: Luca Weiss <[email protected]>
> Cc: Colin Foster <[email protected]>
> Cc: Martin Kepplinger <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Cc: Vladimir Oltean <[email protected]>
>
> Saravana Kannan (12):
> driver core: fw_devlink: Don't purge child fwnode's consumer links
> driver core: fw_devlink: Improve check for fwnode with no
> device/driver
> soc: renesas: Move away from using OF_POPULATED for fw_devlink
> gpiolib: Clear the gpio_device's fwnode initialized flag before adding
> driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links
> driver core: fw_devlink: Allow marking a fwnode link as being part of
> a cycle
> driver core: fw_devlink: Consolidate device link flag computation
> driver core: fw_devlink: Make cycle detection more robust
> of: property: Simplify of_link_to_phandle()
> irqchip/irq-imx-gpcv2: Mark fwnode device as not initialized
> firmware: arm_scmi: Set fwnode for the scmi_device
> mtd: mtdpart: Don't create platform device that'll never probe

I tested the whole series together on several devices. I tried to test
on a wide variety since previous versions had broken due to all the
dependency cycles in the display and some of these devices used
different components in their display pipeline. I didn't do massive
testing but did confirm that basic devices came up, including display.

Devices tested with your v3 applied atop v6.2-rc7-11-g05ecb680708a:

* sc7180-trogdor-lazor (with ps8640 bridge), which had failed to bring
up the display on v2.

* sc7180-trogdor-lazor (with sn65dsi86 bridge)

* sc7180-trogdor-pazquel (with sn65dsi86 bridge)

* sc7180-trogdor-homestar (with ps8640 bridge)

* sc7180-trogdor-wormdingler

* sc7280-herobrine-villager

Tested-by: Douglas Anderson <[email protected]>

2023-02-07 18:16:24

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

On Tue, Feb 7, 2023 at 7:28 AM Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Mon, Feb 6, 2023 at 5:42 PM Saravana Kannan <[email protected]> wrote:
> >
> > Naresh, Tony, Abel, Geert, Dmitry, Maxim(s), Miquel, Luca, Doug, Martin,
> > Jean-Philippe,
> >
> > Can I get your Tested-by's for this v3 series please?
> >
> > Vladimir,
> >
> > Ccing you because DSA's and fw_devlink have known/existing problems
> > (still in my TODOs to fix). But I want to make sure this series doesn't
> > cause additional problems for DSA.
> >
> > All,
> >
> > This patch series improves fw_devlink in the following ways:
> >
> > 1. It no longer cares about a fwnode having a "compatible" property. It
> > figures this out more dynamically. The only expectation is that
> > fwnodes that are converted to devices actually get probed by a driver
> > for the dependencies to be enforced correctly.
> >
> > 2. Finer grained dependency tracking. fw_devlink will now create device
> > links from the consumer to the actual resource's device (if it has one,
> > Eg: gpio_device) instead of the parent supplier device. This improves
> > things like async suspend/resume ordering, potentially remove the need
> > for frameworks to create device links, more parallelized async probing,
> > and better sync_state() tracking.
> >
> > 3. Handle hardware/software quirks where a child firmware node gets
> > populated as a device before its parent firmware node AND actually
> > supplies a non-optional resource to the parent firmware node's
> > device.
> >
> > 4. Way more robust at cycle handling (see patch for the insane cases).
> >
> > 5. Stops depending on OF_POPULATED to figure out some corner cases.
> >
> > 6. Simplifies the work that needs to be done by the firmware specific
> > code.
> >
> > The v3 series has gone through my usual testing on my end and looks good
> > to me.
> >
> > Thanks,
> > Saravana
> >
> > [1] - https://lore.kernel.org/lkml/[email protected]/
> > [2] - https://lore.kernel.org/lkml/CAGETcx-JUV1nj8wBJrTPfyvM7=Mre5j_vkVmZojeiumUGG6QZQ@mail.gmail.com/
> >
> > v1 -> v2:
> > - Fixed Patch 1 to handle a corner case discussed in [2].
> > - New patch 10 to handle "fsl,imx8mq-gpc" being initialized by 2 drivers.
> > - New patch 11 to add fw_devlink support for SCMI devices.
> >
> > v2 -> v3:
> > - Addressed most of Andy's comments in v2
> > - Added Colin and Sudeep's Tested-by for the series (except the imx and
> > renesas patches)
> > - Added Sudeep's Acked-by for the scmi patch.
> > - Added Geert's Reviewed-by for the renesas patch.
> > - Fixed gpiolib crash reported by Naresh.
> > - Patch 6: Fix __fwnode_links_move_consumers() to preserve fwnode link flags.
> > - New Patch 12 to fix nvmem-cells issue reported by Maxim(s)/Miquel.
> > - Deleted some stale function doc in Patch 8
> >
> > Cc: Abel Vesa <[email protected]>
> > Cc: Alexander Stein <[email protected]>
> > Cc: Tony Lindgren <[email protected]>
> > Cc: Sudeep Holla <[email protected]>
> > Cc: Geert Uytterhoeven <[email protected]>
> > Cc: John Stultz <[email protected]>
> > Cc: Doug Anderson <[email protected]>
> > Cc: Guenter Roeck <[email protected]>
> > Cc: Dmitry Baryshkov <[email protected]>
> > Cc: Maxim Kiselev <[email protected]>
> > Cc: Maxim Kochetkov <[email protected]>
> > Cc: Miquel Raynal <[email protected]>
> > Cc: Luca Weiss <[email protected]>
> > Cc: Colin Foster <[email protected]>
> > Cc: Martin Kepplinger <[email protected]>
> > Cc: Jean-Philippe Brucker <[email protected]>
> > Cc: Vladimir Oltean <[email protected]>
> >
> > Saravana Kannan (12):
> > driver core: fw_devlink: Don't purge child fwnode's consumer links
> > driver core: fw_devlink: Improve check for fwnode with no
> > device/driver
> > soc: renesas: Move away from using OF_POPULATED for fw_devlink
> > gpiolib: Clear the gpio_device's fwnode initialized flag before adding
> > driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links
> > driver core: fw_devlink: Allow marking a fwnode link as being part of
> > a cycle
> > driver core: fw_devlink: Consolidate device link flag computation
> > driver core: fw_devlink: Make cycle detection more robust
> > of: property: Simplify of_link_to_phandle()
> > irqchip/irq-imx-gpcv2: Mark fwnode device as not initialized
> > firmware: arm_scmi: Set fwnode for the scmi_device
> > mtd: mtdpart: Don't create platform device that'll never probe
>
> I tested the whole series together on several devices. I tried to test
> on a wide variety since previous versions had broken due to all the
> dependency cycles in the display and some of these devices used
> different components in their display pipeline. I didn't do massive
> testing but did confirm that basic devices came up, including display.
>
> Devices tested with your v3 applied atop v6.2-rc7-11-g05ecb680708a:
>
> * sc7180-trogdor-lazor (with ps8640 bridge), which had failed to bring
> up the display on v2.
>
> * sc7180-trogdor-lazor (with sn65dsi86 bridge)
>
> * sc7180-trogdor-pazquel (with sn65dsi86 bridge)
>
> * sc7180-trogdor-homestar (with ps8640 bridge)
>
> * sc7180-trogdor-wormdingler
>
> * sc7280-herobrine-villager
>
> Tested-by: Douglas Anderson <[email protected]>

Thanks a lot for all this testing and helping me debug the ps8640 issue Doug!

-Saravana

2023-02-07 20:57:45

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

Hi Saravana,

On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> The driver core now:
> - Has the parent device of a supplier pick up the consumers if the
> supplier never has a device created for it.
> - Ignores a supplier if the supplier has no parent device and will never
> be probed by a driver
>
> And already prevents creating a device link with the consumer as a
> supplier of a parent.
>
> So, we no longer need to find the "compatible" node of the supplier or
> do any other checks in of_link_to_phandle(). We simply need to make sure
> that the supplier is available in DT.
>
> Signed-off-by: Saravana Kannan <[email protected]>

Thanks for your patch!

This patch introduces a regression when dynamically loading DT overlays.
Unfortunately this happens when using the out-of-tree OF configfs,
which is not supported upstream. Still, there may be (obscure)
in-tree users.

When loading a DT overlay[1] to enable an SPI controller, and
instantiate a connected SPI EEPROM:

$ overlay add 25lc040
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /keys/status
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-0
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-names
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/cs-gpios
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/status
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /__symbols__/msiof0_pins

The SPI controller and the SPI EEPROM are no longer instantiated.

# cat /sys/kernel/debug/devices_deferred
e6e90000.spi platform: wait for supplier msiof0

Let's remove the overlay again:

$ overlay rm 25lc040
input: keys as /devices/platform/keys/input/input1

And retry:

$ overlay add 25lc040
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /keys/status
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-0
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-names
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/cs-gpios
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/status
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /__symbols__/msiof0_pins
spi_sh_msiof e6e90000.spi: DMA available
spi_sh_msiof e6e90000.spi: registered master spi0
spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
spi_sh_msiof e6e90000.spi: registered child spi0.0

Now it succeeds, and the SPI EEPROM is available, and works.

Without this patch, or with this patch reverted after applying the
full series:

$ overlay add 25lc040
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /keys/status
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-0
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-names
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/cs-gpios
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/status
OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /__symbols__/msiof0_pins
OF: Not linking spi@e6e90000 to interrupt-controller@f1010000 - No
struct device
spi_sh_msiof e6e90000.spi: DMA available
spi_sh_msiof e6e90000.spi: registered master spi0
spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
at25 spi0.0: 444 bps (2 bytes in 9 ticks)
at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
spi_sh_msiof e6e90000.spi: registered child spi0.0

The SPI EEPROM is available on the first try after boot.

All output is with #define DEBUG in drivers/of/property.c, and with
CONFIG_SPI_DEBUG=y.

Note that your patch has no impact on drivers/of/unittest.c, as that
checks only internal DT structures, not actual device instantiation.

Thanks! ;-)

[1] https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git/diff/arch/arm64/boot/dts/renesas/r8a77990-ebisu-cn41-msiof0-25lc040.dtso?h=topic/renesas-overlays&id=86d0cf6fa7f191145380485c22f684873c5cce26

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-07 21:36:20

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

Hi Saravana,

On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> Naresh, Tony, Abel, Geert, Dmitry, Maxim(s), Miquel, Luca, Doug, Martin,
> Jean-Philippe,
>
> Can I get your Tested-by's for this v3 series please?

I have tested this on a variety of Renesas arm32/arm64 platforms,
and on several RISC-V platforms.
Apart from the regression related to dynamic overlays caused by
"[PATCH v3 09/12] of: property: Simplify of_link_to_phandle()"
(which you may decide to ignore for now ;-)
Tested-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-07 23:13:39

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

On Tue, Feb 7, 2023 at 1:36 PM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> > Naresh, Tony, Abel, Geert, Dmitry, Maxim(s), Miquel, Luca, Doug, Martin,
> > Jean-Philippe,
> >
> > Can I get your Tested-by's for this v3 series please?
>
> I have tested this on a variety of Renesas arm32/arm64 platforms,
> and on several RISC-V platforms.
> Apart from the regression related to dynamic overlays caused by
> "[PATCH v3 09/12] of: property: Simplify of_link_to_phandle()"
> (which you may decide to ignore for now ;-)
> Tested-by: Geert Uytterhoeven <[email protected]>

Thanks a lot for the extensive testing Geert!

I'll take a look at that issue with the out of tree code separately.

-Saravana


-Saravana

2023-02-08 02:08:44

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> > The driver core now:
> > - Has the parent device of a supplier pick up the consumers if the
> > supplier never has a device created for it.
> > - Ignores a supplier if the supplier has no parent device and will never
> > be probed by a driver
> >
> > And already prevents creating a device link with the consumer as a
> > supplier of a parent.
> >
> > So, we no longer need to find the "compatible" node of the supplier or
> > do any other checks in of_link_to_phandle(). We simply need to make sure
> > that the supplier is available in DT.
> >
> > Signed-off-by: Saravana Kannan <[email protected]>
>
> Thanks for your patch!
>
> This patch introduces a regression when dynamically loading DT overlays.
> Unfortunately this happens when using the out-of-tree OF configfs,
> which is not supported upstream. Still, there may be (obscure)
> in-tree users.
>
> When loading a DT overlay[1] to enable an SPI controller, and
> instantiate a connected SPI EEPROM:
>
> $ overlay add 25lc040
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /keys/status
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-0
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-names
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/cs-gpios
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/status
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /__symbols__/msiof0_pins
>
> The SPI controller and the SPI EEPROM are no longer instantiated.
>
> # cat /sys/kernel/debug/devices_deferred
> e6e90000.spi platform: wait for supplier msiof0
>
> Let's remove the overlay again:
>
> $ overlay rm 25lc040
> input: keys as /devices/platform/keys/input/input1
>
> And retry:
>
> $ overlay add 25lc040
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /keys/status
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-0
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-names
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/cs-gpios
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/status
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /__symbols__/msiof0_pins
> spi_sh_msiof e6e90000.spi: DMA available
> spi_sh_msiof e6e90000.spi: registered master spi0
> spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> spi_sh_msiof e6e90000.spi: registered child spi0.0
>
> Now it succeeds, and the SPI EEPROM is available, and works.
>
> Without this patch, or with this patch reverted after applying the
> full series:
>
> $ overlay add 25lc040
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /keys/status
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-0
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-names
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/cs-gpios
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/status
> OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /__symbols__/msiof0_pins
> OF: Not linking spi@e6e90000 to interrupt-controller@f1010000 - No
> struct device
> spi_sh_msiof e6e90000.spi: DMA available
> spi_sh_msiof e6e90000.spi: registered master spi0
> spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> at25 spi0.0: 444 bps (2 bytes in 9 ticks)
> at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> spi_sh_msiof e6e90000.spi: registered child spi0.0
>
> The SPI EEPROM is available on the first try after boot.

Sigh... I spent way too long trying to figure out if I caused a memory
leak. I should have scrolled down further! Doesn't look like that part
is related to anything I did.

There are some flags set to avoid re-parsing fwnodes multiple times.
My guess is that the issue you are seeing has to do with how many of
the in memory structs are reused vs not when an overlay is
applied/removed and some of these flags might not be getting cleared
and this is having a bigger impact with this patch (because the fwnode
links are no longer anchored on "compatible" nodes).

With/without this patch (let's keep the series) can you look at how
the following things change between each step you do above (add,
remove, retry):
1) List of directories under /sys/class/devlink
2) Enable the debug logs inside __fwnode_link_add(),
__fwnode_link_del(), device_link_add()

My guess is that the final solution would entail clearing
FWNODE_FLAG_LINKS_ADDED for some fwnodes.

Thanks,
Saravana

2023-02-08 07:31:23

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

Hi Saravana,

On Wed, Feb 8, 2023 at 3:08 AM Saravana Kannan <[email protected]> wrote:
> On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <[email protected]> wrote:
> > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> > > The driver core now:
> > > - Has the parent device of a supplier pick up the consumers if the
> > > supplier never has a device created for it.
> > > - Ignores a supplier if the supplier has no parent device and will never
> > > be probed by a driver
> > >
> > > And already prevents creating a device link with the consumer as a
> > > supplier of a parent.
> > >
> > > So, we no longer need to find the "compatible" node of the supplier or
> > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > that the supplier is available in DT.
> > >
> > > Signed-off-by: Saravana Kannan <[email protected]>
> >
> > Thanks for your patch!
> >
> > This patch introduces a regression when dynamically loading DT overlays.
> > Unfortunately this happens when using the out-of-tree OF configfs,
> > which is not supported upstream. Still, there may be (obscure)
> > in-tree users.
> >
> > When loading a DT overlay[1] to enable an SPI controller, and
> > instantiate a connected SPI EEPROM:
> >
> > $ overlay add 25lc040
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >
> > The SPI controller and the SPI EEPROM are no longer instantiated.
> >
> > # cat /sys/kernel/debug/devices_deferred
> > e6e90000.spi platform: wait for supplier msiof0
> >
> > Let's remove the overlay again:
> >
> > $ overlay rm 25lc040
> > input: keys as /devices/platform/keys/input/input1
> >
> > And retry:
> >
> > $ overlay add 25lc040
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> > spi_sh_msiof e6e90000.spi: DMA available
> > spi_sh_msiof e6e90000.spi: registered master spi0
> > spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> > at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> > spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > Now it succeeds, and the SPI EEPROM is available, and works.
> >
> > Without this patch, or with this patch reverted after applying the
> > full series:
> >
> > $ overlay add 25lc040
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> > OF: Not linking spi@e6e90000 to interrupt-controller@f1010000 - No
> > struct device
> > spi_sh_msiof e6e90000.spi: DMA available
> > spi_sh_msiof e6e90000.spi: registered master spi0
> > spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> > at25 spi0.0: 444 bps (2 bytes in 9 ticks)
> > at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> > spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > The SPI EEPROM is available on the first try after boot.
>
> Sigh... I spent way too long trying to figure out if I caused a memory
> leak. I should have scrolled down further! Doesn't look like that part
> is related to anything I did.

Please ignore the memory leak messages. They are known issues
in the upstream DT overlay code, and not related to your series.

> There are some flags set to avoid re-parsing fwnodes multiple times.
> My guess is that the issue you are seeing has to do with how many of
> the in memory structs are reused vs not when an overlay is
> applied/removed and some of these flags might not be getting cleared
> and this is having a bigger impact with this patch (because the fwnode
> links are no longer anchored on "compatible" nodes).
>
> With/without this patch (let's keep the series) can you look at how
> the following things change between each step you do above (add,
> remove, retry):
> 1) List of directories under /sys/class/devlink
> 2) Enable the debug logs inside __fwnode_link_add(),
> __fwnode_link_del(), device_link_add()

Thanks, I'll give that a try, later...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-08 07:32:46

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <[email protected]> wrote:
>
> On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <[email protected]> wrote:
> >
> > Hi Saravana,
> >
> > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> > > The driver core now:
> > > - Has the parent device of a supplier pick up the consumers if the
> > > supplier never has a device created for it.
> > > - Ignores a supplier if the supplier has no parent device and will never
> > > be probed by a driver
> > >
> > > And already prevents creating a device link with the consumer as a
> > > supplier of a parent.
> > >
> > > So, we no longer need to find the "compatible" node of the supplier or
> > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > that the supplier is available in DT.
> > >
> > > Signed-off-by: Saravana Kannan <[email protected]>
> >
> > Thanks for your patch!
> >
> > This patch introduces a regression when dynamically loading DT overlays.
> > Unfortunately this happens when using the out-of-tree OF configfs,
> > which is not supported upstream. Still, there may be (obscure)
> > in-tree users.
> >
> > When loading a DT overlay[1] to enable an SPI controller, and
> > instantiate a connected SPI EEPROM:
> >
> > $ overlay add 25lc040
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >
> > The SPI controller and the SPI EEPROM are no longer instantiated.
> >
> > # cat /sys/kernel/debug/devices_deferred
> > e6e90000.spi platform: wait for supplier msiof0
> >
> > Let's remove the overlay again:
> >
> > $ overlay rm 25lc040
> > input: keys as /devices/platform/keys/input/input1
> >
> > And retry:
> >
> > $ overlay add 25lc040
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> > spi_sh_msiof e6e90000.spi: DMA available
> > spi_sh_msiof e6e90000.spi: registered master spi0
> > spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> > at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> > spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > Now it succeeds, and the SPI EEPROM is available, and works.
> >
> > Without this patch, or with this patch reverted after applying the
> > full series:
> >
> > $ overlay add 25lc040
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> > OF: Not linking spi@e6e90000 to interrupt-controller@f1010000 - No
> > struct device
> > spi_sh_msiof e6e90000.spi: DMA available
> > spi_sh_msiof e6e90000.spi: registered master spi0
> > spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> > at25 spi0.0: 444 bps (2 bytes in 9 ticks)
> > at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> > spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > The SPI EEPROM is available on the first try after boot.
>
> Sigh... I spent way too long trying to figure out if I caused a memory
> leak. I should have scrolled down further! Doesn't look like that part
> is related to anything I did.
>
> There are some flags set to avoid re-parsing fwnodes multiple times.
> My guess is that the issue you are seeing has to do with how many of
> the in memory structs are reused vs not when an overlay is
> applied/removed and some of these flags might not be getting cleared
> and this is having a bigger impact with this patch (because the fwnode
> links are no longer anchored on "compatible" nodes).
>
> With/without this patch (let's keep the series) can you look at how
> the following things change between each step you do above (add,
> remove, retry):
> 1) List of directories under /sys/class/devlink
> 2) Enable the debug logs inside __fwnode_link_add(),
> __fwnode_link_del(), device_link_add()
>
> My guess is that the final solution would entail clearing
> FWNODE_FLAG_LINKS_ADDED for some fwnodes.

You replied just as I was about to hit send. So sending this anyway...

Ok, I took a closer look and I think it's a bit of a mess. The fact
that it even worked for you without this patch is a bit of a
coincidence.

Let's just take platform devices that are created by
driver/of/platform.c as an example.

The main problem is that when you add/remove properties to a DT node
of an existing platform device, nothing is really done about it at the
device level. We don't even unbind and rebind the driver so the driver
could make use of the new properties. We don't remove and add back the
device so whoever might use the new property will use it. And if you
are adding a new node, it'll only trigger any platform device level
impact if it's a new node of a "simple-bus" (or similar bus) device.

Problem 1:
So if you add a new child node to an existing probed device that adds
its children explicitly (as in, the parent is not a "simple-bus" like
device), nothing will happen. The newly added child device node will
get converted into a platform device, not will the parent device
notice it. So in your case of adding msiof0_pins, it's just that when
the consumer gets the pins, the driver doesn't get involved much and
it's the pinctrl framework that reads the DT and figures it out.

With this patch, the fwnode links point to the actual resource and the
actual parent device inherits them if they don't get converted to a
struct device. But since we are adding this msiof0_pins after the
parent device has probed, the fwnode link isn't inherited by the
parent pinctrl device.

Problem 2:
So if you add a property to an already bound device, nothing is done
by the driver. In your overlay example, if you move the status="okay"
line to be the first property you change in the msiof0 spi device,
you'll probably see that fw_devlink is no longer the one blocking the
probe. This is because the platform device will get added as soon as
the status flips from disabled to enabled and at that point fw_devlink
will think it has no suppliers and won't do any probe deferring. And
then as the new properties get added nothing will happen at the device
or fw_devlink level. If the msiof0's spi driver fails immediately with
NOT -EPROBE_DEFER when platform device is added because it couldn't
find any pinctrl property, then msiof0 will never probe (unless you
remove and add the driver). If it had failed with -EPROBE_DEFER, then
it might probe again if something else triggers a deferred probe
attempt. Clearly, things working/not working based on the order of
properties in DT is not a good implementation.

Problem 3:
What if you enable a previously disabled supplier. There's no way to
handle that from a fw_devlink level without re-parsing the entire
device tree because existing devices might be consumers now.

Anyway, long story short, it's sorta worked due to coincidence and
it's quite messy to get it to work correctly.

Another way to get this to work is to:
1) unload pinctrl driver, unload spi driver.
2) apply overlay
3) reload pinctrl driver, reload spi driver.

This is assuming unloading those 2 drivers doesn't crash your system.

In terms of difficult + inefficiency of solving the problems, the
easiest/efficient to hardest/inefficient would be problem 1, 2 and
then 3. I'll think about them, but it's broken anyway without the
series/patch. The only real guarantee as of today is that we aren't
leaking any memory or corrupting anything.

-Saravana

2023-02-08 07:33:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

On Tue, Feb 07, 2023 at 09:57:14PM +0100, Geert Uytterhoeven wrote:
> Hi Saravana,
>
> On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> > The driver core now:
> > - Has the parent device of a supplier pick up the consumers if the
> > supplier never has a device created for it.
> > - Ignores a supplier if the supplier has no parent device and will never
> > be probed by a driver
> >
> > And already prevents creating a device link with the consumer as a
> > supplier of a parent.
> >
> > So, we no longer need to find the "compatible" node of the supplier or
> > do any other checks in of_link_to_phandle(). We simply need to make sure
> > that the supplier is available in DT.
> >
> > Signed-off-by: Saravana Kannan <[email protected]>
>
> Thanks for your patch!
>
> This patch introduces a regression when dynamically loading DT overlays.
> Unfortunately this happens when using the out-of-tree OF configfs,
> which is not supported upstream. Still, there may be (obscure)
> in-tree users.

As we can't do anything about out-of-tree code, why does this matter?

thanks,

greg k-h

2023-02-08 07:50:56

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

Hi Greg,

On Wed, Feb 8, 2023 at 8:33 AM Greg Kroah-Hartman
<[email protected]> wrote:
> On Tue, Feb 07, 2023 at 09:57:14PM +0100, Geert Uytterhoeven wrote:
> > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> > > The driver core now:
> > > - Has the parent device of a supplier pick up the consumers if the
> > > supplier never has a device created for it.
> > > - Ignores a supplier if the supplier has no parent device and will never
> > > be probed by a driver
> > >
> > > And already prevents creating a device link with the consumer as a
> > > supplier of a parent.
> > >
> > > So, we no longer need to find the "compatible" node of the supplier or
> > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > that the supplier is available in DT.
> > >
> > > Signed-off-by: Saravana Kannan <[email protected]>
> >
> > Thanks for your patch!
> >
> > This patch introduces a regression when dynamically loading DT overlays.
> > Unfortunately this happens when using the out-of-tree OF configfs,
> > which is not supported upstream. Still, there may be (obscure)
> > in-tree users.
>
> As we can't do anything about out-of-tree code, why does this matter?

Because the actual DT overlay mechanism is upstream.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-08 07:57:21

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

Hi Saravana,

On Wed, Feb 8, 2023 at 8:32 AM Saravana Kannan <[email protected]> wrote:
> On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <[email protected]> wrote:
> > On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <[email protected]> wrote:
> > > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> > > > The driver core now:
> > > > - Has the parent device of a supplier pick up the consumers if the
> > > > supplier never has a device created for it.
> > > > - Ignores a supplier if the supplier has no parent device and will never
> > > > be probed by a driver
> > > >
> > > > And already prevents creating a device link with the consumer as a
> > > > supplier of a parent.
> > > >
> > > > So, we no longer need to find the "compatible" node of the supplier or
> > > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > > that the supplier is available in DT.
> > > >
> > > > Signed-off-by: Saravana Kannan <[email protected]>
> > >
> > > Thanks for your patch!
> > >
> > > This patch introduces a regression when dynamically loading DT overlays.
> > > Unfortunately this happens when using the out-of-tree OF configfs,
> > > which is not supported upstream. Still, there may be (obscure)
> > > in-tree users.
> > >
> > > When loading a DT overlay[1] to enable an SPI controller, and
> > > instantiate a connected SPI EEPROM:

[...]

> > > The SPI controller and the SPI EEPROM are no longer instantiated.

> > Sigh... I spent way too long trying to figure out if I caused a memory
> > leak. I should have scrolled down further! Doesn't look like that part
> > is related to anything I did.
> >
> > There are some flags set to avoid re-parsing fwnodes multiple times.
> > My guess is that the issue you are seeing has to do with how many of
> > the in memory structs are reused vs not when an overlay is
> > applied/removed and some of these flags might not be getting cleared
> > and this is having a bigger impact with this patch (because the fwnode
> > links are no longer anchored on "compatible" nodes).
> >
> > With/without this patch (let's keep the series) can you look at how
> > the following things change between each step you do above (add,
> > remove, retry):
> > 1) List of directories under /sys/class/devlink
> > 2) Enable the debug logs inside __fwnode_link_add(),
> > __fwnode_link_del(), device_link_add()
> >
> > My guess is that the final solution would entail clearing
> > FWNODE_FLAG_LINKS_ADDED for some fwnodes.
>
> You replied just as I was about to hit send. So sending this anyway...
>
> Ok, I took a closer look and I think it's a bit of a mess. The fact
> that it even worked for you without this patch is a bit of a
> coincidence.
>
> Let's just take platform devices that are created by
> driver/of/platform.c as an example.
>
> The main problem is that when you add/remove properties to a DT node
> of an existing platform device, nothing is really done about it at the
> device level. We don't even unbind and rebind the driver so the driver
> could make use of the new properties. We don't remove and add back the
> device so whoever might use the new property will use it. And if you
> are adding a new node, it'll only trigger any platform device level
> impact if it's a new node of a "simple-bus" (or similar bus) device.
>
> Problem 1:
> So if you add a new child node to an existing probed device that adds
> its children explicitly (as in, the parent is not a "simple-bus" like
> device), nothing will happen. The newly added child device node will
> get converted into a platform device, not will the parent device
> notice it. So in your case of adding msiof0_pins, it's just that when
> the consumer gets the pins, the driver doesn't get involved much and
> it's the pinctrl framework that reads the DT and figures it out.
>
> With this patch, the fwnode links point to the actual resource and the
> actual parent device inherits them if they don't get converted to a
> struct device. But since we are adding this msiof0_pins after the
> parent device has probed, the fwnode link isn't inherited by the
> parent pinctrl device.
>
> Problem 2:
> So if you add a property to an already bound device, nothing is done
> by the driver. In your overlay example, if you move the status="okay"
> line to be the first property you change in the msiof0 spi device,
> you'll probably see that fw_devlink is no longer the one blocking the
> probe. This is because the platform device will get added as soon as
> the status flips from disabled to enabled and at that point fw_devlink
> will think it has no suppliers and won't do any probe deferring. And
> then as the new properties get added nothing will happen at the device
> or fw_devlink level. If the msiof0's spi driver fails immediately with
> NOT -EPROBE_DEFER when platform device is added because it couldn't
> find any pinctrl property, then msiof0 will never probe (unless you
> remove and add the driver). If it had failed with -EPROBE_DEFER, then
> it might probe again if something else triggers a deferred probe
> attempt. Clearly, things working/not working based on the order of
> properties in DT is not a good implementation.
>
> Problem 3:
> What if you enable a previously disabled supplier. There's no way to
> handle that from a fw_devlink level without re-parsing the entire
> device tree because existing devices might be consumers now.
>
> Anyway, long story short, it's sorta worked due to coincidence and
> it's quite messy to get it to work correctly.

Several subsystems register notifiers to be informed of the events
above. E.g. drivers/spi/spi.c:

if (IS_ENABLED(CONFIG_OF_DYNAMIC))
WARN_ON(of_reconfig_notifier_register(&spi_of_notifier));
if (IS_ENABLED(CONFIG_ACPI))
WARN_ON(acpi_reconfig_notifier_register(&spi_acpi_notifier));

So my issue might be triggered using ACPI, too.

> Another way to get this to work is to:
> 1) unload pinctrl driver, unload spi driver.
> 2) apply overlay
> 3) reload pinctrl driver, reload spi driver.
>
> This is assuming unloading those 2 drivers doesn't crash your system.

Unloading the pinctrl driver is not an option.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-08 08:35:57

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

On Tue, Feb 7, 2023 at 11:57 PM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Wed, Feb 8, 2023 at 8:32 AM Saravana Kannan <[email protected]> wrote:
> > On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <[email protected]> wrote:
> > > On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <[email protected]> wrote:
> > > > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> > > > > The driver core now:
> > > > > - Has the parent device of a supplier pick up the consumers if the
> > > > > supplier never has a device created for it.
> > > > > - Ignores a supplier if the supplier has no parent device and will never
> > > > > be probed by a driver
> > > > >
> > > > > And already prevents creating a device link with the consumer as a
> > > > > supplier of a parent.
> > > > >
> > > > > So, we no longer need to find the "compatible" node of the supplier or
> > > > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > > > that the supplier is available in DT.
> > > > >
> > > > > Signed-off-by: Saravana Kannan <[email protected]>
> > > >
> > > > Thanks for your patch!
> > > >
> > > > This patch introduces a regression when dynamically loading DT overlays.
> > > > Unfortunately this happens when using the out-of-tree OF configfs,
> > > > which is not supported upstream. Still, there may be (obscure)
> > > > in-tree users.
> > > >
> > > > When loading a DT overlay[1] to enable an SPI controller, and
> > > > instantiate a connected SPI EEPROM:
>
> [...]
>
> > > > The SPI controller and the SPI EEPROM are no longer instantiated.
>
> > > Sigh... I spent way too long trying to figure out if I caused a memory
> > > leak. I should have scrolled down further! Doesn't look like that part
> > > is related to anything I did.
> > >
> > > There are some flags set to avoid re-parsing fwnodes multiple times.
> > > My guess is that the issue you are seeing has to do with how many of
> > > the in memory structs are reused vs not when an overlay is
> > > applied/removed and some of these flags might not be getting cleared
> > > and this is having a bigger impact with this patch (because the fwnode
> > > links are no longer anchored on "compatible" nodes).
> > >
> > > With/without this patch (let's keep the series) can you look at how
> > > the following things change between each step you do above (add,
> > > remove, retry):
> > > 1) List of directories under /sys/class/devlink
> > > 2) Enable the debug logs inside __fwnode_link_add(),
> > > __fwnode_link_del(), device_link_add()
> > >
> > > My guess is that the final solution would entail clearing
> > > FWNODE_FLAG_LINKS_ADDED for some fwnodes.
> >
> > You replied just as I was about to hit send. So sending this anyway...
> >
> > Ok, I took a closer look and I think it's a bit of a mess. The fact
> > that it even worked for you without this patch is a bit of a
> > coincidence.
> >
> > Let's just take platform devices that are created by
> > driver/of/platform.c as an example.
> >
> > The main problem is that when you add/remove properties to a DT node
> > of an existing platform device, nothing is really done about it at the
> > device level. We don't even unbind and rebind the driver so the driver
> > could make use of the new properties. We don't remove and add back the
> > device so whoever might use the new property will use it. And if you
> > are adding a new node, it'll only trigger any platform device level
> > impact if it's a new node of a "simple-bus" (or similar bus) device.
> >
> > Problem 1:
> > So if you add a new child node to an existing probed device that adds
> > its children explicitly (as in, the parent is not a "simple-bus" like
> > device), nothing will happen. The newly added child device node will
> > get converted into a platform device, not will the parent device
> > notice it. So in your case of adding msiof0_pins, it's just that when
> > the consumer gets the pins, the driver doesn't get involved much and
> > it's the pinctrl framework that reads the DT and figures it out.
> >
> > With this patch, the fwnode links point to the actual resource and the
> > actual parent device inherits them if they don't get converted to a
> > struct device. But since we are adding this msiof0_pins after the
> > parent device has probed, the fwnode link isn't inherited by the
> > parent pinctrl device.
> >
> > Problem 2:
> > So if you add a property to an already bound device, nothing is done
> > by the driver. In your overlay example, if you move the status="okay"
> > line to be the first property you change in the msiof0 spi device,
> > you'll probably see that fw_devlink is no longer the one blocking the
> > probe. This is because the platform device will get added as soon as
> > the status flips from disabled to enabled and at that point fw_devlink
> > will think it has no suppliers and won't do any probe deferring. And
> > then as the new properties get added nothing will happen at the device
> > or fw_devlink level. If the msiof0's spi driver fails immediately with
> > NOT -EPROBE_DEFER when platform device is added because it couldn't
> > find any pinctrl property, then msiof0 will never probe (unless you
> > remove and add the driver). If it had failed with -EPROBE_DEFER, then
> > it might probe again if something else triggers a deferred probe
> > attempt. Clearly, things working/not working based on the order of
> > properties in DT is not a good implementation.
> >
> > Problem 3:
> > What if you enable a previously disabled supplier. There's no way to
> > handle that from a fw_devlink level without re-parsing the entire
> > device tree because existing devices might be consumers now.
> >
> > Anyway, long story short, it's sorta worked due to coincidence and
> > it's quite messy to get it to work correctly.
>
> Several subsystems register notifiers to be informed of the events
> above. E.g. drivers/spi/spi.c:
>
> if (IS_ENABLED(CONFIG_OF_DYNAMIC))
> WARN_ON(of_reconfig_notifier_register(&spi_of_notifier));
> if (IS_ENABLED(CONFIG_ACPI))
> WARN_ON(acpi_reconfig_notifier_register(&spi_acpi_notifier));
>
> So my issue might be triggered using ACPI, too.

Yeah, I did notice this before my email. Here's an ugly hack (at end
of email) to test my theory about Problem 1. I didn't compile test it
(because I should go to bed now), but you get the idea. Can you give
this a shot? It should fix your specific case. Basically for all
overlays (I hope the function is only used for overlays) we assume all
nodes are NOT devices until they actually get added as a device. Don't
review the code, it's not meant to be :)

-Saravana

--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -226,6 +226,7 @@ static void __of_attach_node(struct device_node *np)
np->sibling = np->parent->child;
np->parent->child = np;
of_node_clear_flag(np, OF_DETACHED);
+ np->fwnode.flags |= FWNODE_FLAG_NOT_DEVICE;
}

/**
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 81c8c227ab6b..7299cd668e51 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -732,6 +732,7 @@ static int of_platform_notify(struct notifier_block *nb,
if (of_node_check_flag(rd->dn, OF_POPULATED))
return NOTIFY_OK;

+ rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
/* pdev_parent may be NULL when no bus platform device */
pdev_parent = of_find_device_by_node(rd->dn->parent);
pdev = of_platform_device_create(rd->dn, NULL,
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 15f174f4e056..1de55561b25d 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -4436,6 +4436,7 @@ static int of_spi_notify(struct notifier_block
*nb, unsigned long action,
return NOTIFY_OK;
}

+ rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
spi = of_register_spi_device(ctlr, rd->dn);
put_device(&ctlr->dev);

2023-02-08 13:37:34

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

On Tue, Feb 07, 2023 at 11:31:57PM -0800, Saravana Kannan wrote:
> On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <[email protected]> wrote:

...

> Another way to get this to work is to:
> 1) unload pinctrl driver, unload spi driver.
> 2) apply overlay
> 3) reload pinctrl driver, reload spi driver.
>
> This is assuming unloading those 2 drivers doesn't crash your system.

Just a side note.

For ACPI case the ACPICA prevents appearing of the same device in the
namespace, so the above is kinda enforced, that's why overlays work better
there (but have a lot of limitations).

--
With Best Regards,
Andy Shevchenko



2023-02-10 10:13:53

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

Hi Saravana,

On Mon, Feb 06, 2023 at 05:41:52PM -0800, Saravana Kannan wrote:
> Vladimir,
>
> Ccing you because DSA's and fw_devlink have known/existing problems
> (still in my TODOs to fix). But I want to make sure this series doesn't
> cause additional problems for DSA.
>
> All,
>
> This patch series improves fw_devlink in the following ways:
>
> 1. It no longer cares about a fwnode having a "compatible" property. It
> figures this out more dynamically. The only expectation is that
> fwnodes that are converted to devices actually get probed by a driver
> for the dependencies to be enforced correctly.
>
> 2. Finer grained dependency tracking. fw_devlink will now create device
> links from the consumer to the actual resource's device (if it has one,
> Eg: gpio_device) instead of the parent supplier device. This improves
> things like async suspend/resume ordering, potentially remove the need
> for frameworks to create device links, more parallelized async probing,
> and better sync_state() tracking.
>
> 3. Handle hardware/software quirks where a child firmware node gets
> populated as a device before its parent firmware node AND actually
> supplies a non-optional resource to the parent firmware node's
> device.
>
> 4. Way more robust at cycle handling (see patch for the insane cases).
>
> 5. Stops depending on OF_POPULATED to figure out some corner cases.
>
> 6. Simplifies the work that needs to be done by the firmware specific
> code.
>
> The v3 series has gone through my usual testing on my end and looks good
> to me.

Booted on an NXP LS1028A (arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts)
and a Turris MOX (arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts)
with no observed regressions. Is there something specific you would like
me to test?

2023-02-10 19:27:53

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

On Fri, Feb 10, 2023 at 2:13 AM Vladimir Oltean <[email protected]> wrote:
>
> Hi Saravana,
>
> On Mon, Feb 06, 2023 at 05:41:52PM -0800, Saravana Kannan wrote:
> > Vladimir,
> >
> > Ccing you because DSA's and fw_devlink have known/existing problems
> > (still in my TODOs to fix). But I want to make sure this series doesn't
> > cause additional problems for DSA.
> >
> > All,
> >
> > This patch series improves fw_devlink in the following ways:
> >
> > 1. It no longer cares about a fwnode having a "compatible" property. It
> > figures this out more dynamically. The only expectation is that
> > fwnodes that are converted to devices actually get probed by a driver
> > for the dependencies to be enforced correctly.
> >
> > 2. Finer grained dependency tracking. fw_devlink will now create device
> > links from the consumer to the actual resource's device (if it has one,
> > Eg: gpio_device) instead of the parent supplier device. This improves
> > things like async suspend/resume ordering, potentially remove the need
> > for frameworks to create device links, more parallelized async probing,
> > and better sync_state() tracking.
> >
> > 3. Handle hardware/software quirks where a child firmware node gets
> > populated as a device before its parent firmware node AND actually
> > supplies a non-optional resource to the parent firmware node's
> > device.
> >
> > 4. Way more robust at cycle handling (see patch for the insane cases).
> >
> > 5. Stops depending on OF_POPULATED to figure out some corner cases.
> >
> > 6. Simplifies the work that needs to be done by the firmware specific
> > code.
> >
> > The v3 series has gone through my usual testing on my end and looks good
> > to me.
>
> Booted on an NXP LS1028A (arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts)
> and a Turris MOX (arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts)
> with no observed regressions.

Thanks for testing Vladimir!

> Is there something specific you would like
> me to test?


Not really, I just want to make sure the common DSA architectures
don't hit any regression. In the hardware you tested, are there cases
of PHYs where the supplier is the parent MDIO? I remember that being
the only case where I needed special casing
(FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD) in fw_devlink -- so it'll be
good to make sure I didn't accidentally break anything there.


-Saravana

2023-02-10 21:08:23

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

On Fri, Feb 10, 2023 at 11:27:11AM -0800, Saravana Kannan wrote:
> On Fri, Feb 10, 2023 at 2:13 AM Vladimir Oltean <[email protected]> wrote:
> >
> > Hi Saravana,
> >
> > On Mon, Feb 06, 2023 at 05:41:52PM -0800, Saravana Kannan wrote:
> > > Vladimir,
> > >
> > > Ccing you because DSA's and fw_devlink have known/existing problems
> > > (still in my TODOs to fix). But I want to make sure this series doesn't
> > > cause additional problems for DSA.
> > >
> > > All,
> > >
> > > This patch series improves fw_devlink in the following ways:
> > >
> > > 1. It no longer cares about a fwnode having a "compatible" property. It
> > > figures this out more dynamically. The only expectation is that
> > > fwnodes that are converted to devices actually get probed by a driver
> > > for the dependencies to be enforced correctly.
> > >
> > > 2. Finer grained dependency tracking. fw_devlink will now create device
> > > links from the consumer to the actual resource's device (if it has one,
> > > Eg: gpio_device) instead of the parent supplier device. This improves
> > > things like async suspend/resume ordering, potentially remove the need
> > > for frameworks to create device links, more parallelized async probing,
> > > and better sync_state() tracking.
> > >
> > > 3. Handle hardware/software quirks where a child firmware node gets
> > > populated as a device before its parent firmware node AND actually
> > > supplies a non-optional resource to the parent firmware node's
> > > device.
> > >
> > > 4. Way more robust at cycle handling (see patch for the insane cases).
> > >
> > > 5. Stops depending on OF_POPULATED to figure out some corner cases.
> > >
> > > 6. Simplifies the work that needs to be done by the firmware specific
> > > code.
> > >
> > > The v3 series has gone through my usual testing on my end and looks good
> > > to me.
> >
> > Booted on an NXP LS1028A (arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts)
> > and a Turris MOX (arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts)
> > with no observed regressions.
>
> Thanks for testing Vladimir!
>
> > Is there something specific you would like
> > me to test?
>
> Not really, I just want to make sure the common DSA architectures
> don't hit any regression. In the hardware you tested, are there cases
> of PHYs where the supplier is the parent MDIO? I remember that being
> the only case where I needed special casing
> (FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD) in fw_devlink -- so it'll be
> good to make sure I didn't accidentally break anything there.
>
> -Saravana

Yes and no (I never had a system which depended on FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD).

Yes, because well, yes, in arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts,
the PHYs will depend on interrupts provided by their (parent) switch. However this
is not explicit in the device tree. To make it explicit, one would need to add:

diff --git a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts
index cd0988317623..d789cda49e35 100644
--- a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts
+++ b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts
@@ -305,12 +305,14 @@ phy1: ethernet-phy@1 {
};

/* switch nodes are enabled by U-Boot if modules are present */
- switch0@10 {
+ switch0_peridot: switch0@10 {
compatible = "marvell,mv88e6190";
reg = <0x10>;
dsa,member = <0 0>;
interrupt-parent = <&moxtet>;
interrupts = <MOXTET_IRQ_PERIDOT(0)>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
status = "disabled";

mdio {
@@ -319,34 +321,42 @@ mdio {

switch0phy1: switch0phy1@1 {
reg = <0x1>;
+ interrupts-extended = <&switch0_peridot 1 IRQ_TYPE_LEVEL_HIGH>;
};

switch0phy2: switch0phy2@2 {
reg = <0x2>;
+ interrupts-extended = <&switch0_peridot 2 IRQ_TYPE_LEVEL_HIGH>;
};

switch0phy3: switch0phy3@3 {
reg = <0x3>;
+ interrupts-extended = <&switch0_peridot 3 IRQ_TYPE_LEVEL_HIGH>;
};

switch0phy4: switch0phy4@4 {
reg = <0x4>;
+ interrupts-extended = <&switch0_peridot 4 IRQ_TYPE_LEVEL_HIGH>;
};

switch0phy5: switch0phy5@5 {
reg = <0x5>;
+ interrupts-extended = <&switch0_peridot 5 IRQ_TYPE_LEVEL_HIGH>;
};

switch0phy6: switch0phy6@6 {
reg = <0x6>;
+ interrupts-extended = <&switch0_peridot 6 IRQ_TYPE_LEVEL_HIGH>;
};

switch0phy7: switch0phy7@7 {
reg = <0x7>;
+ interrupts-extended = <&switch0_peridot 7 IRQ_TYPE_LEVEL_HIGH>;
};

switch0phy8: switch0phy8@8 {
reg = <0x8>;
+ interrupts-extended = <&switch0_peridot 8 IRQ_TYPE_LEVEL_HIGH>;
};
};

@@ -430,12 +440,14 @@ port-sfp@a {
};
};

- switch0@2 {
+ switch0_topaz: switch0@2 {
compatible = "marvell,mv88e6085";
reg = <0x2>;
dsa,member = <0 0>;
interrupt-parent = <&moxtet>;
interrupts = <MOXTET_IRQ_TOPAZ>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
status = "disabled";

mdio {
@@ -444,18 +456,22 @@ mdio {

switch0phy1_topaz: switch0phy1@11 {
reg = <0x11>;
+ interrupts-extended = <&switch0_topaz 0x11 IRQ_TYPE_LEVEL_HIGH>;
};

switch0phy2_topaz: switch0phy2@12 {
reg = <0x12>;
+ interrupts-extended = <&switch0_topaz 0x12 IRQ_TYPE_LEVEL_HIGH>;
};

switch0phy3_topaz: switch0phy3@13 {
reg = <0x13>;
+ interrupts-extended = <&switch0_topaz 0x13 IRQ_TYPE_LEVEL_HIGH>;
};

switch0phy4_topaz: switch0phy4@14 {
reg = <0x14>;
+ interrupts-extended = <&switch0_topaz 0x14 IRQ_TYPE_LEVEL_HIGH>;
};
};

@@ -497,12 +513,14 @@ port@5 {
};
};

- switch1@11 {
+ switch1_peridot: switch1@11 {
compatible = "marvell,mv88e6190";
reg = <0x11>;
dsa,member = <0 1>;
interrupt-parent = <&moxtet>;
interrupts = <MOXTET_IRQ_PERIDOT(1)>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
status = "disabled";

mdio {
@@ -511,34 +529,42 @@ mdio {

switch1phy1: switch1phy1@1 {
reg = <0x1>;
+ interrupts-extended = <&switch1_peridot 1 IRQ_TYPE_LEVEL_HIGH>;
};

switch1phy2: switch1phy2@2 {
reg = <0x2>;
+ interrupts-extended = <&switch1_peridot 2 IRQ_TYPE_LEVEL_HIGH>;
};

switch1phy3: switch1phy3@3 {
reg = <0x3>;
+ interrupts-extended = <&switch1_peridot 3 IRQ_TYPE_LEVEL_HIGH>;
};

switch1phy4: switch1phy4@4 {
reg = <0x4>;
+ interrupts-extended = <&switch1_peridot 4 IRQ_TYPE_LEVEL_HIGH>;
};

switch1phy5: switch1phy5@5 {
reg = <0x5>;
+ interrupts-extended = <&switch1_peridot 5 IRQ_TYPE_LEVEL_HIGH>;
};

switch1phy6: switch1phy6@6 {
reg = <0x6>;
+ interrupts-extended = <&switch1_peridot 6 IRQ_TYPE_LEVEL_HIGH>;
};

switch1phy7: switch1phy7@7 {
reg = <0x7>;
+ interrupts-extended = <&switch1_peridot 7 IRQ_TYPE_LEVEL_HIGH>;
};

switch1phy8: switch1phy8@8 {
reg = <0x8>;
+ interrupts-extended = <&switch1_peridot 8 IRQ_TYPE_LEVEL_HIGH>;
};
};

@@ -622,12 +648,14 @@ port-sfp@a {
};
};

- switch1@2 {
+ switch1_topaz: switch1@2 {
compatible = "marvell,mv88e6085";
reg = <0x2>;
dsa,member = <0 1>;
interrupt-parent = <&moxtet>;
interrupts = <MOXTET_IRQ_TOPAZ>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
status = "disabled";

mdio {
@@ -636,18 +664,22 @@ mdio {

switch1phy1_topaz: switch1phy1@11 {
reg = <0x11>;
+ interrupts-extended = <&switch1_topaz 0x11 IRQ_TYPE_LEVEL_HIGH>;
};

switch1phy2_topaz: switch1phy2@12 {
reg = <0x12>;
+ interrupts-extended = <&switch1_topaz 0x12 IRQ_TYPE_LEVEL_HIGH>;
};

switch1phy3_topaz: switch1phy3@13 {
reg = <0x13>;
+ interrupts-extended = <&switch1_topaz 0x13 IRQ_TYPE_LEVEL_HIGH>;
};

switch1phy4_topaz: switch1phy4@14 {
reg = <0x14>;
+ interrupts-extended = <&switch1_topaz 0x14 IRQ_TYPE_LEVEL_HIGH>;
};
};

@@ -689,12 +721,14 @@ port@5 {
};
};

- switch2@12 {
+ switch2_peridot: switch2@12 {
compatible = "marvell,mv88e6190";
reg = <0x12>;
dsa,member = <0 2>;
interrupt-parent = <&moxtet>;
interrupts = <MOXTET_IRQ_PERIDOT(2)>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
status = "disabled";

mdio {
@@ -703,34 +737,42 @@ mdio {

switch2phy1: switch2phy1@1 {
reg = <0x1>;
+ interrupts-extended = <&switch2_peridot 1 IRQ_TYPE_LEVEL_HIGH>;
};

switch2phy2: switch2phy2@2 {
reg = <0x2>;
+ interrupts-extended = <&switch2_peridot 2 IRQ_TYPE_LEVEL_HIGH>;
};

switch2phy3: switch2phy3@3 {
reg = <0x3>;
+ interrupts-extended = <&switch2_peridot 3 IRQ_TYPE_LEVEL_HIGH>;
};

switch2phy4: switch2phy4@4 {
reg = <0x4>;
+ interrupts-extended = <&switch2_peridot 4 IRQ_TYPE_LEVEL_HIGH>;
};

switch2phy5: switch2phy5@5 {
reg = <0x5>;
+ interrupts-extended = <&switch2_peridot 5 IRQ_TYPE_LEVEL_HIGH>;
};

switch2phy6: switch2phy6@6 {
reg = <0x6>;
+ interrupts-extended = <&switch2_peridot 6 IRQ_TYPE_LEVEL_HIGH>;
};

switch2phy7: switch2phy7@7 {
reg = <0x7>;
+ interrupts-extended = <&switch2_peridot 7 IRQ_TYPE_LEVEL_HIGH>;
};

switch2phy8: switch2phy8@8 {
reg = <0x8>;
+ interrupts-extended = <&switch2_peridot 8 IRQ_TYPE_LEVEL_HIGH>;
};
};

@@ -805,12 +847,14 @@ port-sfp@a {
};
};

- switch2@2 {
+ switch2_topaz: switch2@2 {
compatible = "marvell,mv88e6085";
reg = <0x2>;
dsa,member = <0 2>;
interrupt-parent = <&moxtet>;
interrupts = <MOXTET_IRQ_TOPAZ>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
status = "disabled";

mdio {
@@ -819,18 +863,22 @@ mdio {

switch2phy1_topaz: switch2phy1@11 {
reg = <0x11>;
+ interrupts-extended = <&switch2_topaz 0x11 IRQ_TYPE_LEVEL_HIGH>;
};

switch2phy2_topaz: switch2phy2@12 {
reg = <0x12>;
+ interrupts-extended = <&switch2_topaz 0x12 IRQ_TYPE_LEVEL_HIGH>;
};

switch2phy3_topaz: switch2phy3@13 {
reg = <0x13>;
+ interrupts-extended = <&switch2_topaz 0x13 IRQ_TYPE_LEVEL_HIGH>;
};

switch2phy4_topaz: switch2phy4@14 {
reg = <0x14>;
+ interrupts-extended = <&switch2_topaz 0x14 IRQ_TYPE_LEVEL_HIGH>;
};
};


However, as I had explained in one of the first discussions here:
https://lore.kernel.org/netdev/20210901012826.iuy2bhvkrgahhrl7@skbuf/

it was always hit-or-miss whether the above device tree had an issue
with fw_devlink or not: it depended on how the driver was written (and
the mv88e6xxx switch driver was tricking the fw_devlink logic from that
time to drop the device links because of an unrelated -EPROBE_DEFER).

What I had done to "untrick" fw_devlink so that I could see the issue
(which was originally reported by Alvin Šipraga) was to modify the
mv88e6xxx driver, and change the placement of mv88e6xxx_mdios_register()
to a point after which we will never hit -EPROBE_DEFER (from driver probe()
to the dsa_switch_ops :: setup() method):

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 0a5d6c7bb128..48650465660d 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3672,11 +3672,18 @@ static int mv88e6390_setup_errata(struct mv88e6xxx_chip *chip)
return mv88e6xxx_software_reset(chip);
}

+static int mv88e6xxx_mdios_register(struct mv88e6xxx_chip *chip,
+ struct device_node *np);
+static void mv88e6xxx_mdios_unregister(struct mv88e6xxx_chip *chip);
+
static void mv88e6xxx_teardown(struct dsa_switch *ds)
{
+ struct mv88e6xxx_chip *chip = ds->priv;
+
mv88e6xxx_teardown_devlink_params(ds);
dsa_devlink_resources_unregister(ds);
mv88e6xxx_teardown_devlink_regions_global(ds);
+ mv88e6xxx_mdios_unregister(chip);
}

static int mv88e6xxx_setup(struct dsa_switch *ds)
@@ -3686,6 +3693,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
int err;
int i;

+ err = mv88e6xxx_mdios_register(chip, chip->dev->of_node);
+ if (err)
+ return err;
+
chip->ds = ds;
ds->slave_mii_bus = mv88e6xxx_default_mdio_bus(chip);

@@ -3811,8 +3822,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
unlock:
mv88e6xxx_reg_unlock(chip);

- if (err)
+ if (err) {
+ mv88e6xxx_mdios_unregister(chip);
return err;
+ }

/* Have to be called without holding the register lock, since
* they take the devlink lock, and we later take the locks in
@@ -3837,6 +3850,7 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
mv88e6xxx_teardown_devlink_params(ds);
out_resources:
dsa_devlink_resources_unregister(ds);
+ mv88e6xxx_mdios_unregister(chip);

return err;
}
@@ -7220,18 +7234,12 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (err)
goto out_g1_atu_prob_irq;

- err = mv88e6xxx_mdios_register(chip, np);
- if (err)
- goto out_g1_vtu_prob_irq;
-
err = mv88e6xxx_register_switch(chip);
if (err)
- goto out_mdio;
+ goto out_g1_vtu_prob_irq;

return 0;

-out_mdio:
- mv88e6xxx_mdios_unregister(chip);
out_g1_vtu_prob_irq:
mv88e6xxx_g1_vtu_prob_irq_free(chip);
out_g1_atu_prob_irq:
@@ -7268,7 +7276,6 @@ static void mv88e6xxx_remove(struct mdio_device *mdiodev)

mv88e6xxx_phy_destroy(chip);
mv88e6xxx_unregister_switch(chip);
- mv88e6xxx_mdios_unregister(chip);

mv88e6xxx_g1_vtu_prob_irq_free(chip);
mv88e6xxx_g1_atu_prob_irq_free(chip);

After applying both of the above changes on top of yours, I confirm that
the PHYs on the mv88e6xxx on Turris MOX still probe with their specific
PHY driver rather than the generic one, and with interrupts (not poll mode):

[ 6.125894] mv88e6085 d0032004.mdio-mii:11 lan9 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:01] driver [Marvell 88E6390 Family] (irq=49)
[ 6.222024] mv88e6085 d0032004.mdio-mii:11 lan10 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:02] driver [Marvell 88E6390 Family] (irq=50)
[ 6.277554] mv88e6085 d0032004.mdio-mii:11 lan11 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:03] driver [Marvell 88E6390 Family] (irq=51)
[ 6.361556] mv88e6085 d0032004.mdio-mii:11 lan12 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:04] driver [Marvell 88E6390 Family] (irq=52)
[ 6.445574] mv88e6085 d0032004.mdio-mii:11 lan13 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:05] driver [Marvell 88E6390 Family] (irq=53)
[ 6.529560] mv88e6085 d0032004.mdio-mii:11 lan14 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:06] driver [Marvell 88E6390 Family] (irq=54)
[ 6.589555] mv88e6085 d0032004.mdio-mii:11 lan15 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:07] driver [Marvell 88E6390 Family] (irq=55)
[ 6.673559] mv88e6085 d0032004.mdio-mii:11 lan16 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:08] driver [Marvell 88E6390 Family] (irq=56)
[ 6.733558] mv88e6085 d0032004.mdio-mii:12 lan17 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:01] driver [Marvell 88E6390 Family] (irq=74)
[ 6.817557] mv88e6085 d0032004.mdio-mii:12 lan18 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:02] driver [Marvell 88E6390 Family] (irq=75)
[ 6.889628] mv88e6085 d0032004.mdio-mii:12 lan19 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:03] driver [Marvell 88E6390 Family] (irq=76)
[ 6.973593] mv88e6085 d0032004.mdio-mii:12 lan20 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:04] driver [Marvell 88E6390 Family] (irq=77)
[ 7.057572] mv88e6085 d0032004.mdio-mii:12 lan21 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:05] driver [Marvell 88E6390 Family] (irq=78)
[ 7.109547] mv88e6085 d0032004.mdio-mii:12 lan22 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:06] driver [Marvell 88E6390 Family] (irq=79)
[ 7.193553] mv88e6085 d0032004.mdio-mii:12 lan23 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:07] driver [Marvell 88E6390 Family] (irq=80)
[ 7.277550] mv88e6085 d0032004.mdio-mii:12 lan24 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:08] driver [Marvell 88E6390 Family] (irq=81)
[ 7.365556] mv88e6085 d0032004.mdio-mii:10 lan1 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:01] driver [Marvell 88E6390 Family] (irq=109)
[ 7.421549] mv88e6085 d0032004.mdio-mii:10 lan2 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:02] driver [Marvell 88E6390 Family] (irq=110)
[ 7.505554] mv88e6085 d0032004.mdio-mii:10 lan3 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:03] driver [Marvell 88E6390 Family] (irq=111)
[ 7.589571] mv88e6085 d0032004.mdio-mii:10 lan4 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:04] driver [Marvell 88E6390 Family] (irq=112)
[ 7.673560] mv88e6085 d0032004.mdio-mii:10 lan5 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:05] driver [Marvell 88E6390 Family] (irq=113)
[ 7.733547] mv88e6085 d0032004.mdio-mii:10 lan6 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:06] driver [Marvell 88E6390 Family] (irq=114)
[ 7.817555] mv88e6085 d0032004.mdio-mii:10 lan7 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:07] driver [Marvell 88E6390 Family] (irq=115)
[ 7.901572] mv88e6085 d0032004.mdio-mii:10 lan8 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:08] driver [Marvell 88E6390 Family] (irq=116)

even though I am seeing these error messages earlier in the boot process (maybe this is something to look into):

[ 0.910219] mdio_bus d0032004.mdio-mii:10: Failed to create device link with d0032004.mdio-mii:10
[ 0.910228] mdio_bus d0032004.mdio-mii:10: Failed to create device link with d0032004.mdio-mii:10
[ 0.910237] mdio_bus d0032004.mdio-mii:10: Failed to create device link with d0032004.mdio-mii:10
[ 0.910244] mdio_bus d0032004.mdio-mii:10: Failed to create device link with d0032004.mdio-mii:10
[ 0.910251] mdio_bus d0032004.mdio-mii:10: Failed to create device link with d0032004.mdio-mii:10
[ 0.910259] mdio_bus d0032004.mdio-mii:10: Failed to create device link with d0032004.mdio-mii:10
[ 0.910266] mdio_bus d0032004.mdio-mii:10: Failed to create device link with d0032004.mdio-mii:10
[ 0.910273] mdio_bus d0032004.mdio-mii:10: Failed to create device link with d0032004.mdio-mii:10
[ 0.910936] mv88e6085 d0032004.mdio-mii:10: switch 0x3900 detected: Marvell 88E6390, revision 1
[ 0.928360] mdio_bus d0032004.mdio-mii:11: Failed to create device link with d0032004.mdio-mii:11
[ 0.928380] mdio_bus d0032004.mdio-mii:11: Failed to create device link with d0032004.mdio-mii:11
[ 0.928388] mdio_bus d0032004.mdio-mii:11: Failed to create device link with d0032004.mdio-mii:11
[ 0.928396] mdio_bus d0032004.mdio-mii:11: Failed to create device link with d0032004.mdio-mii:11
[ 0.928403] mdio_bus d0032004.mdio-mii:11: Failed to create device link with d0032004.mdio-mii:11
[ 0.928410] mdio_bus d0032004.mdio-mii:11: Failed to create device link with d0032004.mdio-mii:11
[ 0.928418] mdio_bus d0032004.mdio-mii:11: Failed to create device link with d0032004.mdio-mii:11
[ 0.928425] mdio_bus d0032004.mdio-mii:11: Failed to create device link with d0032004.mdio-mii:11
[ 0.928993] mv88e6085 d0032004.mdio-mii:11: switch 0x1900 detected: Marvell 88E6190, revision 1
[ 0.943306] mdio_bus d0032004.mdio-mii:12: Failed to create device link with d0032004.mdio-mii:12
[ 0.943327] mdio_bus d0032004.mdio-mii:12: Failed to create device link with d0032004.mdio-mii:12
[ 0.943334] mdio_bus d0032004.mdio-mii:12: Failed to create device link with d0032004.mdio-mii:12
[ 0.943342] mdio_bus d0032004.mdio-mii:12: Failed to create device link with d0032004.mdio-mii:12
[ 0.943349] mdio_bus d0032004.mdio-mii:12: Failed to create device link with d0032004.mdio-mii:12
[ 0.943357] mdio_bus d0032004.mdio-mii:12: Failed to create device link with d0032004.mdio-mii:12
[ 0.943364] mdio_bus d0032004.mdio-mii:12: Failed to create device link with d0032004.mdio-mii:12
[ 0.943371] mdio_bus d0032004.mdio-mii:12: Failed to create device link with d0032004.mdio-mii:12
[ 0.943879] mv88e6085 d0032004.mdio-mii:12: switch 0x1900 detected: Marvell 88E6190, revision 1


If _on top_ of all the above, I also remove the logic that sets FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD:

drivers/net/phy/mdio_bus.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 00d5bcdf0e6f..4d4c42771d37 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -665,9 +665,9 @@ int __mdiobus_register(struct mii_bus *bus, struct module *owner)
if (!bus->read && !bus->read_c45)
return -EINVAL;

- if (bus->parent && bus->parent->of_node)
- bus->parent->of_node->fwnode.flags |=
- FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD;
+// if (bus->parent && bus->parent->of_node)
+// bus->parent->of_node->fwnode.flags |=
+// FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD;

WARN(bus->state != MDIOBUS_ALLOCATED &&
bus->state != MDIOBUS_UNREGISTERED,

then *finally* I get something approximating Alvin's reported issue.
In my case, one switch out of 3 gets its PHYs bound to the Generic PHY
driver (why not all is a story for another time):

[ 6.106204] mv88e6085 d0032004.mdio-mii:11 lan9 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:01] driver [Marvell 88E6390 Family] (irq=49)
[ 6.193561] mv88e6085 d0032004.mdio-mii:11 lan10 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:02] driver [Marvell 88E6390 Family] (irq=50)
[ 6.249654] mv88e6085 d0032004.mdio-mii:11 lan11 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:03] driver [Marvell 88E6390 Family] (irq=51)
[ 6.333580] mv88e6085 d0032004.mdio-mii:11 lan12 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:04] driver [Marvell 88E6390 Family] (irq=52)
[ 6.417577] mv88e6085 d0032004.mdio-mii:11 lan13 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:05] driver [Marvell 88E6390 Family] (irq=53)
[ 6.501561] mv88e6085 d0032004.mdio-mii:11 lan14 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:06] driver [Marvell 88E6390 Family] (irq=54)
[ 6.561655] mv88e6085 d0032004.mdio-mii:11 lan15 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:07] driver [Marvell 88E6390 Family] (irq=55)
[ 6.645583] mv88e6085 d0032004.mdio-mii:11 lan16 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch1@11!mdio:08] driver [Marvell 88E6390 Family] (irq=56)
[ 6.733557] mv88e6085 d0032004.mdio-mii:12 lan17 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:01] driver [Marvell 88E6390 Family] (irq=74)
[ 6.817563] mv88e6085 d0032004.mdio-mii:12 lan18 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:02] driver [Marvell 88E6390 Family] (irq=75)
[ 6.873653] mv88e6085 d0032004.mdio-mii:12 lan19 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:03] driver [Marvell 88E6390 Family] (irq=76)
[ 6.957582] mv88e6085 d0032004.mdio-mii:12 lan20 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:04] driver [Marvell 88E6390 Family] (irq=77)
[ 7.041567] mv88e6085 d0032004.mdio-mii:12 lan21 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:05] driver [Marvell 88E6390 Family] (irq=78)
[ 7.093561] mv88e6085 d0032004.mdio-mii:12 lan22 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:06] driver [Marvell 88E6390 Family] (irq=79)
[ 7.177476] mv88e6085 d0032004.mdio-mii:12 lan23 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:07] driver [Marvell 88E6390 Family] (irq=80)
[ 7.245560] mv88e6085 d0032004.mdio-mii:12 lan24 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch2@12!mdio:08] driver [Marvell 88E6390 Family] (irq=81)
[ 7.269178] mv88e6085 d0032004.mdio-mii:10 lan1 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:01] driver [Generic PHY] (irq=POLL)
[ 7.288234] mv88e6085 d0032004.mdio-mii:10 lan2 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:02] driver [Generic PHY] (irq=POLL)
[ 7.307295] mv88e6085 d0032004.mdio-mii:10 lan3 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:03] driver [Generic PHY] (irq=POLL)
[ 7.326342] mv88e6085 d0032004.mdio-mii:10 lan4 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:04] driver [Generic PHY] (irq=POLL)
[ 7.345384] mv88e6085 d0032004.mdio-mii:10 lan5 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:05] driver [Generic PHY] (irq=POLL)
[ 7.364449] mv88e6085 d0032004.mdio-mii:10 lan6 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:06] driver [Generic PHY] (irq=POLL)
[ 7.383496] mv88e6085 d0032004.mdio-mii:10 lan7 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:07] driver [Generic PHY] (irq=POLL)
[ 7.402563] mv88e6085 d0032004.mdio-mii:10 lan8 (uninitialized): PHY [!soc!internal-regs@d0000000!mdio@32004!switch0@10!mdio:08] driver [Generic PHY] (irq=POLL)

So I guess that FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD does something.

OTOH, if I apply just my patch to remove the FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD,
but without making the device tree change or the mv88e6xxx driver change, the PHYs
still probe with the correct driver and with interrupts (i.e. they don't get their
probing deferred). This also seems to prove my point that my patches to bring the
Turris MOX to a testable state for this fwnode flag are indeed required.

HTH.

2023-02-10 21:33:16

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

On Fri, Feb 10, 2023 at 1:08 PM Vladimir Oltean <[email protected]> wrote:
>
> On Fri, Feb 10, 2023 at 11:27:11AM -0800, Saravana Kannan wrote:
> > On Fri, Feb 10, 2023 at 2:13 AM Vladimir Oltean <[email protected]> wrote:
> > >
> > > Hi Saravana,
> > >
> > > On Mon, Feb 06, 2023 at 05:41:52PM -0800, Saravana Kannan wrote:
> > > > Vladimir,
> > > >
> > > > Ccing you because DSA's and fw_devlink have known/existing problems
> > > > (still in my TODOs to fix). But I want to make sure this series doesn't
> > > > cause additional problems for DSA.
> > > >
> > > > All,
> > > >
> > > > This patch series improves fw_devlink in the following ways:
> > > >
> > > > 1. It no longer cares about a fwnode having a "compatible" property. It
> > > > figures this out more dynamically. The only expectation is that
> > > > fwnodes that are converted to devices actually get probed by a driver
> > > > for the dependencies to be enforced correctly.
> > > >
> > > > 2. Finer grained dependency tracking. fw_devlink will now create device
> > > > links from the consumer to the actual resource's device (if it has one,
> > > > Eg: gpio_device) instead of the parent supplier device. This improves
> > > > things like async suspend/resume ordering, potentially remove the need
> > > > for frameworks to create device links, more parallelized async probing,
> > > > and better sync_state() tracking.
> > > >
> > > > 3. Handle hardware/software quirks where a child firmware node gets
> > > > populated as a device before its parent firmware node AND actually
> > > > supplies a non-optional resource to the parent firmware node's
> > > > device.
> > > >
> > > > 4. Way more robust at cycle handling (see patch for the insane cases).
> > > >
> > > > 5. Stops depending on OF_POPULATED to figure out some corner cases.
> > > >
> > > > 6. Simplifies the work that needs to be done by the firmware specific
> > > > code.
> > > >
> > > > The v3 series has gone through my usual testing on my end and looks good
> > > > to me.
> > >
> > > Booted on an NXP LS1028A (arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts)
> > > and a Turris MOX (arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts)
> > > with no observed regressions.
> >
> > Thanks for testing Vladimir!
> >
> > > Is there something specific you would like
> > > me to test?
> >
> > Not really, I just want to make sure the common DSA architectures
> > don't hit any regression. In the hardware you tested, are there cases
> > of PHYs where the supplier is the parent MDIO? I remember that being
> > the only case where I needed special casing
> > (FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD) in fw_devlink -- so it'll be
> > good to make sure I didn't accidentally break anything there.
> >
> > -Saravana
>
> Yes and no (I never had a system which depended on FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD).
>
> Yes, because well, yes, in arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts,
> the PHYs will depend on interrupts provided by their (parent) switch. However this
> is not explicit in the device tree. To make it explicit, one would need to add:
>
> diff --git a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts
> index cd0988317623..d789cda49e35 100644
> --- a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts
> +++ b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts

-----8<---- Snipped DT diff -----

> However, as I had explained in one of the first discussions here:
> https://lore.kernel.org/netdev/20210901012826.iuy2bhvkrgahhrl7@skbuf/
>
> it was always hit-or-miss whether the above device tree had an issue
> with fw_devlink or not: it depended on how the driver was written (and
> the mv88e6xxx switch driver was tricking the fw_devlink logic from that
> time to drop the device links because of an unrelated -EPROBE_DEFER).

Yeah, I never forgot this issue. That's why I used "additional" in my
cover letter :)

So far I've not needed to change fw_devlink in a way that'd break this
unintentional "tricky behavior" but I might be coming up to that wall
soon. So this reply is becoming more relevant to me:
https://lore.kernel.org/lkml/CAGETcx8De_qm9hVtK5CznfWke9nmOfV8OcvAW6kmwyeb7APr=g@mail.gmail.com/

Not sure if you've had a chance to read or think about it.

> What I had done to "untrick" fw_devlink so that I could see the issue
> (which was originally reported by Alvin Šipraga) was to modify the
> mv88e6xxx driver, and change the placement of mv88e6xxx_mdios_register()
> to a point after which we will never hit -EPROBE_DEFER (from driver probe()
> to the dsa_switch_ops :: setup() method):
>
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 0a5d6c7bb128..48650465660d 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c

-----8<---- snipped

> After applying both of the above changes on top of yours, I confirm that
> the PHYs on the mv88e6xxx on Turris MOX still probe with their specific
> PHY driver rather than the generic one, and with interrupts (not poll mode):
>
-----8<---- snipped

>
> even though I am seeing these error messages earlier in the boot process (maybe this is something to look into):
>
> [ 0.910219] mdio_bus d0032004.mdio-mii:10: Failed to create device link with d0032004.mdio-mii:10

-----8<---- snipped

> [ 0.943879] mv88e6085 d0032004.mdio-mii:12: switch 0x1900 detected: Marvell 88E6190, revision 1
>
>
> If _on top_ of all the above, I also remove the logic that sets FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD:

> then *finally* I get something approximating Alvin's reported issue.
> In my case, one switch out of 3 gets its PHYs bound to the Generic PHY
> driver (why not all is a story for another time):

-----8<---- snipped

> So I guess that FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD does something.

Thanks for the extensive effort into testing this!

-Saravana

2023-02-13 13:05:26

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

Hi Saravana,

On Wed, Feb 8, 2023 at 3:08 AM Saravana Kannan <[email protected]> wrote:
> On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <[email protected]> wrote:
> > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> > > The driver core now:
> > > - Has the parent device of a supplier pick up the consumers if the
> > > supplier never has a device created for it.
> > > - Ignores a supplier if the supplier has no parent device and will never
> > > be probed by a driver
> > >
> > > And already prevents creating a device link with the consumer as a
> > > supplier of a parent.
> > >
> > > So, we no longer need to find the "compatible" node of the supplier or
> > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > that the supplier is available in DT.
> > >
> > > Signed-off-by: Saravana Kannan <[email protected]>
> >
> > Thanks for your patch!
> >
> > This patch introduces a regression when dynamically loading DT overlays.
> > Unfortunately this happens when using the out-of-tree OF configfs,
> > which is not supported upstream. Still, there may be (obscure)
> > in-tree users.
> >
> > When loading a DT overlay[1] to enable an SPI controller, and
> > instantiate a connected SPI EEPROM:
> >
> > $ overlay add 25lc040
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >
> > The SPI controller and the SPI EEPROM are no longer instantiated.
> >
> > # cat /sys/kernel/debug/devices_deferred
> > e6e90000.spi platform: wait for supplier msiof0
> >
> > Let's remove the overlay again:
> >
> > $ overlay rm 25lc040
> > input: keys as /devices/platform/keys/input/input1
> >
> > And retry:
> >
> > $ overlay add 25lc040
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> > spi_sh_msiof e6e90000.spi: DMA available
> > spi_sh_msiof e6e90000.spi: registered master spi0
> > spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> > at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> > spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > Now it succeeds, and the SPI EEPROM is available, and works.
> >
> > Without this patch, or with this patch reverted after applying the
> > full series:
> >
> > $ overlay add 25lc040
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> > OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> > OF: Not linking spi@e6e90000 to interrupt-controller@f1010000 - No
> > struct device
> > spi_sh_msiof e6e90000.spi: DMA available
> > spi_sh_msiof e6e90000.spi: registered master spi0
> > spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> > at25 spi0.0: 444 bps (2 bytes in 9 ticks)
> > at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> > spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > The SPI EEPROM is available on the first try after boot.
>
> Sigh... I spent way too long trying to figure out if I caused a memory
> leak. I should have scrolled down further! Doesn't look like that part
> is related to anything I did.
>
> There are some flags set to avoid re-parsing fwnodes multiple times.
> My guess is that the issue you are seeing has to do with how many of
> the in memory structs are reused vs not when an overlay is
> applied/removed and some of these flags might not be getting cleared
> and this is having a bigger impact with this patch (because the fwnode
> links are no longer anchored on "compatible" nodes).
>
> With/without this patch (let's keep the series) can you look at how
> the following things change between each step you do above (add,
> remove, retry):
> 1) List of directories under /sys/class/devlink
> 2) Enable the debug logs inside __fwnode_link_add(),
> __fwnode_link_del(), device_link_add()

Without "of: property: Simplify of_link_to_phandle()":

- Adding overlay:

spi@e6e90000 Linked as a fwnode consumer to clock-controller@e6150000
spi@e6e90000 Linked as a fwnode consumer to system-controller@e6180000
spi@e6e90000 Linked as a fwnode consumer to pinctrl@e6060000
spi@e6e90000 Linked as a fwnode consumer to gpio@e6055000
platform e6e90000.spi: Linked as a consumer to e6055000.gpio
spi@e6e90000 Dropping the fwnode link to gpio@e6055000
platform e6e90000.spi: Linked as a consumer to e6060000.pinctrl
spi@e6e90000 Dropping the fwnode link to pinctrl@e6060000
spi@e6e90000 Dropping the fwnode link to system-controller@e6180000
platform e6e90000.spi: Linked as a consumer to e6150000.clock-controller
spi@e6e90000 Dropping the fwnode link to clock-controller@e6150000

+platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
+platform:e6060000.pinctrl--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
+platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
-platform:e6060000.pinctrl--platform:keys ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:keys

SPI EEPROM works

- Removing overlay:

platform keys: Linked as a sync state only consumer to e6055000.gpio

-platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
-platform:e6060000.pinctrl--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
-platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi

platform:e6060000.pinctrl--platform:keys link is not recreated?!?!?

- Adding overlay again:

No debug output
No change in sys/class/devlink?!?!?
SPI EEPROM works


With "of: property: Simplify of_link_to_phandle()":

- Adding overlay:

spi@e6e90000 Linked as a fwnode consumer to
interrupt-controller@f1010000
spi@e6e90000 Linked as a fwnode consumer to clock-controller@e6150000
spi@e6e90000 Linked as a fwnode consumer to system-controller@e6180000
spi@e6e90000 Linked as a fwnode consumer to msiof0
spi@e6e90000 Linked as a fwnode consumer to gpio@e6055000
platform e6e90000.spi: Linked as a consumer to e6055000.gpio
spi@e6e90000 Dropping the fwnode link to gpio@e6055000
spi@e6e90000 Dropping the fwnode link to system-controller@e6180000
platform e6e90000.spi: Linked as a consumer to e6150000.clock-controller
spi@e6e90000 Dropping the fwnode link to clock-controller@e6150000
platform e6e90000.spi: Linked as a consumer to soc
spi@e6e90000 Dropping the fwnode link to interrupt-controller@f1010000

+platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
+platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
+platform:soc--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi

-platform:e6060000.pinctrl--platform:keys ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:keys

SPI EEPROM does not work

- Removing overlay:

platform keys: Linked as a sync state only consumer to e6055000.gpio
spi@e6e90000 Dropping the fwnode link to msiof0

-platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
-platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
-platform:soc--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi

platform:e6060000.pinctrl--platform:keys link is not recreated?!?!?


- Adding overlay again:

No debug output
No change in sys/class/devlink?!?!?
SPI EEPROM works

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-13 13:11:10

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

Hi Saravana,

On Wed, Feb 8, 2023 at 9:35 AM Saravana Kannan <[email protected]> wrote:
> On Tue, Feb 7, 2023 at 11:57 PM Geert Uytterhoeven <[email protected]> wrote:
> > On Wed, Feb 8, 2023 at 8:32 AM Saravana Kannan <[email protected]> wrote:
> > > On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <[email protected]> wrote:
> > > > On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <[email protected]> wrote:
> > > > > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <[email protected]> wrote:
> > > > > > The driver core now:
> > > > > > - Has the parent device of a supplier pick up the consumers if the
> > > > > > supplier never has a device created for it.
> > > > > > - Ignores a supplier if the supplier has no parent device and will never
> > > > > > be probed by a driver
> > > > > >
> > > > > > And already prevents creating a device link with the consumer as a
> > > > > > supplier of a parent.
> > > > > >
> > > > > > So, we no longer need to find the "compatible" node of the supplier or
> > > > > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > > > > that the supplier is available in DT.
> > > > > >
> > > > > > Signed-off-by: Saravana Kannan <[email protected]>
> > > > >
> > > > > Thanks for your patch!
> > > > >
> > > > > This patch introduces a regression when dynamically loading DT overlays.
> > > > > Unfortunately this happens when using the out-of-tree OF configfs,
> > > > > which is not supported upstream. Still, there may be (obscure)
> > > > > in-tree users.
> > > > >
> > > > > When loading a DT overlay[1] to enable an SPI controller, and
> > > > > instantiate a connected SPI EEPROM:
> >
> > [...]
> >
> > > > > The SPI controller and the SPI EEPROM are no longer instantiated.
> >
> > > > Sigh... I spent way too long trying to figure out if I caused a memory
> > > > leak. I should have scrolled down further! Doesn't look like that part
> > > > is related to anything I did.
> > > >
> > > > There are some flags set to avoid re-parsing fwnodes multiple times.
> > > > My guess is that the issue you are seeing has to do with how many of
> > > > the in memory structs are reused vs not when an overlay is
> > > > applied/removed and some of these flags might not be getting cleared
> > > > and this is having a bigger impact with this patch (because the fwnode
> > > > links are no longer anchored on "compatible" nodes).
> > > >
> > > > With/without this patch (let's keep the series) can you look at how
> > > > the following things change between each step you do above (add,
> > > > remove, retry):
> > > > 1) List of directories under /sys/class/devlink
> > > > 2) Enable the debug logs inside __fwnode_link_add(),
> > > > __fwnode_link_del(), device_link_add()
> > > >
> > > > My guess is that the final solution would entail clearing
> > > > FWNODE_FLAG_LINKS_ADDED for some fwnodes.
> > >
> > > You replied just as I was about to hit send. So sending this anyway...
> > >
> > > Ok, I took a closer look and I think it's a bit of a mess. The fact
> > > that it even worked for you without this patch is a bit of a
> > > coincidence.
> > >
> > > Let's just take platform devices that are created by
> > > driver/of/platform.c as an example.
> > >
> > > The main problem is that when you add/remove properties to a DT node
> > > of an existing platform device, nothing is really done about it at the
> > > device level. We don't even unbind and rebind the driver so the driver
> > > could make use of the new properties. We don't remove and add back the
> > > device so whoever might use the new property will use it. And if you
> > > are adding a new node, it'll only trigger any platform device level
> > > impact if it's a new node of a "simple-bus" (or similar bus) device.
> > >
> > > Problem 1:
> > > So if you add a new child node to an existing probed device that adds
> > > its children explicitly (as in, the parent is not a "simple-bus" like
> > > device), nothing will happen. The newly added child device node will
> > > get converted into a platform device, not will the parent device
> > > notice it. So in your case of adding msiof0_pins, it's just that when
> > > the consumer gets the pins, the driver doesn't get involved much and
> > > it's the pinctrl framework that reads the DT and figures it out.
> > >
> > > With this patch, the fwnode links point to the actual resource and the
> > > actual parent device inherits them if they don't get converted to a
> > > struct device. But since we are adding this msiof0_pins after the
> > > parent device has probed, the fwnode link isn't inherited by the
> > > parent pinctrl device.
> > >
> > > Problem 2:
> > > So if you add a property to an already bound device, nothing is done
> > > by the driver. In your overlay example, if you move the status="okay"
> > > line to be the first property you change in the msiof0 spi device,
> > > you'll probably see that fw_devlink is no longer the one blocking the
> > > probe. This is because the platform device will get added as soon as
> > > the status flips from disabled to enabled and at that point fw_devlink
> > > will think it has no suppliers and won't do any probe deferring. And
> > > then as the new properties get added nothing will happen at the device
> > > or fw_devlink level. If the msiof0's spi driver fails immediately with
> > > NOT -EPROBE_DEFER when platform device is added because it couldn't
> > > find any pinctrl property, then msiof0 will never probe (unless you
> > > remove and add the driver). If it had failed with -EPROBE_DEFER, then
> > > it might probe again if something else triggers a deferred probe
> > > attempt. Clearly, things working/not working based on the order of
> > > properties in DT is not a good implementation.
> > >
> > > Problem 3:
> > > What if you enable a previously disabled supplier. There's no way to
> > > handle that from a fw_devlink level without re-parsing the entire
> > > device tree because existing devices might be consumers now.
> > >
> > > Anyway, long story short, it's sorta worked due to coincidence and
> > > it's quite messy to get it to work correctly.
> >
> > Several subsystems register notifiers to be informed of the events
> > above. E.g. drivers/spi/spi.c:
> >
> > if (IS_ENABLED(CONFIG_OF_DYNAMIC))
> > WARN_ON(of_reconfig_notifier_register(&spi_of_notifier));
> > if (IS_ENABLED(CONFIG_ACPI))
> > WARN_ON(acpi_reconfig_notifier_register(&spi_acpi_notifier));
> >
> > So my issue might be triggered using ACPI, too.
>
> Yeah, I did notice this before my email. Here's an ugly hack (at end
> of email) to test my theory about Problem 1. I didn't compile test it
> (because I should go to bed now), but you get the idea. Can you give
> this a shot? It should fix your specific case. Basically for all
> overlays (I hope the function is only used for overlays) we assume all
> nodes are NOT devices until they actually get added as a device. Don't
> review the code, it's not meant to be :)
>
> -Saravana
>
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -226,6 +226,7 @@ static void __of_attach_node(struct device_node *np)
> np->sibling = np->parent->child;
> np->parent->child = np;
> of_node_clear_flag(np, OF_DETACHED);
> + np->fwnode.flags |= FWNODE_FLAG_NOT_DEVICE;
> }
>
> /**
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 81c8c227ab6b..7299cd668e51 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -732,6 +732,7 @@ static int of_platform_notify(struct notifier_block *nb,
> if (of_node_check_flag(rd->dn, OF_POPULATED))
> return NOTIFY_OK;
>
> + rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
> /* pdev_parent may be NULL when no bus platform device */
> pdev_parent = of_find_device_by_node(rd->dn->parent);
> pdev = of_platform_device_create(rd->dn, NULL,
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index 15f174f4e056..1de55561b25d 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -4436,6 +4436,7 @@ static int of_spi_notify(struct notifier_block
> *nb, unsigned long action,
> return NOTIFY_OK;
> }
>
> + rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
> spi = of_register_spi_device(ctlr, rd->dn);
> put_device(&ctlr->dev);

Thanks, these changes fix my SPI EEPROM in a DT overlay.
A similar change should be applied to the i2c bus core (and to other
users of of_reconfig_notifier_register()?).

For reference, the same debug output and /sys/class/devlink
changes with this fix applied can be found below.

Note that there are still a few remaining issues, for which I do not
know the full impact:
- platform:e6060000.pinctrl--platform:keys link is not recreated
on overlay remove,
- There is no change in /sys/class/devlink after an add/remove/add
cycle.
Shouldn't removing a DT overlay restore /sys/class/devlink to
the exact same state as before adding the DT overlay?

With extra FWNODE_FLAG_NOT_DEVICE handling:

- Adding overlay:

spi@e6e90000 Linked as a fwnode consumer to
interrupt-controller@f1010000
spi@e6e90000 Linked as a fwnode consumer to clock-controller@e6150000
spi@e6e90000 Linked as a fwnode consumer to system-controller@e6180000
spi@e6e90000 Linked as a fwnode consumer to msiof0
spi@e6e90000 Linked as a fwnode consumer to gpio@e6055000
platform e6e90000.spi: Linked as a consumer to e6055000.gpio
spi@e6e90000 Dropping the fwnode link to gpio@e6055000
platform e6e90000.spi: Linked as a consumer to e6060000.pinctrl
spi@e6e90000 Dropping the fwnode link to msiof0
spi@e6e90000 Dropping the fwnode link to system-controller@e6180000
platform e6e90000.spi: Linked as a consumer to e6150000.clock-controller
spi@e6e90000 Dropping the fwnode link to clock-controller@e6150000
platform e6e90000.spi: Linked as a consumer to soc
spi@e6e90000 Dropping the fwnode link to interrupt-controller@f1010000

+platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
+platform:e6060000.pinctrl--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
+platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
+platform:soc--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi
-platform:e6060000.pinctrl--platform:keys ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:keys

SPI EEPROM works

- Removing overlay:

platform keys: Linked as a sync state only consumer to e6055000.gpio

-platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
-platform:e6060000.pinctrl--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
-platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
-platform:soc--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi

platform:e6060000.pinctrl--platform:keys link is not recreated?!?!?

- Adding overlay again:

No debug output
No change in sys/class/devlink?!?!?
SPI EEPROM works

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-02-15 07:39:44

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

Hi,

* Saravana Kannan <[email protected]> [230207 01:42]:
> Naresh, Tony, Abel, Geert, Dmitry, Maxim(s), Miquel, Luca, Doug, Martin,
> Jean-Philippe,
>
> Can I get your Tested-by's for this v3 series please?

Just FYI, the patches in next-20230215 behave for me.

Regards,

Tony

2023-02-15 12:35:16

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

Hi Saravana,

On Mon, Feb 06, 2023 at 05:41:52PM -0800, Saravana Kannan wrote:
> Naresh, Tony, Abel, Geert, Dmitry, Maxim(s), Miquel, Luca, Doug, Martin,
> Jean-Philippe,
>
> Can I get your Tested-by's for this v3 series please?

Sorry for the delay (I misconfigured my inbox). I tested virtio-iommu with
these changes, no regression:

Tested-by: Jean-Philippe Brucker <[email protected]>


Removing driver_deferred_probe_check_state() by reverting [1] breaks
loading virtio-iommu as a module, as the dependency between PCI devices
and PCI IOMMU is ignored, and the device probed too early [2]. I'll try to
figure out how to make that work.

Thanks,
Jean

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/Yv+dpeIPvde7oDHi@myrica/

2023-02-16 03:12:27

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

On 07/02/2023 03:41, Saravana Kannan wrote:
> Naresh, Tony, Abel, Geert, Dmitry, Maxim(s), Miquel, Luca, Doug, Martin,
> Jean-Philippe,
>
> Can I get your Tested-by's for this v3 series please?
>
> Vladimir,
>
> Ccing you because DSA's and fw_devlink have known/existing problems
> (still in my TODOs to fix). But I want to make sure this series doesn't
> cause additional problems for DSA.
>
> All,
>
> This patch series improves fw_devlink in the following ways:
>
> 1. It no longer cares about a fwnode having a "compatible" property. It
> figures this out more dynamically. The only expectation is that
> fwnodes that are converted to devices actually get probed by a driver
> for the dependencies to be enforced correctly.
>
> 2. Finer grained dependency tracking. fw_devlink will now create device
> links from the consumer to the actual resource's device (if it has one,
> Eg: gpio_device) instead of the parent supplier device. This improves
> things like async suspend/resume ordering, potentially remove the need
> for frameworks to create device links, more parallelized async probing,
> and better sync_state() tracking.
>
> 3. Handle hardware/software quirks where a child firmware node gets
> populated as a device before its parent firmware node AND actually
> supplies a non-optional resource to the parent firmware node's
> device.
>
> 4. Way more robust at cycle handling (see patch for the insane cases).
>
> 5. Stops depending on OF_POPULATED to figure out some corner cases.
>
> 6. Simplifies the work that needs to be done by the firmware specific
> code.
>
> The v3 series has gone through my usual testing on my end and looks good
> to me.

Saravana,

Please excuse me, I was completely overwhelmed with my regular work and
had no time to properly test the series, while doing just the light
test would defeat the purpose of testing.

Tested-by: Dmitry Baryshkov <[email protected]> # Qualcomm RB3

Thanks a lot for going through all the troubles and hunting all the issues!

Just a note: on an RB3 device (arm64 qcom/sdm845-db845c.dtsi) extended
with the patch at [3] I got the following messages in dmesg:

[ 1.051325] platform ae00000.mdss: Failed to create device link
with ae00000.mdss
[ 1.059368] platform ae00000.mdss: Failed to create device link
with ae00000.mdss
[ 1.067174] platform ae00000.mdss: Failed to create device link
with ae00000.mdss
[ 1.088322] platform c440000.spmi: Failed to create device link
with c440000.spmi
[ 1.096019] platform c440000.spmi: Failed to create device link
with c440000.spmi
[ 1.103707] platform c440000.spmi: Failed to create device link
with c440000.spmi
[ 1.111400] platform c440000.spmi: Failed to create device link
with c440000.spmi
[ 1.119141] platform c440000.spmi: Failed to create device link
with c440000.spmi
[ 1.126825] platform c440000.spmi: Failed to create device link
with c440000.spmi
[ 2.024763] qcom-lab-ibb-regulator c440000.spmi:pmic@3:labibb:
Failed to create device link with c440000.spmi
[ 2.035026] qcom-lab-ibb-regulator c440000.spmi:pmic@3:labibb:
Failed to create device link with c440000.spmi

They look to be harmless, but it might be good to filter some of them
out? Especially the ones which tell about creating a device link
pointing back to the same device.

[3] https://lore.kernel.org/linux-arm-msm/[email protected]/

>
> Thanks,
> Saravana
>
> [1] - https://lore.kernel.org/lkml/[email protected]/
> [2] - https://lore.kernel.org/lkml/CAGETcx-JUV1nj8wBJrTPfyvM7=Mre5j_vkVmZojeiumUGG6QZQ@mail.gmail.com/
>
> v1 -> v2:
> - Fixed Patch 1 to handle a corner case discussed in [2].
> - New patch 10 to handle "fsl,imx8mq-gpc" being initialized by 2 drivers.
> - New patch 11 to add fw_devlink support for SCMI devices.
>
> v2 -> v3:
> - Addressed most of Andy's comments in v2
> - Added Colin and Sudeep's Tested-by for the series (except the imx and
> renesas patches)
> - Added Sudeep's Acked-by for the scmi patch.
> - Added Geert's Reviewed-by for the renesas patch.
> - Fixed gpiolib crash reported by Naresh.
> - Patch 6: Fix __fwnode_links_move_consumers() to preserve fwnode link flags.
> - New Patch 12 to fix nvmem-cells issue reported by Maxim(s)/Miquel.
> - Deleted some stale function doc in Patch 8
>
> Cc: Abel Vesa <[email protected]>
> Cc: Alexander Stein <[email protected]>
> Cc: Tony Lindgren <[email protected]>
> Cc: Sudeep Holla <[email protected]>
> Cc: Geert Uytterhoeven <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Doug Anderson <[email protected]>
> Cc: Guenter Roeck <[email protected]>
> Cc: Dmitry Baryshkov <[email protected]>
> Cc: Maxim Kiselev <[email protected]>
> Cc: Maxim Kochetkov <[email protected]>
> Cc: Miquel Raynal <[email protected]>
> Cc: Luca Weiss <[email protected]>
> Cc: Colin Foster <[email protected]>
> Cc: Martin Kepplinger <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Cc: Vladimir Oltean <[email protected]>
>
> Saravana Kannan (12):
> driver core: fw_devlink: Don't purge child fwnode's consumer links
> driver core: fw_devlink: Improve check for fwnode with no
> device/driver
> soc: renesas: Move away from using OF_POPULATED for fw_devlink
> gpiolib: Clear the gpio_device's fwnode initialized flag before adding
> driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links
> driver core: fw_devlink: Allow marking a fwnode link as being part of
> a cycle
> driver core: fw_devlink: Consolidate device link flag computation
> driver core: fw_devlink: Make cycle detection more robust
> of: property: Simplify of_link_to_phandle()
> irqchip/irq-imx-gpcv2: Mark fwnode device as not initialized
> firmware: arm_scmi: Set fwnode for the scmi_device
> mtd: mtdpart: Don't create platform device that'll never probe
>
> drivers/base/core.c | 449 +++++++++++++++++++++-----------
> drivers/firmware/arm_scmi/bus.c | 3 +-
> drivers/gpio/gpiolib.c | 7 +
> drivers/irqchip/irq-imx-gpcv2.c | 1 +
> drivers/mtd/mtdpart.c | 10 +
> drivers/of/property.c | 84 +-----
> drivers/soc/imx/gpcv2.c | 2 +-
> drivers/soc/renesas/rcar-sysc.c | 2 +-
> include/linux/device.h | 1 +
> include/linux/fwnode.h | 12 +-
> 10 files changed, 344 insertions(+), 227 deletions(-)
>

--
With best wishes

Dmitry

2023-02-25 06:25:43

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v3 00/12] fw_devlink improvements

On Wed, Feb 15, 2023 at 7:12 PM Dmitry Baryshkov
<[email protected]> wrote:
>
> On 07/02/2023 03:41, Saravana Kannan wrote:
> > Naresh, Tony, Abel, Geert, Dmitry, Maxim(s), Miquel, Luca, Doug, Martin,
> > Jean-Philippe,
> >
> > Can I get your Tested-by's for this v3 series please?
> >
> > Vladimir,
> >
> > Ccing you because DSA's and fw_devlink have known/existing problems
> > (still in my TODOs to fix). But I want to make sure this series doesn't
> > cause additional problems for DSA.
> >
> > All,
> >
> > This patch series improves fw_devlink in the following ways:
> >
> > 1. It no longer cares about a fwnode having a "compatible" property. It
> > figures this out more dynamically. The only expectation is that
> > fwnodes that are converted to devices actually get probed by a driver
> > for the dependencies to be enforced correctly.
> >
> > 2. Finer grained dependency tracking. fw_devlink will now create device
> > links from the consumer to the actual resource's device (if it has one,
> > Eg: gpio_device) instead of the parent supplier device. This improves
> > things like async suspend/resume ordering, potentially remove the need
> > for frameworks to create device links, more parallelized async probing,
> > and better sync_state() tracking.
> >
> > 3. Handle hardware/software quirks where a child firmware node gets
> > populated as a device before its parent firmware node AND actually
> > supplies a non-optional resource to the parent firmware node's
> > device.
> >
> > 4. Way more robust at cycle handling (see patch for the insane cases).
> >
> > 5. Stops depending on OF_POPULATED to figure out some corner cases.
> >
> > 6. Simplifies the work that needs to be done by the firmware specific
> > code.
> >
> > The v3 series has gone through my usual testing on my end and looks good
> > to me.
>
> Saravana,
>
> Please excuse me, I was completely overwhelmed with my regular work and
> had no time to properly test the series, while doing just the light
> test would defeat the purpose of testing.
>
> Tested-by: Dmitry Baryshkov <[email protected]> # Qualcomm RB3
>
> Thanks a lot for going through all the troubles and hunting all the issues!

You are welcome! Thanks for testing it.

> Just a note: on an RB3 device (arm64 qcom/sdm845-db845c.dtsi) extended
> with the patch at [3] I got the following messages in dmesg:
>
> [ 1.051325] platform ae00000.mdss: Failed to create device link
> with ae00000.mdss
> [ 1.059368] platform ae00000.mdss: Failed to create device link
> with ae00000.mdss
> [ 1.067174] platform ae00000.mdss: Failed to create device link
> with ae00000.mdss
> [ 1.088322] platform c440000.spmi: Failed to create device link
> with c440000.spmi
> [ 1.096019] platform c440000.spmi: Failed to create device link
> with c440000.spmi
> [ 1.103707] platform c440000.spmi: Failed to create device link
> with c440000.spmi
> [ 1.111400] platform c440000.spmi: Failed to create device link
> with c440000.spmi
> [ 1.119141] platform c440000.spmi: Failed to create device link
> with c440000.spmi
> [ 1.126825] platform c440000.spmi: Failed to create device link
> with c440000.spmi
> [ 2.024763] qcom-lab-ibb-regulator c440000.spmi:pmic@3:labibb:
> Failed to create device link with c440000.spmi
> [ 2.035026] qcom-lab-ibb-regulator c440000.spmi:pmic@3:labibb:
> Failed to create device link with c440000.spmi
>
> They look to be harmless, but it might be good to filter some of them
> out? Especially the ones which tell about creating a device link
> pointing back to the same device.

I'm sure it's harmless when the supplier == consumer. Agreed on
filtering these out.

I looked at [3], but it's not obvious to me how this is happening for
your specific case. There are a couple of ways I can think of:
1. A SYNC_STATE_ONLY link being created as a proxy link (I don't do as
many checks here because it can't break anything)
2. __fw_devlink_pickup_dangling_consumers() causing the consumer and
supplier to be the same.

But I want to understand which one is happening in your case. Can you
add a WARN_ON(1) after the error message and give me the list of stack
dumps that are unique?

Thanks,
Saravana


>
> [3] https://lore.kernel.org/linux-arm-msm/[email protected]/
>
> >
> > Thanks,
> > Saravana
> >
> > [1] - https://lore.kernel.org/lkml/[email protected]/
> > [2] - https://lore.kernel.org/lkml/CAGETcx-JUV1nj8wBJrTPfyvM7=Mre5j_vkVmZojeiumUGG6QZQ@mail.gmail.com/
> >
> > v1 -> v2:
> > - Fixed Patch 1 to handle a corner case discussed in [2].
> > - New patch 10 to handle "fsl,imx8mq-gpc" being initialized by 2 drivers.
> > - New patch 11 to add fw_devlink support for SCMI devices.
> >
> > v2 -> v3:
> > - Addressed most of Andy's comments in v2
> > - Added Colin and Sudeep's Tested-by for the series (except the imx and
> > renesas patches)
> > - Added Sudeep's Acked-by for the scmi patch.
> > - Added Geert's Reviewed-by for the renesas patch.
> > - Fixed gpiolib crash reported by Naresh.
> > - Patch 6: Fix __fwnode_links_move_consumers() to preserve fwnode link flags.
> > - New Patch 12 to fix nvmem-cells issue reported by Maxim(s)/Miquel.
> > - Deleted some stale function doc in Patch 8
> >
> > Cc: Abel Vesa <[email protected]>
> > Cc: Alexander Stein <[email protected]>
> > Cc: Tony Lindgren <[email protected]>
> > Cc: Sudeep Holla <[email protected]>
> > Cc: Geert Uytterhoeven <[email protected]>
> > Cc: John Stultz <[email protected]>
> > Cc: Doug Anderson <[email protected]>
> > Cc: Guenter Roeck <[email protected]>
> > Cc: Dmitry Baryshkov <[email protected]>
> > Cc: Maxim Kiselev <[email protected]>
> > Cc: Maxim Kochetkov <[email protected]>
> > Cc: Miquel Raynal <[email protected]>
> > Cc: Luca Weiss <[email protected]>
> > Cc: Colin Foster <[email protected]>
> > Cc: Martin Kepplinger <[email protected]>
> > Cc: Jean-Philippe Brucker <[email protected]>
> > Cc: Vladimir Oltean <[email protected]>
> >
> > Saravana Kannan (12):
> > driver core: fw_devlink: Don't purge child fwnode's consumer links
> > driver core: fw_devlink: Improve check for fwnode with no
> > device/driver
> > soc: renesas: Move away from using OF_POPULATED for fw_devlink
> > gpiolib: Clear the gpio_device's fwnode initialized flag before adding
> > driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links
> > driver core: fw_devlink: Allow marking a fwnode link as being part of
> > a cycle
> > driver core: fw_devlink: Consolidate device link flag computation
> > driver core: fw_devlink: Make cycle detection more robust
> > of: property: Simplify of_link_to_phandle()
> > irqchip/irq-imx-gpcv2: Mark fwnode device as not initialized
> > firmware: arm_scmi: Set fwnode for the scmi_device
> > mtd: mtdpart: Don't create platform device that'll never probe
> >
> > drivers/base/core.c | 449 +++++++++++++++++++++-----------
> > drivers/firmware/arm_scmi/bus.c | 3 +-
> > drivers/gpio/gpiolib.c | 7 +
> > drivers/irqchip/irq-imx-gpcv2.c | 1 +
> > drivers/mtd/mtdpart.c | 10 +
> > drivers/of/property.c | 84 +-----
> > drivers/soc/imx/gpcv2.c | 2 +-
> > drivers/soc/renesas/rcar-sysc.c | 2 +-
> > include/linux/device.h | 1 +
> > include/linux/fwnode.h | 12 +-
> > 10 files changed, 344 insertions(+), 227 deletions(-)
> >
>
> --
> With best wishes
>
> Dmitry