In the following sequence:
1) of_platform_depopulate()
2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed.
During the step 2, OF nodes are destroyed but
__of_changeset_entry_destroy() can raise warnings related to missing
of_node_put():
ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal
itself releasing the device (and the attached of_node) is done by a job
queued in a workqueue and so, it is done asynchronously with respect to
function calls.
When the warning is present, of_node_put() will be called but wrongly
too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before
the of_node destruction, synchronize the of_changeset_destroy() with the
devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
Cc: [email protected]
Signed-off-by: Herve Codina <[email protected]>
---
drivers/of/dynamic.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 3bf27052832f..169e2a9ae22f 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -9,6 +9,7 @@
#define pr_fmt(fmt) "OF: " fmt
+#include <linux/device.h>
#include <linux/of.h>
#include <linux/spinlock.h>
#include <linux/slab.h>
@@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs)
{
struct of_changeset_entry *ce, *cen;
+ /*
+ * Wait for any ongoing device link removals before destroying some of
+ * nodes.
+ */
+ device_link_wait_removal();
+
list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
__of_changeset_entry_destroy(ce);
}
--
2.43.0
On Wed, 2024-03-06 at 09:50 +0100, Herve Codina wrote:
> In the following sequence:
> 1) of_platform_depopulate()
> 2) of_overlay_remove()
>
> During the step 1, devices are destroyed and devlinks are removed.
> During the step 2, OF nodes are destroyed but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
> ERROR: memory leak, expected refcount 1 instead of 2 ...
>
> Indeed, during the devlink removals performed at step 1, the removal
> itself releasing the device (and the attached of_node) is done by a job
> queued in a workqueue and so, it is done asynchronously with respect to
> function calls.
> When the warning is present, of_node_put() will be called but wrongly
> too late from the workqueue job.
>
> In order to be sure that any ongoing devlink removals are done before
> the of_node destruction, synchronize the of_changeset_destroy() with the
> devlink removals.
>
> Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> Cc: [email protected]
> Signed-off-by: Herve Codina <[email protected]>
> ---
LGTM
Reviewed-by: Nuno Sa <[email protected]>
> drivers/of/dynamic.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 3bf27052832f..169e2a9ae22f 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -9,6 +9,7 @@
>
> #define pr_fmt(fmt) "OF: " fmt
>
> +#include <linux/device.h>
> #include <linux/of.h>
> #include <linux/spinlock.h>
> #include <linux/slab.h>
> @@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs)
> {
> struct of_changeset_entry *ce, *cen;
>
> + /*
> + * Wait for any ongoing device link removals before destroying some
> of
> + * nodes.
> + */
> + device_link_wait_removal();
> +
> list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
> __of_changeset_entry_destroy(ce);
> }
On Wed, 6 Mar 2024 09:50:03 +0100
Herve Codina <[email protected]> wrote:
> In the following sequence:
> 1) of_platform_depopulate()
> 2) of_overlay_remove()
>
> During the step 1, devices are destroyed and devlinks are removed.
> During the step 2, OF nodes are destroyed but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
> ERROR: memory leak, expected refcount 1 instead of 2 ...
>
> Indeed, during the devlink removals performed at step 1, the removal
> itself releasing the device (and the attached of_node) is done by a job
> queued in a workqueue and so, it is done asynchronously with respect to
> function calls.
> When the warning is present, of_node_put() will be called but wrongly
> too late from the workqueue job.
>
> In order to be sure that any ongoing devlink removals are done before
> the of_node destruction, synchronize the of_changeset_destroy() with the
> devlink removals.
>
> Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> Cc: [email protected]
> Signed-off-by: Herve Codina <[email protected]>
> ---
> drivers/of/dynamic.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 3bf27052832f..169e2a9ae22f 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -9,6 +9,7 @@
>
> #define pr_fmt(fmt) "OF: " fmt
>
> +#include <linux/device.h>
> #include <linux/of.h>
> #include <linux/spinlock.h>
> #include <linux/slab.h>
> @@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs)
> {
> struct of_changeset_entry *ce, *cen;
>
> + /*
> + * Wait for any ongoing device link removals before destroying some of
> + * nodes.
> + */
> + device_link_wait_removal();
Tested-by: Luca Ceresoli <[email protected]>
And no problem appeared in my tests due to the removed unlock/lock
around device_link_wait_removal().
Luca
--
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
On Wed, Mar 6, 2024 at 12:51 AM Herve Codina <[email protected]> wrote:
>
> In the following sequence:
> 1) of_platform_depopulate()
> 2) of_overlay_remove()
>
> During the step 1, devices are destroyed and devlinks are removed.
> During the step 2, OF nodes are destroyed but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
> ERROR: memory leak, expected refcount 1 instead of 2 ...
>
> Indeed, during the devlink removals performed at step 1, the removal
> itself releasing the device (and the attached of_node) is done by a job
> queued in a workqueue and so, it is done asynchronously with respect to
> function calls.
> When the warning is present, of_node_put() will be called but wrongly
> too late from the workqueue job.
>
> In order to be sure that any ongoing devlink removals are done before
> the of_node destruction, synchronize the of_changeset_destroy() with the
> devlink removals.
>
> Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> Cc: [email protected]
> Signed-off-by: Herve Codina <[email protected]>
> ---
> drivers/of/dynamic.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 3bf27052832f..169e2a9ae22f 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -9,6 +9,7 @@
>
> #define pr_fmt(fmt) "OF: " fmt
>
> +#include <linux/device.h>
> #include <linux/of.h>
> #include <linux/spinlock.h>
> #include <linux/slab.h>
> @@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs)
> {
> struct of_changeset_entry *ce, *cen;
>
> + /*
> + * Wait for any ongoing device link removals before destroying some of
> + * nodes.
> + */
Not going to ask you to revise this patch just for this, but this
comment isn't very useful. It's telling what you are doing. Not why.
And the function name is already clear on what you are doing.
Maybe something like this would be better at describing the "why"?
Free free to reword it.
When a device is deleted, the device links to/from it are also queued
for deletion. Until these device links are freed, the devices
themselves aren't freed. If the device being deleted is due to an
overlay change, this device might be holding a reference to a device
node that will be freed. So, wait until all already pending device
links are deleted before freeing a device node. This ensures we don't
free any device node that has a non-zero reference count.
Reviewed-by: Saravana Kannan <[email protected]>
-Saravana
> + device_link_wait_removal();
> +
> list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
> __of_changeset_entry_destroy(ce);
> }
> --
> 2.43.0
>