2023-11-30 17:41:48

by Herve Codina

[permalink] [raw]
Subject: [PATCH 0/2] Synchronize DT overlay removal with devlink removals

Hi,

In the following sequence:
of_platform_depopulate(); /* Remove devices from a DT overlay node */
of_overlay_remove(); /* Remove the DT overlay node itself */

Some warnings are raised by __of_changeset_entry_destroy() which was
called from of_overlay_remove():
ERROR: memory leak, expected refcount 1 instead of 2 ...

The issue is that, during the device devlink removals triggered from the
of_platform_depopulate(), jobs are put in a workqueue.
These jobs drop the reference to the devices. When a device is no more
referenced (refcount == 0), it is released and the reference to its
of_node is dropped by a call to of_node_put().
These operations are fully correct except that, because of the
workqueue, they are done asynchronously with respect to function calls.

In the sequence provided, the jobs are run too late, after the call to
__of_changeset_entry_destroy() and so a missing of_node_put() call is
detected by __of_changeset_entry_destroy().

This series fixes this issue introducing device_link_wait_removal() in
order to wait for the end of jobs execution (patch 1) and using this
function to synchronize the overlay removal with the end of jobs
execution (patch 2).

Best regards,
Hervé

Herve Codina (2):
driver core: Introduce device_link_wait_removal()
of: overlay: Synchronize of_overlay_remove() with the devlink removals

drivers/base/core.c | 26 +++++++++++++++++++++++---
drivers/of/overlay.c | 6 ++++++
include/linux/device.h | 1 +
3 files changed, 30 insertions(+), 3 deletions(-)

--
2.42.0


2023-11-30 17:41:56

by Herve Codina

[permalink] [raw]
Subject: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

In the following sequence:
1) of_platform_depopulate()
2) of_overlay_remove()

During the step 1, devices are destroyed and devlinks are removed.
During the step 2, OF nodes are destroyed but
__of_changeset_entry_destroy() can raise warnings related to missing
of_node_put():
ERROR: memory leak, expected refcount 1 instead of 2 ...

Indeed, during the devlink removals performed at step 1, the removal
itself releasing the device (and the attached of_node) is done by a job
queued in a workqueue and so, it is done asynchronously with respect to
function calls.
When the warning is present, of_node_put() will be called but wrongly
too late from the workqueue job.

In order to be sure that any ongoing devlink removals are done before
the of_node destruction, synchronize the of_overlay_remove() with the
devlink removals.

Signed-off-by: Herve Codina <[email protected]>
---
drivers/of/overlay.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index a9a292d6d59b..5c5f808b163e 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
goto out;
}

+ /*
+ * Wait for any ongoing device link removals before removing some of
+ * nodes
+ */
+ device_link_wait_removal();
+
mutex_lock(&of_mutex);

ovcs = idr_find(&ovcs_idr, *ovcs_id);
--
2.42.0

2023-11-30 17:42:00

by Herve Codina

[permalink] [raw]
Subject: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
introduces a workqueue to release the consumer and supplier devices used
in the devlink.
In the job queued, devices are release and in turn, when all the
references to these devices are dropped, the release function of the
device itself is called.

Nothing is present to provide some synchronisation with this workqueue
in order to ensure that all ongoing releasing operations are done and
so, some other operations can be started safely.

For instance, in the following sequence:
1) of_platform_depopulate()
2) of_overlay_remove()

During the step 1, devices are released and related devlinks are removed
(jobs pushed in the workqueue).
During the step 2, OF nodes are destroyed but, without any
synchronisation with devlink removal jobs, of_overlay_remove() can raise
warnings related to missing of_node_put():
ERROR: memory leak, expected refcount 1 instead of 2

Indeed, the missing of_node_put() call is going to be done, too late,
from the workqueue job execution.

Introduce device_link_wait_removal() to offer a way to synchronize
operations waiting for the end of devlink removals (i.e. end of
workqueue jobs).
Also, as a flushing operation is done on the workqueue, the workqueue
used is moved from a system-wide workqueue to a local one.

Signed-off-by: Herve Codina <[email protected]>
---
drivers/base/core.c | 26 +++++++++++++++++++++++---
include/linux/device.h | 1 +
2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index ac026187ac6a..2e102a77758c 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
static void __fw_devlink_link_to_consumers(struct device *dev);
static bool fw_devlink_drv_reg_done;
static bool fw_devlink_best_effort;
+static struct workqueue_struct *fw_devlink_wq;

/**
* __fwnode_link_add - Create a link between two fwnode_handles.
@@ -530,12 +531,26 @@ static void devlink_dev_release(struct device *dev)
/*
* It may take a while to complete this work because of the SRCU
* synchronization in device_link_release_fn() and if the consumer or
- * supplier devices get deleted when it runs, so put it into the "long"
- * workqueue.
+ * supplier devices get deleted when it runs, so put it into the
+ * dedicated workqueue.
*/
- queue_work(system_long_wq, &link->rm_work);
+ queue_work(fw_devlink_wq, &link->rm_work);
}

+/**
+ * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
+ */
+void device_link_wait_removal(void)
+{
+ /*
+ * devlink removal jobs are queued in the dedicated work queue.
+ * To be sure that all removal jobs are terminated, ensure that any
+ * scheduled work has run to completion.
+ */
+ drain_workqueue(fw_devlink_wq);
+}
+EXPORT_SYMBOL_GPL(device_link_wait_removal);
+
static struct class devlink_class = {
.name = "devlink",
.dev_groups = devlink_groups,
@@ -4085,9 +4100,14 @@ int __init devices_init(void)
sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
if (!sysfs_dev_char_kobj)
goto char_kobj_err;
+ fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
+ if (!fw_devlink_wq)
+ goto wq_err;

return 0;

+ wq_err:
+ kobject_put(sysfs_dev_char_kobj);
char_kobj_err:
kobject_put(sysfs_dev_block_kobj);
block_kobj_err:
diff --git a/include/linux/device.h b/include/linux/device.h
index 2b093e62907a..c26f4b3df2bd 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1250,6 +1250,7 @@ void device_link_del(struct device_link *link);
void device_link_remove(void *consumer, struct device *supplier);
void device_links_supplier_sync_state_pause(void);
void device_links_supplier_sync_state_resume(void);
+void device_link_wait_removal(void);

/* Create alias, so I can be autoloaded. */
#define MODULE_ALIAS_CHARDEV(major,minor) \
--
2.42.0

2023-12-06 17:16:18

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH 0/2] Synchronize DT overlay removal with devlink removals

On Thu, Nov 30, 2023 at 06:41:07PM +0100, Herve Codina wrote:
> Hi,

+Saravana for comment

Looks okay to me though.

>
> In the following sequence:
> of_platform_depopulate(); /* Remove devices from a DT overlay node */
> of_overlay_remove(); /* Remove the DT overlay node itself */
>
> Some warnings are raised by __of_changeset_entry_destroy() which was
> called from of_overlay_remove():
> ERROR: memory leak, expected refcount 1 instead of 2 ...
>
> The issue is that, during the device devlink removals triggered from the
> of_platform_depopulate(), jobs are put in a workqueue.
> These jobs drop the reference to the devices. When a device is no more
> referenced (refcount == 0), it is released and the reference to its
> of_node is dropped by a call to of_node_put().
> These operations are fully correct except that, because of the
> workqueue, they are done asynchronously with respect to function calls.
>
> In the sequence provided, the jobs are run too late, after the call to
> __of_changeset_entry_destroy() and so a missing of_node_put() call is
> detected by __of_changeset_entry_destroy().
>
> This series fixes this issue introducing device_link_wait_removal() in
> order to wait for the end of jobs execution (patch 1) and using this
> function to synchronize the overlay removal with the end of jobs
> execution (patch 2).
>
> Best regards,
> Herv?
>
> Herve Codina (2):
> driver core: Introduce device_link_wait_removal()
> of: overlay: Synchronize of_overlay_remove() with the devlink removals
>
> drivers/base/core.c | 26 +++++++++++++++++++++++---
> drivers/of/overlay.c | 6 ++++++
> include/linux/device.h | 1 +
> 3 files changed, 30 insertions(+), 3 deletions(-)
>
> --
> 2.42.0
>

2023-12-07 03:10:04

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH 0/2] Synchronize DT overlay removal with devlink removals

On Wed, Dec 6, 2023 at 9:15 AM Rob Herring <[email protected]> wrote:
>
> On Thu, Nov 30, 2023 at 06:41:07PM +0100, Herve Codina wrote:
> > Hi,
>
> +Saravana for comment

I'll respond to this within a week -- very swamped at the moment. The
main thing I want to make sure is that we don't cause an indirect
deadlock with this wait(). I'll go back and look at why we added the
work queue and then check for device/devlink locking issues.

-Saravana

>
> Looks okay to me though.
>
> >
> > In the following sequence:
> > of_platform_depopulate(); /* Remove devices from a DT overlay node */
> > of_overlay_remove(); /* Remove the DT overlay node itself */
> >
> > Some warnings are raised by __of_changeset_entry_destroy() which was
> > called from of_overlay_remove():
> > ERROR: memory leak, expected refcount 1 instead of 2 ...
> >
> > The issue is that, during the device devlink removals triggered from the
> > of_platform_depopulate(), jobs are put in a workqueue.
> > These jobs drop the reference to the devices. When a device is no more
> > referenced (refcount == 0), it is released and the reference to its
> > of_node is dropped by a call to of_node_put().
> > These operations are fully correct except that, because of the
> > workqueue, they are done asynchronously with respect to function calls.
> >
> > In the sequence provided, the jobs are run too late, after the call to
> > __of_changeset_entry_destroy() and so a missing of_node_put() call is
> > detected by __of_changeset_entry_destroy().
> >
> > This series fixes this issue introducing device_link_wait_removal() in
> > order to wait for the end of jobs execution (patch 1) and using this
> > function to synchronize the overlay removal with the end of jobs
> > execution (patch 2).
> >
> > Best regards,
> > Hervé
> >
> > Herve Codina (2):
> > driver core: Introduce device_link_wait_removal()
> > of: overlay: Synchronize of_overlay_remove() with the devlink removals
> >
> > drivers/base/core.c | 26 +++++++++++++++++++++++---
> > drivers/of/overlay.c | 6 ++++++
> > include/linux/device.h | 1 +
> > 3 files changed, 30 insertions(+), 3 deletions(-)
> >
> > --
> > 2.42.0
> >

2023-12-20 17:16:43

by Luca Ceresoli

[permalink] [raw]
Subject: Re: [PATCH 0/2] Synchronize DT overlay removal with devlink removals

Hello Saravana, Rob, Hervé,

[+Miquèl, who contributed to the discussion with Hervé and me]

On Wed, 6 Dec 2023 19:09:06 -0800
Saravana Kannan <[email protected]> wrote:

> On Wed, Dec 6, 2023 at 9:15 AM Rob Herring <[email protected]> wrote:
> >
> > On Thu, Nov 30, 2023 at 06:41:07PM +0100, Herve Codina wrote:
> > > Hi,
> >
> > +Saravana for comment
>
> I'll respond to this within a week -- very swamped at the moment. The
> main thing I want to make sure is that we don't cause an indirect
> deadlock with this wait(). I'll go back and look at why we added the
> work queue and then check for device/devlink locking issues.

While working on a project unrelated to Hervé's work, I also ended up
in getting sporadic but frequent "ERROR: memory leak, expected refcount
1 instead of..." messages, which persisted even after adding this patch
series on my tree.

My use case is the insertion and removal of a simple overlay describing
a regulator-fixed and an I2C GPIO expander using it. The messages appear
regardless of whether the insertion and removal is done from kernel code
or via the configfs interface (out-of-tree patches from [0]).

I reconstructed the sequence of operations, all of which stem from
of_overlay_remove():

int of_overlay_remove(int *ovcs_id)
{
...

device_link_wait_removal(); // proposed by this patch series

mutex_lock(&of_mutex);

...

ret = __of_changeset_revert_notify(&ovcs->cset);
// this ends up calling (excerpt from a long stack trace):
// -> of_i2c_notify
// -> device_remove
// -> devm_regulator_release
// -> device_link_remove
// -> devlink_dev_release, which queues work for
// device_link_release_fn, which in turn calls:
// -> device_put
// -> device_release
// -> {platform,regulator,...}_dev*_release
// -> of_node_put() [**]

...

free_overlay_changeset(ovcs);
// calls:
// -> of_changeset_destroy
// -> __of_changeset_entry_destroy
// -> pr_err("ERROR: memory leak, expected refcount 1 instead of %d...
// The error appears or not, based on when the workqueue runs

err_unlock:
mutex_unlock(&of_mutex);

...
}

So this adds up to the question of whether devlink removal should actually
be run asynchronously or not.

A simple short-term solution is to move the call to
device_link_wait_removal() later, just before free_overlay_changeset():


diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 1a8a6620748c..eccf08cf2160 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -1375,12 +1375,6 @@ int of_overlay_remove(int *ovcs_id)
goto out;
}

- /*
- * Wait for any ongoing device link removals before removing some of
- * nodes
- */
- device_link_wait_removal();
-
mutex_lock(&of_mutex);

ovcs = idr_find(&ovcs_idr, *ovcs_id);
@@ -1427,6 +1421,14 @@ int of_overlay_remove(int *ovcs_id)
if (!ret)
ret = ret_tmp;

+ /*
+ * Wait for any ongoing device link removals before removing some of
+ * nodes
+ */
+ mutex_unlock(&of_mutex);
+ device_link_wait_removal();
+ mutex_lock(&of_mutex);
+
free_overlay_changeset(ovcs);

err_unlock:


This obviously raises the question of whether unlocking and re-locking
the mutex is potentially dangerous. I have no answer to this right away,
but I tested this change with CONFIG_PROVE_LOCKING=y and no issue showed
up after several overlay load/unload sequences so I am not aware of any
actual issues with this change.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git/log/?h=topic/overlays

Luca
--
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2023-12-20 18:12:51

by Herve Codina

[permalink] [raw]
Subject: Re: [PATCH 0/2] Synchronize DT overlay removal with devlink removals

Hi,

On Wed, 20 Dec 2023 18:16:27 +0100
Luca Ceresoli <[email protected]> wrote:

> Hello Saravana, Rob, Hervé,
>
> [+Miquèl, who contributed to the discussion with Hervé and me]
>
> On Wed, 6 Dec 2023 19:09:06 -0800
> Saravana Kannan <[email protected]> wrote:
>
> > On Wed, Dec 6, 2023 at 9:15 AM Rob Herring <[email protected]> wrote:
> > >
> > > On Thu, Nov 30, 2023 at 06:41:07PM +0100, Herve Codina wrote:
> > > > Hi,
> > >
> > > +Saravana for comment
> >
> > I'll respond to this within a week -- very swamped at the moment. The
> > main thing I want to make sure is that we don't cause an indirect
> > deadlock with this wait(). I'll go back and look at why we added the
> > work queue and then check for device/devlink locking issues.
>
> While working on a project unrelated to Hervé's work, I also ended up
> in getting sporadic but frequent "ERROR: memory leak, expected refcount
> 1 instead of..." messages, which persisted even after adding this patch
> series on my tree.
>
> My use case is the insertion and removal of a simple overlay describing
> a regulator-fixed and an I2C GPIO expander using it. The messages appear
> regardless of whether the insertion and removal is done from kernel code
> or via the configfs interface (out-of-tree patches from [0]).
>
> I reconstructed the sequence of operations, all of which stem from
> of_overlay_remove():
>
> int of_overlay_remove(int *ovcs_id)
> {
> ...
>
> device_link_wait_removal(); // proposed by this patch series
>
> mutex_lock(&of_mutex);
>
> ...
>
> ret = __of_changeset_revert_notify(&ovcs->cset);
> // this ends up calling (excerpt from a long stack trace):
> // -> of_i2c_notify
> // -> device_remove
> // -> devm_regulator_release
> // -> device_link_remove
> // -> devlink_dev_release, which queues work for
> // device_link_release_fn, which in turn calls:
> // -> device_put
> // -> device_release
> // -> {platform,regulator,...}_dev*_release
> // -> of_node_put() [**]
>
> ...
>
> free_overlay_changeset(ovcs);
> // calls:
> // -> of_changeset_destroy
> // -> __of_changeset_entry_destroy
> // -> pr_err("ERROR: memory leak, expected refcount 1 instead of %d...
> // The error appears or not, based on when the workqueue runs
>
> err_unlock:
> mutex_unlock(&of_mutex);
>
> ...
> }
>
> So this adds up to the question of whether devlink removal should actually
> be run asynchronously or not.
>
> A simple short-term solution is to move the call to
> device_link_wait_removal() later, just before free_overlay_changeset():

Indeed, during of_overlay_remove() notifications can be done and in Luca's
use-case, they lead to some device removals and so devlink removals.

That's why we move the synchronization calling device_link_wait_removal()
after notifications and so just before free_overlay_changeset().

>
>
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index 1a8a6620748c..eccf08cf2160 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -1375,12 +1375,6 @@ int of_overlay_remove(int *ovcs_id)
> goto out;
> }
>
> - /*
> - * Wait for any ongoing device link removals before removing some of
> - * nodes
> - */
> - device_link_wait_removal();
> -
> mutex_lock(&of_mutex);
>
> ovcs = idr_find(&ovcs_idr, *ovcs_id);
> @@ -1427,6 +1421,14 @@ int of_overlay_remove(int *ovcs_id)
> if (!ret)
> ret = ret_tmp;
>
> + /*
> + * Wait for any ongoing device link removals before removing some of
> + * nodes
> + */
> + mutex_unlock(&of_mutex);
> + device_link_wait_removal();
> + mutex_lock(&of_mutex);
> +
> free_overlay_changeset(ovcs);
>
> err_unlock:
>
>
> This obviously raises the question of whether unlocking and re-locking
> the mutex is potentially dangerous. I have no answer to this right away,
> but I tested this change with CONFIG_PROVE_LOCKING=y and no issue showed
> up after several overlay load/unload sequences so I am not aware of any
> actual issues with this change.
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git/log/?h=topic/overlays
>
> Luca

Thanks Luca for this complementary use-case related to this issue.

Hervé
--
Hervé Codina, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-02-21 00:20:26

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH 0/2] Synchronize DT overlay removal with devlink removals

On Wed, Dec 6, 2023 at 7:09 PM Saravana Kannan <[email protected]> wrote:
>
> On Wed, Dec 6, 2023 at 9:15 AM Rob Herring <[email protected]> wrote:
> >
> > On Thu, Nov 30, 2023 at 06:41:07PM +0100, Herve Codina wrote:
> > > Hi,
> >
> > +Saravana for comment
>
> I'll respond to this within a week -- very swamped at the moment. The
> main thing I want to make sure is that we don't cause an indirect
> deadlock with this wait(). I'll go back and look at why we added the
> work queue and then check for device/devlink locking issues.
>

Sorry about the long delay, but I finally got back to this because
Nuno nudged me to review a similar patch they sent. I'll leave some
easy to address comments in the patches.

-Saravana

> -Saravana
>
> >
> > Looks okay to me though.
> >
> > >
> > > In the following sequence:
> > > of_platform_depopulate(); /* Remove devices from a DT overlay node */
> > > of_overlay_remove(); /* Remove the DT overlay node itself */
> > >
> > > Some warnings are raised by __of_changeset_entry_destroy() which was
> > > called from of_overlay_remove():
> > > ERROR: memory leak, expected refcount 1 instead of 2 ...
> > >
> > > The issue is that, during the device devlink removals triggered from the
> > > of_platform_depopulate(), jobs are put in a workqueue.
> > > These jobs drop the reference to the devices. When a device is no more
> > > referenced (refcount == 0), it is released and the reference to its
> > > of_node is dropped by a call to of_node_put().
> > > These operations are fully correct except that, because of the
> > > workqueue, they are done asynchronously with respect to function calls.
> > >
> > > In the sequence provided, the jobs are run too late, after the call to
> > > __of_changeset_entry_destroy() and so a missing of_node_put() call is
> > > detected by __of_changeset_entry_destroy().
> > >
> > > This series fixes this issue introducing device_link_wait_removal() in
> > > order to wait for the end of jobs execution (patch 1) and using this
> > > function to synchronize the overlay removal with the end of jobs
> > > execution (patch 2).
> > >
> > > Best regards,
> > > Hervé
> > >
> > > Herve Codina (2):
> > > driver core: Introduce device_link_wait_removal()
> > > of: overlay: Synchronize of_overlay_remove() with the devlink removals
> > >
> > > drivers/base/core.c | 26 +++++++++++++++++++++++---
> > > drivers/of/overlay.c | 6 ++++++
> > > include/linux/device.h | 1 +
> > > 3 files changed, 30 insertions(+), 3 deletions(-)
> > >
> > > --
> > > 2.42.0
> > >

2024-02-21 00:32:03

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <[email protected]> wrote:
>
> The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> introduces a workqueue to release the consumer and supplier devices used
> in the devlink.
> In the job queued, devices are release and in turn, when all the
> references to these devices are dropped, the release function of the
> device itself is called.
>
> Nothing is present to provide some synchronisation with this workqueue
> in order to ensure that all ongoing releasing operations are done and
> so, some other operations can be started safely.
>
> For instance, in the following sequence:
> 1) of_platform_depopulate()
> 2) of_overlay_remove()
>
> During the step 1, devices are released and related devlinks are removed
> (jobs pushed in the workqueue).
> During the step 2, OF nodes are destroyed but, without any
> synchronisation with devlink removal jobs, of_overlay_remove() can raise
> warnings related to missing of_node_put():
> ERROR: memory leak, expected refcount 1 instead of 2
>
> Indeed, the missing of_node_put() call is going to be done, too late,
> from the workqueue job execution.
>
> Introduce device_link_wait_removal() to offer a way to synchronize
> operations waiting for the end of devlink removals (i.e. end of
> workqueue jobs).
> Also, as a flushing operation is done on the workqueue, the workqueue
> used is moved from a system-wide workqueue to a local one.

Thanks for the bug report and fix. Sorry again about the delay in
reviewing the changes.

Please add Fixes tag for 80dd33cf72d1.

> Signed-off-by: Herve Codina <[email protected]>
> ---
> drivers/base/core.c | 26 +++++++++++++++++++++++---
> include/linux/device.h | 1 +
> 2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index ac026187ac6a..2e102a77758c 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
> static void __fw_devlink_link_to_consumers(struct device *dev);
> static bool fw_devlink_drv_reg_done;
> static bool fw_devlink_best_effort;
> +static struct workqueue_struct *fw_devlink_wq;
>
> /**
> * __fwnode_link_add - Create a link between two fwnode_handles.
> @@ -530,12 +531,26 @@ static void devlink_dev_release(struct device *dev)
> /*
> * It may take a while to complete this work because of the SRCU
> * synchronization in device_link_release_fn() and if the consumer or
> - * supplier devices get deleted when it runs, so put it into the "long"
> - * workqueue.
> + * supplier devices get deleted when it runs, so put it into the
> + * dedicated workqueue.
> */
> - queue_work(system_long_wq, &link->rm_work);
> + queue_work(fw_devlink_wq, &link->rm_work);

This has nothing to do with fw_devlink. fw_devlink is just triggering
the issue in device links. You can hit this bug without fw_devlink too.
So call this device_link_wq since it's consistent with device_link_* APIs.

> }
>
> +/**
> + * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
> + */
> +void device_link_wait_removal(void)
> +{
> + /*
> + * devlink removal jobs are queued in the dedicated work queue.
> + * To be sure that all removal jobs are terminated, ensure that any
> + * scheduled work has run to completion.
> + */
> + drain_workqueue(fw_devlink_wq);

Is there a reason this needs to be drain_workqueu() instead of
flush_workqueue(). Drain is a stronger guarantee than we need in this
case. All we are trying to make sure is that all the device link
remove work queued so far have completed.

> +}
> +EXPORT_SYMBOL_GPL(device_link_wait_removal);
> +
> static struct class devlink_class = {
> .name = "devlink",
> .dev_groups = devlink_groups,
> @@ -4085,9 +4100,14 @@ int __init devices_init(void)
> sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
> if (!sysfs_dev_char_kobj)
> goto char_kobj_err;
> + fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
> + if (!fw_devlink_wq)

Fix the name appropriately here too please.

Thanks,
Saravana


> + goto wq_err;
>
> return 0;
>
> + wq_err:
> + kobject_put(sysfs_dev_char_kobj);
> char_kobj_err:
> kobject_put(sysfs_dev_block_kobj);
> block_kobj_err:
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 2b093e62907a..c26f4b3df2bd 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1250,6 +1250,7 @@ void device_link_del(struct device_link *link);
> void device_link_remove(void *consumer, struct device *supplier);
> void device_links_supplier_sync_state_pause(void);
> void device_links_supplier_sync_state_resume(void);
> +void device_link_wait_removal(void);
>
> /* Create alias, so I can be autoloaded. */
> #define MODULE_ALIAS_CHARDEV(major,minor) \
> --
> 2.42.0
>
>

2024-02-21 00:37:54

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <[email protected]> wrote:
>
> In the following sequence:
> 1) of_platform_depopulate()
> 2) of_overlay_remove()
>
> During the step 1, devices are destroyed and devlinks are removed.
> During the step 2, OF nodes are destroyed but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
> ERROR: memory leak, expected refcount 1 instead of 2 ...
>
> Indeed, during the devlink removals performed at step 1, the removal
> itself releasing the device (and the attached of_node) is done by a job
> queued in a workqueue and so, it is done asynchronously with respect to
> function calls.
> When the warning is present, of_node_put() will be called but wrongly
> too late from the workqueue job.
>
> In order to be sure that any ongoing devlink removals are done before
> the of_node destruction, synchronize the of_overlay_remove() with the
> devlink removals.
>

Add Fixes tag for this one too to point to the change that added the workqueue.

Please CC Nuno and Luca in your v2 series.

> Signed-off-by: Herve Codina <[email protected]>
> ---
> drivers/of/overlay.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index a9a292d6d59b..5c5f808b163e 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> goto out;
> }
>
> + /*
> + * Wait for any ongoing device link removals before removing some of
> + * nodes
> + */
> + device_link_wait_removal();
> +

Nuno in his patch[1] had this "wait" happen inside
__of_changeset_entry_destroy(). Which seems to be necessary to not hit
the issue that Luca reported[2] in this patch series. Is there any
problem with doing that?

Luca for some reason did a unlock/lock(of_mutex) in his test patch and
I don't think that's necessary.

Can you move this call to where Nuno did it and see if that works for
all of you?

[1] - https://lore.kernel.org/all/[email protected]/
[2] - https://lore.kernel.org/all/20231220181627.341e8789@booty/

Thank,
Saravana


> mutex_lock(&of_mutex);
>
> ovcs = idr_find(&ovcs_idr, *ovcs_id);
> --
> 2.42.0
>
>

2024-02-21 06:56:50

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

On Tue, 2024-02-20 at 16:31 -0800, Saravana Kannan wrote:
> On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <[email protected]> wrote:
> >
> > The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > introduces a workqueue to release the consumer and supplier devices used
> > in the devlink.
> > In the job queued, devices are release and in turn, when all the
> > references to these devices are dropped, the release function of the
> > device itself is called.
> >
> > Nothing is present to provide some synchronisation with this workqueue
> > in order to ensure that all ongoing releasing operations are done and
> > so, some other operations can be started safely.
> >
> > For instance, in the following sequence:
> >   1) of_platform_depopulate()
> >   2) of_overlay_remove()
> >
> > During the step 1, devices are released and related devlinks are removed
> > (jobs pushed in the workqueue).
> > During the step 2, OF nodes are destroyed but, without any
> > synchronisation with devlink removal jobs, of_overlay_remove() can raise
> > warnings related to missing of_node_put():
> >   ERROR: memory leak, expected refcount 1 instead of 2
> >
> > Indeed, the missing of_node_put() call is going to be done, too late,
> > from the workqueue job execution.
> >
> > Introduce device_link_wait_removal() to offer a way to synchronize
> > operations waiting for the end of devlink removals (i.e. end of
> > workqueue jobs).
> > Also, as a flushing operation is done on the workqueue, the workqueue
> > used is moved from a system-wide workqueue to a local one.
>
> Thanks for the bug report and fix. Sorry again about the delay in
> reviewing the changes.
>
> Please add Fixes tag for 80dd33cf72d1.
>
> > Signed-off-by: Herve Codina <[email protected]>
> > ---
> >  drivers/base/core.c    | 26 +++++++++++++++++++++++---
> >  include/linux/device.h |  1 +
> >  2 files changed, 24 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index ac026187ac6a..2e102a77758c 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
> >  static void __fw_devlink_link_to_consumers(struct device *dev);
> >  static bool fw_devlink_drv_reg_done;
> >  static bool fw_devlink_best_effort;
> > +static struct workqueue_struct *fw_devlink_wq;
> >
> >  /**
> >   * __fwnode_link_add - Create a link between two fwnode_handles.
> > @@ -530,12 +531,26 @@ static void devlink_dev_release(struct device *dev)
> >         /*
> >          * It may take a while to complete this work because of the SRCU
> >          * synchronization in device_link_release_fn() and if the consumer or
> > -        * supplier devices get deleted when it runs, so put it into the "long"
> > -        * workqueue.
> > +        * supplier devices get deleted when it runs, so put it into the
> > +        * dedicated workqueue.
> >          */
> > -       queue_work(system_long_wq, &link->rm_work);
> > +       queue_work(fw_devlink_wq, &link->rm_work);
>
> This has nothing to do with fw_devlink. fw_devlink is just triggering
> the issue in device links. You can hit this bug without fw_devlink too.
> So call this device_link_wq since it's consistent with device_link_* APIs.
>

I'm not sure if I got this right in my series. I do call devlink_release_queue() to
my queue. But on the Overlay side I use fwnode_links_flush_queue() because it looked
more sensible from an OF point of view. And including (in OF code) linux/fwnode.h
instead linux/device.h makes more sense to me.

> >  }
> >
> > +/**
> > + * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
> > + */
> > +void device_link_wait_removal(void)
> > +{
> > +       /*
> > +        * devlink removal jobs are queued in the dedicated work queue.
> > +        * To be sure that all removal jobs are terminated, ensure that any
> > +        * scheduled work has run to completion.
> > +        */
> > +       drain_workqueue(fw_devlink_wq);
>
> Is there a reason this needs to be drain_workqueu() instead of
> flush_workqueue(). Drain is a stronger guarantee than we need in this
> case. All we are trying to make sure is that all the device link
> remove work queued so far have completed.
>

Yeah, I'm also using flush_workqueue().

> > +}
> > +EXPORT_SYMBOL_GPL(device_link_wait_removal);
> > +
> >  static struct class devlink_class = {
> >         .name = "devlink",
> >         .dev_groups = devlink_groups,
> > @@ -4085,9 +4100,14 @@ int __init devices_init(void)
> >         sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
> >         if (!sysfs_dev_char_kobj)
> >                 goto char_kobj_err;
> > +       fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
> > +       if (!fw_devlink_wq)
>
> Fix the name appropriately here too please.

Hi Saravana,

Oh, was not aware of this series... Please look at my first patch. It already has a
review tag by Rafael. I think the creation of the queue makes more sense to be done
in devlink_class_init(). Moreover, Rafael complained in my first version that
erroring out because we failed to create the queue is too harsh since devlinks can
still work. So, what we do is to schedule the work if we have a queue or too call
device_link_release_fn() synchronously if we don't have the queue (note that failing
to allocate the queue is very unlikely anyways).

- Nuno Sá
>

2024-02-21 07:03:26

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

On Tue, 2024-02-20 at 16:37 -0800, Saravana Kannan wrote:
> On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <[email protected]> wrote:
> >
> > In the following sequence:
> >   1) of_platform_depopulate()
> >   2) of_overlay_remove()
> >
> > During the step 1, devices are destroyed and devlinks are removed.
> > During the step 2, OF nodes are destroyed but
> > __of_changeset_entry_destroy() can raise warnings related to missing
> > of_node_put():
> >   ERROR: memory leak, expected refcount 1 instead of 2 ...
> >
> > Indeed, during the devlink removals performed at step 1, the removal
> > itself releasing the device (and the attached of_node) is done by a job
> > queued in a workqueue and so, it is done asynchronously with respect to
> > function calls.
> > When the warning is present, of_node_put() will be called but wrongly
> > too late from the workqueue job.
> >
> > In order to be sure that any ongoing devlink removals are done before
> > the of_node destruction, synchronize the of_overlay_remove() with the
> > devlink removals.
> >
>
> Add Fixes tag for this one too to point to the change that added the workqueue.
>
> Please CC Nuno and Luca in your v2 series.
>
> > Signed-off-by: Herve Codina <[email protected]>
> > ---
> >  drivers/of/overlay.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > index a9a292d6d59b..5c5f808b163e 100644
> > --- a/drivers/of/overlay.c
> > +++ b/drivers/of/overlay.c
> > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> >                 goto out;
> >         }
> >
> > +       /*
> > +        * Wait for any ongoing device link removals before removing some of
> > +        * nodes
> > +        */
> > +       device_link_wait_removal();
> > +
>
> Nuno in his patch[1] had this "wait" happen inside
> __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> the issue that Luca reported[2] in this patch series. Is there any
> problem with doing that?
>

In my tests, I did not saw any issue. Logically it also makes sense as you wait for
all possible refcount drops just before checking your assumptions. But I might be
missing something though...

> Luca for some reason did a unlock/lock(of_mutex) in his test patch and
> I don't think that's necessary.

Yes, I agree. queue_work() and flush_worqueue() should already have all the
synchronization semantics internally.

- Nuno Sá



2024-02-23 01:10:36

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

On Tue, Feb 20, 2024 at 10:56 PM Nuno Sá <[email protected]> wrote:
>
> On Tue, 2024-02-20 at 16:31 -0800, Saravana Kannan wrote:
> > On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <[email protected]> wrote:
> > >
> > > The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > > introduces a workqueue to release the consumer and supplier devices used
> > > in the devlink.
> > > In the job queued, devices are release and in turn, when all the
> > > references to these devices are dropped, the release function of the
> > > device itself is called.
> > >
> > > Nothing is present to provide some synchronisation with this workqueue
> > > in order to ensure that all ongoing releasing operations are done and
> > > so, some other operations can be started safely.
> > >
> > > For instance, in the following sequence:
> > > 1) of_platform_depopulate()
> > > 2) of_overlay_remove()
> > >
> > > During the step 1, devices are released and related devlinks are removed
> > > (jobs pushed in the workqueue).
> > > During the step 2, OF nodes are destroyed but, without any
> > > synchronisation with devlink removal jobs, of_overlay_remove() can raise
> > > warnings related to missing of_node_put():
> > > ERROR: memory leak, expected refcount 1 instead of 2
> > >
> > > Indeed, the missing of_node_put() call is going to be done, too late,
> > > from the workqueue job execution.
> > >
> > > Introduce device_link_wait_removal() to offer a way to synchronize
> > > operations waiting for the end of devlink removals (i.e. end of
> > > workqueue jobs).
> > > Also, as a flushing operation is done on the workqueue, the workqueue
> > > used is moved from a system-wide workqueue to a local one.
> >
> > Thanks for the bug report and fix. Sorry again about the delay in
> > reviewing the changes.
> >
> > Please add Fixes tag for 80dd33cf72d1.
> >
> > > Signed-off-by: Herve Codina <[email protected]>
> > > ---
> > > drivers/base/core.c | 26 +++++++++++++++++++++++---
> > > include/linux/device.h | 1 +
> > > 2 files changed, 24 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index ac026187ac6a..2e102a77758c 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
> > > static void __fw_devlink_link_to_consumers(struct device *dev);
> > > static bool fw_devlink_drv_reg_done;
> > > static bool fw_devlink_best_effort;
> > > +static struct workqueue_struct *fw_devlink_wq;
> > >
> > > /**
> > > * __fwnode_link_add - Create a link between two fwnode_handles.
> > > @@ -530,12 +531,26 @@ static void devlink_dev_release(struct device *dev)
> > > /*
> > > * It may take a while to complete this work because of the SRCU
> > > * synchronization in device_link_release_fn() and if the consumer or
> > > - * supplier devices get deleted when it runs, so put it into the "long"
> > > - * workqueue.
> > > + * supplier devices get deleted when it runs, so put it into the
> > > + * dedicated workqueue.
> > > */
> > > - queue_work(system_long_wq, &link->rm_work);
> > > + queue_work(fw_devlink_wq, &link->rm_work);
> >
> > This has nothing to do with fw_devlink. fw_devlink is just triggering
> > the issue in device links. You can hit this bug without fw_devlink too.
> > So call this device_link_wq since it's consistent with device_link_* APIs.
> >
>
> I'm not sure if I got this right in my series. I do call devlink_release_queue() to
> my queue. But on the Overlay side I use fwnode_links_flush_queue() because it looked
> more sensible from an OF point of view. And including (in OF code) linux/fwnode.h
> instead linux/device.h makes more sense to me.
>
> > > }
> > >
> > > +/**
> > > + * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
> > > + */
> > > +void device_link_wait_removal(void)
> > > +{
> > > + /*
> > > + * devlink removal jobs are queued in the dedicated work queue.
> > > + * To be sure that all removal jobs are terminated, ensure that any
> > > + * scheduled work has run to completion.
> > > + */
> > > + drain_workqueue(fw_devlink_wq);
> >
> > Is there a reason this needs to be drain_workqueu() instead of
> > flush_workqueue(). Drain is a stronger guarantee than we need in this
> > case. All we are trying to make sure is that all the device link
> > remove work queued so far have completed.
> >
>
> Yeah, I'm also using flush_workqueue().
>
> > > +}
> > > +EXPORT_SYMBOL_GPL(device_link_wait_removal);
> > > +
> > > static struct class devlink_class = {
> > > .name = "devlink",
> > > .dev_groups = devlink_groups,
> > > @@ -4085,9 +4100,14 @@ int __init devices_init(void)
> > > sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
> > > if (!sysfs_dev_char_kobj)
> > > goto char_kobj_err;
> > > + fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
> > > + if (!fw_devlink_wq)
> >
> > Fix the name appropriately here too please.
>
> Hi Saravana,
>
> Oh, was not aware of this series... Please look at my first patch. It already has a
> review tag by Rafael. I think the creation of the queue makes more sense to be done
> in devlink_class_init(). Moreover, Rafael complained in my first version that
> erroring out because we failed to create the queue is too harsh since devlinks can
> still work.

I think Rafael can be convinced on this one. Firstly, if we fail to
allocate so early, we have bigger problems.

> So, what we do is to schedule the work if we have a queue or too call
> device_link_release_fn() synchronously if we don't have the queue (note that failing
> to allocate the queue is very unlikely anyways).

device links don't really work when you synchronously need to delete a
link since it always uses SRCUs (it used to have a #ifndef CONFIG_SRCU
locking). That's like saying a code still works when it doesn't hit a
deadlock condition.

Let's stick with Herve's patch series since he send it first and it
has fewer things that need to be fixed. If he ignores this thread for
too long, you can send a revision of yours again and we can accept
that.

-Saravana

2024-02-23 08:10:54

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

On Thu, 2024-02-22 at 17:08 -0800, Saravana Kannan wrote:
> On Tue, Feb 20, 2024 at 10:56 PM Nuno Sá <[email protected]> wrote:
> >
> > On Tue, 2024-02-20 at 16:31 -0800, Saravana Kannan wrote:
> > > On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <[email protected]>
> > > wrote:
> > > >
> > > > The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > > > introduces a workqueue to release the consumer and supplier devices used
> > > > in the devlink.
> > > > In the job queued, devices are release and in turn, when all the
> > > > references to these devices are dropped, the release function of the
> > > > device itself is called.
> > > >
> > > > Nothing is present to provide some synchronisation with this workqueue
> > > > in order to ensure that all ongoing releasing operations are done and
> > > > so, some other operations can be started safely.
> > > >
> > > > For instance, in the following sequence:
> > > >   1) of_platform_depopulate()
> > > >   2) of_overlay_remove()
> > > >
> > > > During the step 1, devices are released and related devlinks are removed
> > > > (jobs pushed in the workqueue).
> > > > During the step 2, OF nodes are destroyed but, without any
> > > > synchronisation with devlink removal jobs, of_overlay_remove() can raise
> > > > warnings related to missing of_node_put():
> > > >   ERROR: memory leak, expected refcount 1 instead of 2
> > > >
> > > > Indeed, the missing of_node_put() call is going to be done, too late,
> > > > from the workqueue job execution.
> > > >
> > > > Introduce device_link_wait_removal() to offer a way to synchronize
> > > > operations waiting for the end of devlink removals (i.e. end of
> > > > workqueue jobs).
> > > > Also, as a flushing operation is done on the workqueue, the workqueue
> > > > used is moved from a system-wide workqueue to a local one.
> > >
> > > Thanks for the bug report and fix. Sorry again about the delay in
> > > reviewing the changes.
> > >
> > > Please add Fixes tag for 80dd33cf72d1.
> > >
> > > > Signed-off-by: Herve Codina <[email protected]>
> > > > ---
> > > >  drivers/base/core.c    | 26 +++++++++++++++++++++++---
> > > >  include/linux/device.h |  1 +
> > > >  2 files changed, 24 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > > index ac026187ac6a..2e102a77758c 100644
> > > > --- a/drivers/base/core.c
> > > > +++ b/drivers/base/core.c
> > > > @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
> > > >  static void __fw_devlink_link_to_consumers(struct device *dev);
> > > >  static bool fw_devlink_drv_reg_done;
> > > >  static bool fw_devlink_best_effort;
> > > > +static struct workqueue_struct *fw_devlink_wq;
> > > >
> > > >  /**
> > > >   * __fwnode_link_add - Create a link between two fwnode_handles.
> > > > @@ -530,12 +531,26 @@ static void devlink_dev_release(struct device
> > > > *dev)
> > > >         /*
> > > >          * It may take a while to complete this work because of the SRCU
> > > >          * synchronization in device_link_release_fn() and if the
> > > > consumer or
> > > > -        * supplier devices get deleted when it runs, so put it into the
> > > > "long"
> > > > -        * workqueue.
> > > > +        * supplier devices get deleted when it runs, so put it into the
> > > > +        * dedicated workqueue.
> > > >          */
> > > > -       queue_work(system_long_wq, &link->rm_work);
> > > > +       queue_work(fw_devlink_wq, &link->rm_work);
> > >
> > > This has nothing to do with fw_devlink. fw_devlink is just triggering
> > > the issue in device links. You can hit this bug without fw_devlink too.
> > > So call this device_link_wq since it's consistent with device_link_* APIs.
> > >
> >
> > I'm not sure if I got this right in my series. I do call
> > devlink_release_queue() to
> > my queue. But on the Overlay side I use fwnode_links_flush_queue() because
> > it looked
> > more sensible from an OF point of view. And including (in OF code)
> > linux/fwnode.h
> > instead linux/device.h makes more sense to me.
> >
> > > >  }
> > > >
> > > > +/**
> > > > + * device_link_wait_removal - Wait for ongoing devlink removal jobs to
> > > > terminate
> > > > + */
> > > > +void device_link_wait_removal(void)
> > > > +{
> > > > +       /*
> > > > +        * devlink removal jobs are queued in the dedicated work queue.
> > > > +        * To be sure that all removal jobs are terminated, ensure that
> > > > any
> > > > +        * scheduled work has run to completion.
> > > > +        */
> > > > +       drain_workqueue(fw_devlink_wq);
> > >
> > > Is there a reason this needs to be drain_workqueu() instead of
> > > flush_workqueue(). Drain is a stronger guarantee than we need in this
> > > case. All we are trying to make sure is that all the device link
> > > remove work queued so far have completed.
> > >
> >
> > Yeah, I'm also using flush_workqueue().
> >
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(device_link_wait_removal);
> > > > +
> > > >  static struct class devlink_class = {
> > > >         .name = "devlink",
> > > >         .dev_groups = devlink_groups,
> > > > @@ -4085,9 +4100,14 @@ int __init devices_init(void)
> > > >         sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
> > > >         if (!sysfs_dev_char_kobj)
> > > >                 goto char_kobj_err;
> > > > +       fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
> > > > +       if (!fw_devlink_wq)
> > >
> > > Fix the name appropriately here too please.
> >
> > Hi Saravana,
> >
> > Oh, was not aware of this series... Please look at my first patch. It
> > already has a
> > review tag by Rafael. I think the creation of the queue makes more sense to
> > be done
> > in devlink_class_init(). Moreover, Rafael complained in my first version
> > that
> > erroring out because we failed to create the queue is too harsh since
> > devlinks can
> > still work.
>
> I think Rafael can be convinced on this one. Firstly, if we fail to
> allocate so early, we have bigger problems.

That's true...

>
> > So, what we do is to schedule the work if we have a queue or too call
> > device_link_release_fn() synchronously if we don't have the queue (note that
> > failing
> > to allocate the queue is very unlikely anyways).
>
> device links don't really work when you synchronously need to delete a
> link since it always uses SRCUs (it used to have a #ifndef CONFIG_SRCU
> locking). That's like saying a code still works when it doesn't hit a

Hmm, can you elaborate please? Why wouldn't it work if we call it synchronously?
Sure, we'll have the synchronize_srcu() call which might take some time but I'm
not honestly seeing what could go wrong other than waiting?

I can also see that we can potentially hold the devlink lock for some time but
can that lead to any deadlock (It would actually be nice - if doable at all - to
not release the refcounts with a lock hold)?
> deadlock condition.
>
> Let's stick with Herve's patch series since he send it first and it
> has fewer things that need to be fixed. If he ignores this thread for

Not exactly true :). If you look at my reply in the other thread (my series)
you'll see that I actually sent it first (as RFC - and spotted the issue way
back in May last year). About the stuff to fix, not sure if it's more. For now,
your major complain seems to be about synchronously calling
device_link_release_fn() and I did not had it in my v1. But anyways, I just want
a fix for this to land as quick as possible :) 

And I guess we also need Rafael to agree in erroring if we fail to allocate the
queue as he was against it.

- Nuno Sá


2024-02-23 08:46:49

by Herve Codina

[permalink] [raw]
Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

Hi,

On Thu, 22 Feb 2024 17:08:28 -0800
Saravana Kannan <[email protected]> wrote:

> On Tue, Feb 20, 2024 at 10:56 PM Nuno Sá <[email protected]> wrote:
> >
> > On Tue, 2024-02-20 at 16:31 -0800, Saravana Kannan wrote:
> > > On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <[email protected]> wrote:
> > > >
> > > > The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > > > introduces a workqueue to release the consumer and supplier devices used
> > > > in the devlink.
> > > > In the job queued, devices are release and in turn, when all the
> > > > references to these devices are dropped, the release function of the
> > > > device itself is called.
> > > >
> > > > Nothing is present to provide some synchronisation with this workqueue
> > > > in order to ensure that all ongoing releasing operations are done and
> > > > so, some other operations can be started safely.
> > > >
> > > > For instance, in the following sequence:
> > > > 1) of_platform_depopulate()
> > > > 2) of_overlay_remove()
> > > >
> > > > During the step 1, devices are released and related devlinks are removed
> > > > (jobs pushed in the workqueue).
> > > > During the step 2, OF nodes are destroyed but, without any
> > > > synchronisation with devlink removal jobs, of_overlay_remove() can raise
> > > > warnings related to missing of_node_put():
> > > > ERROR: memory leak, expected refcount 1 instead of 2
> > > >
> > > > Indeed, the missing of_node_put() call is going to be done, too late,
> > > > from the workqueue job execution.
> > > >
> > > > Introduce device_link_wait_removal() to offer a way to synchronize
> > > > operations waiting for the end of devlink removals (i.e. end of
> > > > workqueue jobs).
> > > > Also, as a flushing operation is done on the workqueue, the workqueue
> > > > used is moved from a system-wide workqueue to a local one.
> > >
> > > Thanks for the bug report and fix. Sorry again about the delay in
> > > reviewing the changes.
> > >
> > > Please add Fixes tag for 80dd33cf72d1.
> > >
> > > > Signed-off-by: Herve Codina <[email protected]>
> > > > ---
> > > > drivers/base/core.c | 26 +++++++++++++++++++++++---
> > > > include/linux/device.h | 1 +
> > > > 2 files changed, 24 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > > index ac026187ac6a..2e102a77758c 100644
> > > > --- a/drivers/base/core.c
> > > > +++ b/drivers/base/core.c
> > > > @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
> > > > static void __fw_devlink_link_to_consumers(struct device *dev);
> > > > static bool fw_devlink_drv_reg_done;
> > > > static bool fw_devlink_best_effort;
> > > > +static struct workqueue_struct *fw_devlink_wq;
> > > >
> > > > /**
> > > > * __fwnode_link_add - Create a link between two fwnode_handles.
> > > > @@ -530,12 +531,26 @@ static void devlink_dev_release(struct device *dev)
> > > > /*
> > > > * It may take a while to complete this work because of the SRCU
> > > > * synchronization in device_link_release_fn() and if the consumer or
> > > > - * supplier devices get deleted when it runs, so put it into the "long"
> > > > - * workqueue.
> > > > + * supplier devices get deleted when it runs, so put it into the
> > > > + * dedicated workqueue.
> > > > */
> > > > - queue_work(system_long_wq, &link->rm_work);
> > > > + queue_work(fw_devlink_wq, &link->rm_work);
> > >
> > > This has nothing to do with fw_devlink. fw_devlink is just triggering
> > > the issue in device links. You can hit this bug without fw_devlink too.
> > > So call this device_link_wq since it's consistent with device_link_* APIs.
> > >
> >
> > I'm not sure if I got this right in my series. I do call devlink_release_queue() to
> > my queue. But on the Overlay side I use fwnode_links_flush_queue() because it looked
> > more sensible from an OF point of view. And including (in OF code) linux/fwnode.h
> > instead linux/device.h makes more sense to me.
> >
> > > > }
> > > >
> > > > +/**
> > > > + * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
> > > > + */
> > > > +void device_link_wait_removal(void)
> > > > +{
> > > > + /*
> > > > + * devlink removal jobs are queued in the dedicated work queue.
> > > > + * To be sure that all removal jobs are terminated, ensure that any
> > > > + * scheduled work has run to completion.
> > > > + */
> > > > + drain_workqueue(fw_devlink_wq);
> > >
> > > Is there a reason this needs to be drain_workqueu() instead of
> > > flush_workqueue(). Drain is a stronger guarantee than we need in this
> > > case. All we are trying to make sure is that all the device link
> > > remove work queued so far have completed.
> > >
> >
> > Yeah, I'm also using flush_workqueue().
> >
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(device_link_wait_removal);
> > > > +
> > > > static struct class devlink_class = {
> > > > .name = "devlink",
> > > > .dev_groups = devlink_groups,
> > > > @@ -4085,9 +4100,14 @@ int __init devices_init(void)
> > > > sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
> > > > if (!sysfs_dev_char_kobj)
> > > > goto char_kobj_err;
> > > > + fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
> > > > + if (!fw_devlink_wq)
> > >
> > > Fix the name appropriately here too please.
> >
> > Hi Saravana,
> >
> > Oh, was not aware of this series... Please look at my first patch. It already has a
> > review tag by Rafael. I think the creation of the queue makes more sense to be done
> > in devlink_class_init(). Moreover, Rafael complained in my first version that
> > erroring out because we failed to create the queue is too harsh since devlinks can
> > still work.
>
> I think Rafael can be convinced on this one. Firstly, if we fail to
> allocate so early, we have bigger problems.
>
> > So, what we do is to schedule the work if we have a queue or too call
> > device_link_release_fn() synchronously if we don't have the queue (note that failing
> > to allocate the queue is very unlikely anyways).
>
> device links don't really work when you synchronously need to delete a
> link since it always uses SRCUs (it used to have a #ifndef CONFIG_SRCU
> locking). That's like saying a code still works when it doesn't hit a
> deadlock condition.
>
> Let's stick with Herve's patch series since he send it first and it
> has fewer things that need to be fixed. If he ignores this thread for
> too long, you can send a revision of yours again and we can accept
> that.

I don't ignore the thread :)

Hope I could take some time in the near future to send a v2 of this
series.

Hervé

2024-02-23 08:53:53

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

On Fri, 2024-02-23 at 09:46 +0100, Herve Codina wrote:
> Hi,
>
> On Thu, 22 Feb 2024 17:08:28 -0800
> Saravana Kannan <[email protected]> wrote:
>
> > On Tue, Feb 20, 2024 at 10:56 PM Nuno Sá <[email protected]> wrote:
> > >
> > > On Tue, 2024-02-20 at 16:31 -0800, Saravana Kannan wrote: 
> > > > On Thu, Nov 30, 2023 at 9:41 AM Herve Codina <[email protected]>
> > > > wrote: 
> > > > >
> > > > > The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > > > > introduces a workqueue to release the consumer and supplier devices
> > > > > used
> > > > > in the devlink.
> > > > > In the job queued, devices are release and in turn, when all the
> > > > > references to these devices are dropped, the release function of the
> > > > > device itself is called.
> > > > >
> > > > > Nothing is present to provide some synchronisation with this workqueue
> > > > > in order to ensure that all ongoing releasing operations are done and
> > > > > so, some other operations can be started safely.
> > > > >
> > > > > For instance, in the following sequence:
> > > > >   1) of_platform_depopulate()
> > > > >   2) of_overlay_remove()
> > > > >
> > > > > During the step 1, devices are released and related devlinks are
> > > > > removed
> > > > > (jobs pushed in the workqueue).
> > > > > During the step 2, OF nodes are destroyed but, without any
> > > > > synchronisation with devlink removal jobs, of_overlay_remove() can
> > > > > raise
> > > > > warnings related to missing of_node_put():
> > > > >   ERROR: memory leak, expected refcount 1 instead of 2
> > > > >
> > > > > Indeed, the missing of_node_put() call is going to be done, too late,
> > > > > from the workqueue job execution.
> > > > >
> > > > > Introduce device_link_wait_removal() to offer a way to synchronize
> > > > > operations waiting for the end of devlink removals (i.e. end of
> > > > > workqueue jobs).
> > > > > Also, as a flushing operation is done on the workqueue, the workqueue
> > > > > used is moved from a system-wide workqueue to a local one. 
> > > >
> > > > Thanks for the bug report and fix. Sorry again about the delay in
> > > > reviewing the changes.
> > > >
> > > > Please add Fixes tag for 80dd33cf72d1.
> > > >  
> > > > > Signed-off-by: Herve Codina <[email protected]>
> > > > > ---
> > > > >  drivers/base/core.c    | 26 +++++++++++++++++++++++---
> > > > >  include/linux/device.h |  1 +
> > > > >  2 files changed, 24 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > > > index ac026187ac6a..2e102a77758c 100644
> > > > > --- a/drivers/base/core.c
> > > > > +++ b/drivers/base/core.c
> > > > > @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
> > > > >  static void __fw_devlink_link_to_consumers(struct device *dev);
> > > > >  static bool fw_devlink_drv_reg_done;
> > > > >  static bool fw_devlink_best_effort;
> > > > > +static struct workqueue_struct *fw_devlink_wq;
> > > > >
> > > > >  /**
> > > > >   * __fwnode_link_add - Create a link between two fwnode_handles.
> > > > > @@ -530,12 +531,26 @@ static void devlink_dev_release(struct device
> > > > > *dev)
> > > > >         /*
> > > > >          * It may take a while to complete this work because of the
> > > > > SRCU
> > > > >          * synchronization in device_link_release_fn() and if the
> > > > > consumer or
> > > > > -        * supplier devices get deleted when it runs, so put it into
> > > > > the "long"
> > > > > -        * workqueue.
> > > > > +        * supplier devices get deleted when it runs, so put it into
> > > > > the
> > > > > +        * dedicated workqueue.
> > > > >          */
> > > > > -       queue_work(system_long_wq, &link->rm_work);
> > > > > +       queue_work(fw_devlink_wq, &link->rm_work); 
> > > >
> > > > This has nothing to do with fw_devlink. fw_devlink is just triggering
> > > > the issue in device links. You can hit this bug without fw_devlink too.
> > > > So call this device_link_wq since it's consistent with device_link_*
> > > > APIs.
> > > >  
> > >
> > > I'm not sure if I got this right in my series. I do call
> > > devlink_release_queue() to
> > > my queue. But on the Overlay side I use fwnode_links_flush_queue() because
> > > it looked
> > > more sensible from an OF point of view. And including (in OF code)
> > > linux/fwnode.h
> > > instead linux/device.h makes more sense to me.
> > >  
> > > > >  }
> > > > >
> > > > > +/**
> > > > > + * device_link_wait_removal - Wait for ongoing devlink removal jobs
> > > > > to terminate
> > > > > + */
> > > > > +void device_link_wait_removal(void)
> > > > > +{
> > > > > +       /*
> > > > > +        * devlink removal jobs are queued in the dedicated work
> > > > > queue.
> > > > > +        * To be sure that all removal jobs are terminated, ensure
> > > > > that any
> > > > > +        * scheduled work has run to completion.
> > > > > +        */
> > > > > +       drain_workqueue(fw_devlink_wq); 
> > > >
> > > > Is there a reason this needs to be drain_workqueu() instead of
> > > > flush_workqueue(). Drain is a stronger guarantee than we need in this
> > > > case. All we are trying to make sure is that all the device link
> > > > remove work queued so far have completed.
> > > >  
> > >
> > > Yeah, I'm also using flush_workqueue().
> > >  
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(device_link_wait_removal);
> > > > > +
> > > > >  static struct class devlink_class = {
> > > > >         .name = "devlink",
> > > > >         .dev_groups = devlink_groups,
> > > > > @@ -4085,9 +4100,14 @@ int __init devices_init(void)
> > > > >         sysfs_dev_char_kobj = kobject_create_and_add("char",
> > > > > dev_kobj);
> > > > >         if (!sysfs_dev_char_kobj)
> > > > >                 goto char_kobj_err;
> > > > > +       fw_devlink_wq = alloc_workqueue("fw_devlink_wq", 0, 0);
> > > > > +       if (!fw_devlink_wq) 
> > > >
> > > > Fix the name appropriately here too please. 
> > >
> > > Hi Saravana,
> > >
> > > Oh, was not aware of this series... Please look at my first patch. It
> > > already has a
> > > review tag by Rafael. I think the creation of the queue makes more sense
> > > to be done
> > > in devlink_class_init(). Moreover, Rafael complained in my first version
> > > that
> > > erroring out because we failed to create the queue is too harsh since
> > > devlinks can
> > > still work. 
> >
> > I think Rafael can be convinced on this one. Firstly, if we fail to
> > allocate so early, we have bigger problems.
> >
> > > So, what we do is to schedule the work if we have a queue or too call
> > > device_link_release_fn() synchronously if we don't have the queue (note
> > > that failing
> > > to allocate the queue is very unlikely anyways). 
> >
> > device links don't really work when you synchronously need to delete a
> > link since it always uses SRCUs (it used to have a #ifndef CONFIG_SRCU
> > locking). That's like saying a code still works when it doesn't hit a
> > deadlock condition.
> >
> > Let's stick with Herve's patch series since he send it first and it
> > has fewer things that need to be fixed. If he ignores this thread for
> > too long, you can send a revision of yours again and we can accept
> > that.
>
> I don't ignore the thread :)
>
> Hope I could take some time in the near future to send a v2 of this
> series.

Hi Herve,

Just let me know if you don't see that happening anytime soon :). I'm very
interested in having this applied fairly soon and I think the base idea for the
fix is more or less in place (for both series). So it should be minor details
now :).

- Nuno Sá


2024-02-23 10:41:44

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

On Fri, 2024-02-23 at 10:45 +0100, Herve Codina wrote:
> Hi Saravana, Nuno,
>
> On Tue, 20 Feb 2024 16:37:05 -0800
> Saravana Kannan <[email protected]> wrote:
>
> ...
> > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> > >                 goto out;
> > >         }
> > >
> > > +       /*
> > > +        * Wait for any ongoing device link removals before removing some
> > > of
> > > +        * nodes
> > > +        */
> > > +       device_link_wait_removal();
> > > + 
> >
> > Nuno in his patch[1] had this "wait" happen inside
> > __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> > the issue that Luca reported[2] in this patch series. Is there any
> > problem with doing that?
>
> Is it the right place to wait ?
>
> __of_changeset_entry_destroy() can do some of_node_put() and I am not sure
> that of_node_put() can call device_put() when the of_node refcount reachs
> zero.
>

I don't think of_node_put() can call device_put(). At least by looking at:

https://elixir.bootlin.com/linux/v6.8-rc5/source/drivers/of/dynamic.c#L326

> If of_node_put() cannot call device_put(), I think we can wait in the
> of_changeset_destroy(). I.e. the __of_changeset_entry_destroy() caller.
>   https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/dynamic.c#L670
>
> What do you think about this ?
> Does it make sense ?

I think it makes sense from a logical point of view. Like, let's flush the queue
right before checking our assumptions...

In my tests, I did not saw any issue (Hopefully I was not missing any subtlety).

- Nuno Sá


2024-02-23 10:46:15

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

On Fri, 2024-02-23 at 10:11 +0100, Herve Codina wrote:
> Hi Saravana,
>
> On Tue, 20 Feb 2024 16:31:13 -0800
> Saravana Kannan <[email protected]> wrote:
>
> ...
>
> > > +void device_link_wait_removal(void)
> > > +{
> > > +       /*
> > > +        * devlink removal jobs are queued in the dedicated work queue.
> > > +        * To be sure that all removal jobs are terminated, ensure that
> > > any
> > > +        * scheduled work has run to completion.
> > > +        */
> > > +       drain_workqueue(fw_devlink_wq); 
> >
> > Is there a reason this needs to be drain_workqueu() instead of
> > flush_workqueue(). Drain is a stronger guarantee than we need in this
> > case. All we are trying to make sure is that all the device link
> > remove work queued so far have completed.
>
> I used drain_workqueue() because drain_workqueue() allows for jobs already
> present in a workqueue to re-queue a job and drain_workqueue() will wait
> also for this new job completion.
>
> I think flush_workqueue() doesn't wait for this chain queueing.
>
> In our case, my understanding was that device_link_release_fn() calls
> put_device() for the consumer and the supplier.
> If refcounts reaches zero, devlink_dev_release() can be called again
> and re-queue a job.
>

Looks sensible. The only doubt (that Saravana mays know better) is that I'm not
sure put_device() on a supplier or consumer can actually lead to
devlink_dev_release(). AFAIU, a consumer or a supplier should not be a device
from the devlink class. Hence, looking at device_release(), I'm not sure it can
happen unless for some odd reason someone is messing with devlinks in .remove()
or .type->remove().

- Nuno Sá


2024-02-27 16:56:11

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote:
> Hi Saravana, Luca, Nuno,
>
> On Tue, 20 Feb 2024 16:37:05 -0800
> Saravana Kannan <[email protected]> wrote:
>
> ...
>
> > >
> > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > index a9a292d6d59b..5c5f808b163e 100644
> > > --- a/drivers/of/overlay.c
> > > +++ b/drivers/of/overlay.c
> > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> > >                 goto out;
> > >         }
> > >
> > > +       /*
> > > +        * Wait for any ongoing device link removals before removing some of
> > > +        * nodes
> > > +        */
> > > +       device_link_wait_removal();
> > > + 
> >
> > Nuno in his patch[1] had this "wait" happen inside
> > __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> > the issue that Luca reported[2] in this patch series. Is there any
> > problem with doing that?
> >
> > Luca for some reason did a unlock/lock(of_mutex) in his test patch and
> > I don't think that's necessary.
>
> I think the unlock/lock in Luca's case and so in Nuno's case is needed.
>
> I do the device_link_wait_removal() wihout having the of_mutex locked.
>
> Now, suppose I do the device_link_wait_removal() call with the of_mutex locked.
> The following flow is allowed and a deadlock is present.
>
> of_overlay_remove()
>   lock(of_mutex)
>      device_link_wait_removal()
>
> And, from the workqueue jobs execution:
>   ...
>     device_put()
>       some_driver->remove()
>         of_overlay_remove() <--- The job will never end.
>                                  It is waiting for of_mutex.
>                                  Deadlock
>

We may need some input from Saravana (and others) on this. I might be missing
something but can a put_device() lead into a driver remove callback? Driver code is
not device code and put_device() leads to device_release() which will either call the
device ->release(), ->type->release() or the class ->dev_release(). And, IMO, calling
of_overlay_remove() or something like that (like something that would lead to
unbinding a device from it's driver) in a device release callback would be at the
very least very questionable. Typically, what you see in there is of_node_put() and
things like kfree() of the device itself or any other data.

The driver remove callback should be called when unbinding the device from it's
drivers and devlinks should also be removed after device_unbind_cleanup() (i.e, after
the driver remove callback).

Having said the above, the driver core has lots of subtleties so, again, I can be
missing something. But at this point I'm still not seeing any deadlock...

- Nuno Sá


2024-02-27 17:54:19

by Herve Codina

[permalink] [raw]
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

Hi Nuno,

On Tue, 27 Feb 2024 17:55:07 +0100
Nuno Sá <[email protected]> wrote:

> On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote:
> > Hi Saravana, Luca, Nuno,
> >
> > On Tue, 20 Feb 2024 16:37:05 -0800
> > Saravana Kannan <[email protected]> wrote:
> >
> > ...
> >
> > > >
> > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > > index a9a292d6d59b..5c5f808b163e 100644
> > > > --- a/drivers/of/overlay.c
> > > > +++ b/drivers/of/overlay.c
> > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> > > >                 goto out;
> > > >         }
> > > >
> > > > +       /*
> > > > +        * Wait for any ongoing device link removals before removing some of
> > > > +        * nodes
> > > > +        */
> > > > +       device_link_wait_removal();
> > > > + 
> > >
> > > Nuno in his patch[1] had this "wait" happen inside
> > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> > > the issue that Luca reported[2] in this patch series. Is there any
> > > problem with doing that?
> > >
> > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and
> > > I don't think that's necessary.
> >
> > I think the unlock/lock in Luca's case and so in Nuno's case is needed.
> >
> > I do the device_link_wait_removal() wihout having the of_mutex locked.
> >
> > Now, suppose I do the device_link_wait_removal() call with the of_mutex locked.
> > The following flow is allowed and a deadlock is present.
> >
> > of_overlay_remove()
> >   lock(of_mutex)
> >      device_link_wait_removal()
> >
> > And, from the workqueue jobs execution:
> >   ...
> >     device_put()
> >       some_driver->remove()
> >         of_overlay_remove() <--- The job will never end.
> >                                  It is waiting for of_mutex.
> >                                  Deadlock
> >
>
> We may need some input from Saravana (and others) on this. I might be missing
> something but can a put_device() lead into a driver remove callback? Driver code is
> not device code and put_device() leads to device_release() which will either call the
> device ->release(), ->type->release() or the class ->dev_release(). And, IMO, calling
> of_overlay_remove() or something like that (like something that would lead to
> unbinding a device from it's driver) in a device release callback would be at the
> very least very questionable. Typically, what you see in there is of_node_put() and
> things like kfree() of the device itself or any other data.

I think that calling of_overlay_remove() in a device release callback makes
sense. The overlay is used to declare sub-nodes from the device node. It
does not add/remove the device node itself but sub-nodes.

The use case is the use of DT overlays to describe PCI devices.
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/lkml/[email protected]/
--- 8< ---
The lan966x SoCs can be used in two different ways:

- It can run Linux by itself, on ARM64 cores included in the SoC. This
use-case of the lan966x is currently being upstreamed, using a
traditional Device Tree representation of the lan996x HW blocks [1]
A number of drivers for the different IPs of the SoC have already
been merged in upstream Linux.

- It can be used as a PCIe endpoint, connected to a separate platform
that acts as the PCIe root complex. In this case, all the devices
that are embedded on this SoC are exposed through PCIe BARs and the
ARM64 cores of the SoC are not used. Since this is a PCIe card, it
can be plugged on any platform, of any architecture supporting PCIe.
--- 8< ---

This quite long story led to DT overlay support for PCI devices and so the
unittest I mentioned:
https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946


So, I have a PCI driver that bind to the lan966x PCI board.
This driver loads an overlay at probe() and unload it at remove().
Also, this driver can be module. A simple rmmod leads to the remove() call.

This driver is not yet upstream because I haven't yet fixed all the issues I
encountered that's why of now, I can point only the unittest related to overlay
support for PCI.

>
> The driver remove callback should be called when unbinding the device from it's
> drivers and devlinks should also be removed after device_unbind_cleanup() (i.e, after
> the driver remove callback).
>
> Having said the above, the driver core has lots of subtleties so, again, I can be
> missing something. But at this point I'm still not seeing any deadlock...
>

I gave a wrong example.
Based on Luca's sequence he gave in
https://lore.kernel.org/all/20231220181627.341e8789@booty/

We can have the following:

--- 8< ---
int of_overlay_remove(int *ovcs_id)
{
...

device_link_wait_removal(); // proposed by this patch series

mutex_lock(&of_mutex);

...

ret = __of_changeset_revert_notify(&ovcs->cset);
// this ends up calling (excerpt from a long stack trace):
// -> of_i2c_notify
// -> device_remove
// -> devm_regulator_release
// -> device_link_remove
// -> devlink_dev_release, which queues work for
// device_link_release_fn, which in turn calls:
// -> device_put
// -> device_release
// -> {platform,regulator,...}_dev*_release
// -> of_node_put() [**]

...

free_overlay_changeset(ovcs);
// calls:
// -> of_changeset_destroy
// -> __of_changeset_entry_destroy
// -> pr_err("ERROR: memory leak, expected refcount 1 instead of %d...
// The error appears or not, based on when the workqueue runs

err_unlock:
mutex_unlock(&of_mutex);

...
}
--- 8< ---

I think, on your side, you can have something similar.
I was wrong (sorry for my mistake). the problem is not device_put() but
device_remove().

In my deadlog example, s/device_put()/device_remove()/

Regards,
Hervé

--
Hervé Codina, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2024-02-27 19:08:18

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

Hi Herve,

On Tue, 2024-02-27 at 18:54 +0100, Herve Codina wrote:
> Hi Nuno,
>
> On Tue, 27 Feb 2024 17:55:07 +0100
> Nuno Sá <[email protected]> wrote:
>
> > On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote:
> > > Hi Saravana, Luca, Nuno,
> > >
> > > On Tue, 20 Feb 2024 16:37:05 -0800
> > > Saravana Kannan <[email protected]> wrote:
> > >
> > > ...
> > >  
> > > > >
> > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > > > index a9a292d6d59b..5c5f808b163e 100644
> > > > > --- a/drivers/of/overlay.c
> > > > > +++ b/drivers/of/overlay.c
> > > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> > > > >                 goto out;
> > > > >         }
> > > > >
> > > > > +       /*
> > > > > +        * Wait for any ongoing device link removals before removing some
> > > > > of
> > > > > +        * nodes
> > > > > +        */
> > > > > +       device_link_wait_removal();
> > > > > +   
> > > >
> > > > Nuno in his patch[1] had this "wait" happen inside
> > > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> > > > the issue that Luca reported[2] in this patch series. Is there any
> > > > problem with doing that?
> > > >
> > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and
> > > > I don't think that's necessary. 
> > >
> > > I think the unlock/lock in Luca's case and so in Nuno's case is needed.
> > >
> > > I do the device_link_wait_removal() wihout having the of_mutex locked.
> > >
> > > Now, suppose I do the device_link_wait_removal() call with the of_mutex locked.
> > > The following flow is allowed and a deadlock is present.
> > >
> > > of_overlay_remove()
> > >   lock(of_mutex)
> > >      device_link_wait_removal()
> > >
> > > And, from the workqueue jobs execution:
> > >   ...
> > >     device_put()
> > >       some_driver->remove()
> > >         of_overlay_remove() <--- The job will never end.
> > >                                  It is waiting for of_mutex.
> > >                                  Deadlock
> > >  
> >
> > We may need some input from Saravana (and others) on this. I might be missing
> > something but can a put_device() lead into a driver remove callback? Driver code
> > is
> > not device code and put_device() leads to device_release() which will either call
> > the
> > device ->release(), ->type->release() or the class ->dev_release() And, IMO,
> > calling
> > of_overlay_remove() or something like that (like something that would lead to
> > unbinding a device from it's driver) in a device release callback would be at the
> > very least very questionable. Typically, what you see in there is of_node_put()
> > and
> > things like kfree() of the device itself or any other data.
>
> I think that calling of_overlay_remove() in a device release callback makes
> sense. The overlay is used to declare sub-nodes from the device node. It
> does not add/remove the device node itself but sub-nodes.
>

I think we are speaking about two different things... device release is not the same
as the driver remove callback. I admit the pci case seems to be a beast of it's own
and I just spent some time (given your links) on it so I can't surely be sure about
what I'm about to say... But, AFAICT, I did not saw any overlay or changeset being
removed from a kobj_type release callback.

> The use case is the use of DT overlays to describe PCI devices.
> https://lore.kernel.org/all/[email protected]/
> https://lore.kernel.org/lkml/[email protected]/
> --- 8< ---
> The lan966x SoCs can be used in two different ways:
>
>  - It can run Linux by itself, on ARM64 cores included in the SoC. This
>    use-case of the lan966x is currently being upstreamed, using a
>    traditional Device Tree representation of the lan996x HW blocks [1]
>    A number of drivers for the different IPs of the SoC have already
>    been merged in upstream Linux.
>
>  - It can be used as a PCIe endpoint, connected to a separate platform
>    that acts as the PCIe root complex. In this case, all the devices
>    that are embedded on this SoC are exposed through PCIe BARs and the
>    ARM64 cores of the SoC are not used. Since this is a PCIe card, it
>    can be plugged on any platform, of any architecture supporting PCIe.
> --- 8< ---
>
> This quite long story led to DT overlay support for PCI devices and so the
> unittest I mentioned:
>   https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946
>
>
> So, I have a PCI driver that bind to the lan966x PCI board.
> This driver loads an overlay at probe() and unload it at remove().
> Also, this driver can be module. A simple rmmod leads to the remove() call.
>

Hmm, and I think that would not be an issue... Note that the code that runs in
device_link_release_fn() is doing put_device() which ends ups on the kobj_type
release callback and so far I could not see any evidence of such a callback being
responsible of calling device_remove() on another device. That would be weird (I
think) since I would expect such a call to happen in a kind of unregister function.

> This driver is not yet upstream because I haven't yet fixed all the issues I
> encountered that's why of now, I can point only the unittest related to overlay
> support for PCI.
>
> >
> > The driver remove callback should be called when unbinding the device from it's
> > drivers and devlinks should also be removed after device_unbind_cleanup() (i.e,
> > after
> > the driver remove callback).
> >
> > Having said the above, the driver core has lots of subtleties so, again, I can be
> > missing something. But at this point I'm still not seeing any deadlock..
> >
>
> I gave a wrong example.
> Based on Luca's sequence he gave in
>   https://lore.kernel.org/all/20231220181627.341e8789@booty/

Regarding Luca's comments, my first approach was actually to just make the devlink
removal synchronously... I'm still not sure what would be the issue of doing that
(other than potentially waiting some time for the srcu synchronization). About the
unlock, I'm just not sure what could happen if someone else (other than us) sneaks in
and grabs the of_mutex while we are in the middle of removing an overlay...

>
> We can have the following:
>
> --- 8< ---
> int of_overlay_remove(int *ovcs_id)
> {
>     ...
>
>     device_link_wait_removal(); // proposed by this patch series
>
>     mutex_lock(&of_mutex);
>
>     ...
>
>     ret = __of_changeset_revert_notify(&ovcs->cset);
>     // this ends up calling (excerpt from a long stack trace):
>     // -> of_i2c_notify
>     // -> device_remove
>     // -> devm_regulator_release
>     // -> device_link_remove
>     // -> devlink_dev_release, which queues work for
>     //      device_link_release_fn, which in turn calls:
>     //      -> device_put
>     //      -> device_release
>     //      -> {platform,regulator,...}_dev*_release
>     //      -> of_node_put() [**]
>
>     ...
>
>     free_overlay_changeset(ovcs);
>     // calls:
>     // -> of_changeset_destroy
>     // -> __of_changeset_entry_destroy
>     // -> pr_err("ERROR: memory leak, expected refcount 1 instead of %d...
>     // The error appears or not, based on when the workqueue runs
>
> err_unlock:
>     mutex_unlock(&of_mutex);
>
>     ...
> }
> --- 8< ---
>
> I think, on your side, you can have something similar.
> I was wrong (sorry for my mistake). the problem is not device_put() but
> device_remove().

But I'm not seeing how device_remove() can deadlock since I'm not sure we can go from
device_link_release_fn() to device_remove(). If there's such a path, then I'll agree
on the deadlock.

>
> In my deadlog example, s/device_put()/device_remove()/
>

Exactly... and that is why my first question was to wonder about put_device() being
able to call any overlay removal code. So, do you know if it's really possible for a
device release callback to end up calling device_remove()? Because, then I could see
the deadlock as device_remove() can end up unbinding the device from it's driver and
hence calling drv->remove().

- Nuno Sá

2024-02-27 19:13:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

On Tue, Feb 27, 2024 at 8:08 PM Nuno Sá <[email protected]> wrote:
>
> Hi Herve,
>
> On Tue, 2024-02-27 at 18:54 +0100, Herve Codina wrote:
> > Hi Nuno,
> >
> > On Tue, 27 Feb 2024 17:55:07 +0100
> > Nuno Sá <[email protected]> wrote:
> >
> > > On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote:
> > > > Hi Saravana, Luca, Nuno,
> > > >
> > > > On Tue, 20 Feb 2024 16:37:05 -0800
> > > > Saravana Kannan <[email protected]> wrote:
> > > >
> > > > ...
> > > >
> > > > > >
> > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > > > > index a9a292d6d59b..5c5f808b163e 100644
> > > > > > --- a/drivers/of/overlay.c
> > > > > > +++ b/drivers/of/overlay.c
> > > > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> > > > > > goto out;
> > > > > > }
> > > > > >
> > > > > > + /*
> > > > > > + * Wait for any ongoing device link removals before removing some
> > > > > > of
> > > > > > + * nodes
> > > > > > + */
> > > > > > + device_link_wait_removal();
> > > > > > +
> > > > >
> > > > > Nuno in his patch[1] had this "wait" happen inside
> > > > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> > > > > the issue that Luca reported[2] in this patch series. Is there any
> > > > > problem with doing that?
> > > > >
> > > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and
> > > > > I don't think that's necessary.
> > > >
> > > > I think the unlock/lock in Luca's case and so in Nuno's case is needed.
> > > >
> > > > I do the device_link_wait_removal() wihout having the of_mutex locked.
> > > >
> > > > Now, suppose I do the device_link_wait_removal() call with the of_mutex locked.
> > > > The following flow is allowed and a deadlock is present.
> > > >
> > > > of_overlay_remove()
> > > > lock(of_mutex)
> > > > device_link_wait_removal()
> > > >
> > > > And, from the workqueue jobs execution:
> > > > ...
> > > > device_put()
> > > > some_driver->remove()
> > > > of_overlay_remove() <--- The job will never end.
> > > > It is waiting for of_mutex.
> > > > Deadlock
> > > >
> > >
> > > We may need some input from Saravana (and others) on this. I might be missing
> > > something but can a put_device() lead into a driver remove callback? Driver code
> > > is
> > > not device code and put_device() leads to device_release() which will either call
> > > the
> > > device ->release(), ->type->release() or the class ->dev_release(). And, IMO,
> > > calling
> > > of_overlay_remove() or something like that (like something that would lead to
> > > unbinding a device from it's driver) in a device release callback would be at the
> > > very least very questionable. Typically, what you see in there is of_node_put()
> > > and
> > > things like kfree() of the device itself or any other data.
> >
> > I think that calling of_overlay_remove() in a device release callback makes
> > sense. The overlay is used to declare sub-nodes from the device node. It
> > does not add/remove the device node itself but sub-nodes.
> >
>
> I think we are speaking about two different things... device release is not the same
> as the driver remove callback. I admit the pci case seems to be a beast of it's own
> and I just spent some time (given your links) on it so I can't surely be sure about
> what I'm about to say... But, AFAICT, I did not saw any overlay or changeset being
> removed from a kobj_type release callback.
>
> > The use case is the use of DT overlays to describe PCI devices.
> > https://lore.kernel.org/all/[email protected]/
> > https://lore.kernel.org/lkml/[email protected]/
> > --- 8< ---
> > The lan966x SoCs can be used in two different ways:
> >
> > - It can run Linux by itself, on ARM64 cores included in the SoC. This
> > use-case of the lan966x is currently being upstreamed, using a
> > traditional Device Tree representation of the lan996x HW blocks [1]
> > A number of drivers for the different IPs of the SoC have already
> > been merged in upstream Linux.
> >
> > - It can be used as a PCIe endpoint, connected to a separate platform
> > that acts as the PCIe root complex. In this case, all the devices
> > that are embedded on this SoC are exposed through PCIe BARs and the
> > ARM64 cores of the SoC are not used. Since this is a PCIe card, it
> > can be plugged on any platform, of any architecture supporting PCIe.
> > --- 8< ---
> >
> > This quite long story led to DT overlay support for PCI devices and so the
> > unittest I mentioned:
> > https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946
> >
> >
> > So, I have a PCI driver that bind to the lan966x PCI board.
> > This driver loads an overlay at probe() and unload it at remove().
> > Also, this driver can be module. A simple rmmod leads to the remove() call.
> >
>
> Hmm, and I think that would not be an issue... Note that the code that runs in
> device_link_release_fn() is doing put_device() which ends ups on the kobj_type
> release callback and so far I could not see any evidence of such a callback being
> responsible of calling device_remove() on another device. That would be weird (I
> think) since I would expect such a call to happen in a kind of unregister function.
>
> > This driver is not yet upstream because I haven't yet fixed all the issues I
> > encountered that's why of now, I can point only the unittest related to overlay
> > support for PCI.
> >
> > >
> > > The driver remove callback should be called when unbinding the device from it's
> > > drivers and devlinks should also be removed after device_unbind_cleanup() (i.e,
> > > after
> > > the driver remove callback).
> > >
> > > Having said the above, the driver core has lots of subtleties so, again, I can be
> > > missing something. But at this point I'm still not seeing any deadlock...
> > >
> >
> > I gave a wrong example.
> > Based on Luca's sequence he gave in
> > https://lore.kernel.org/all/20231220181627.341e8789@booty/
>
> Regarding Luca's comments, my first approach was actually to just make the devlink
> removal synchronously... I'm still not sure what would be the issue of doing that
> (other than potentially waiting some time for the srcu synchronization).

It would allow forward progress to be made, but it would add potential
delay for everybody, which is only really needed in the DT overlay
case. Sounds like something to avoid to me.

> About the unlock, I'm just not sure what could happen if someone else (other than us) sneaks in
> and grabs the of_mutex while we are in the middle of removing an overlay..

If that is possible at all, I'd call it a bug.

2024-02-27 19:28:56

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

On Tue, 2024-02-27 at 20:13 +0100, Rafael J. Wysocki wrote:
> On Tue, Feb 27, 2024 at 8:08 PM Nuno Sá <[email protected]> wrote:
> >
> > Hi Herve,
> >
> > On Tue, 2024-02-27 at 18:54 +0100, Herve Codina wrote:
> > > Hi Nuno,
> > >
> > > On Tue, 27 Feb 2024 17:55:07 +0100
> > > Nuno Sá <[email protected]> wrote:
> > >
> > > > On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote:
> > > > > Hi Saravana, Luca, Nuno,
> > > > >
> > > > > On Tue, 20 Feb 2024 16:37:05 -0800
> > > > > Saravana Kannan <[email protected]> wrote:
> > > > >
> > > > > ...
> > > > >
> > > > > > >
> > > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > > > > > index a9a292d6d59b..5c5f808b163e 100644
> > > > > > > --- a/drivers/of/overlay.c
> > > > > > > +++ b/drivers/of/overlay.c
> > > > > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> > > > > > >                 goto out;
> > > > > > >         }
> > > > > > >
> > > > > > > +       /*
> > > > > > > +        * Wait for any ongoing device link removals before removing
> > > > > > > some
> > > > > > > of
> > > > > > > +        * nodes
> > > > > > > +        */
> > > > > > > +       device_link_wait_removal();
> > > > > > > +
> > > > > >
> > > > > > Nuno in his patch[1] had this "wait" happen inside
> > > > > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> > > > > > the issue that Luca reported[2] in this patch series. Is there any
> > > > > > problem with doing that?
> > > > > >
> > > > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and
> > > > > > I don't think that's necessary.
> > > > >
> > > > > I think the unlock/lock in Luca's case and so in Nuno's case is needed.
> > > > >
> > > > > I do the device_link_wait_removal() wihout having the of_mutex locked.
> > > > >
> > > > > Now, suppose I do the device_link_wait_removal() call with the of_mutex
> > > > > locked.
> > > > > The following flow is allowed and a deadlock is present.
> > > > >
> > > > > of_overlay_remove()
> > > > >   lock(of_mutex)
> > > > >      device_link_wait_removal()
> > > > >
> > > > > And, from the workqueue jobs execution:
> > > > >   ...
> > > > >     device_put()
> > > > >       some_driver->remove()
> > > > >         of_overlay_remove() <--- The job will never end.
> > > > >                                  It is waiting for of_mutex.
> > > > >                                  Deadlock
> > > > >
> > > >
> > > > We may need some input from Saravana (and others) on this. I might be missing
> > > > something but can a put_device() lead into a driver remove callback? Driver
> > > > code
> > > > is
> > > > not device code and put_device() leads to device_release() which will either
> > > > call
> > > > the
> > > > device ->release(), ->type->release() or the class ->dev_release(). And, IMO,
> > > > calling
> > > > of_overlay_remove() or something like that (like something that would lead to
> > > > unbinding a device from it's driver) in a device release callback would be at
> > > > the
> > > > very least very questionable. Typically, what you see in there is
> > > > of_node_put()
> > > > and
> > > > things like kfree() of the device itself or any other data.
> > >
> > > I think that calling of_overlay_remove() in a device release callback makes
> > > sense. The overlay is used to declare sub-nodes from the device node. It
> > > does not add/remove the device node itself but sub-nodes.
> > >
> >
> > I think we are speaking about two different things... device release is not the
> > same
> > as the driver remove callback. I admit the pci case seems to be a beast of it's
> > own
> > and I just spent some time (given your links) on it so I can't surely be sure
> > about
> > what I'm about to say... But, AFAICT, I did not saw any overlay or changeset
> > being
> > removed from a kobj_type release callback.
> >
> > > The use case is the use of DT overlays to describe PCI devices.
> > > https://lore.kernel.org/all/[email protected]/
> > > https://lore.kernel.org/lkml/[email protected]/
> > > --- 8< ---
> > > The lan966x SoCs can be used in two different ways:
> > >
> > >  - It can run Linux by itself, on ARM64 cores included in the SoC. This
> > >    use-case of the lan966x is currently being upstreamed, using a
> > >    traditional Device Tree representation of the lan996x HW blocks [1]
> > >    A number of drivers for the different IPs of the SoC have already
> > >    been merged in upstream Linux.
> > >
> > >  - It can be used as a PCIe endpoint, connected to a separate platform
> > >    that acts as the PCIe root complex. In this case, all the devices
> > >    that are embedded on this SoC are exposed through PCIe BARs and the
> > >    ARM64 cores of the SoC are not used. Since this is a PCIe card, it
> > >    can be plugged on any platform, of any architecture supporting PCIe.
> > > --- 8< ---
> > >
> > > This quite long story led to DT overlay support for PCI devices and so the
> > > unittest I mentioned:
> > >   https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946
> > >
> > >
> > > So, I have a PCI driver that bind to the lan966x PCI board.
> > > This driver loads an overlay at probe() and unload it at remove().
> > > Also, this driver can be module. A simple rmmod leads to the remove() call.
> > >
> >
> > Hmm, and I think that would not be an issue... Note that the code that runs in
> > device_link_release_fn() is doing put_device() which ends ups on the kobj_type
> > release callback and so far I could not see any evidence of such a callback being
> > responsible of calling device_remove() on another device. That would be weird (I
> > think) since I would expect such a call to happen in a kind of unregister
> > function.
> >
> > > This driver is not yet upstream because I haven't yet fixed all the issues I
> > > encountered that's why of now, I can point only the unittest related to overlay
> > > support for PCI.
> > >
> > > >
> > > > The driver remove callback should be called when unbinding the device from
> > > > it's
> > > > drivers and devlinks should also be removed after device_unbind_cleanup()
> > > > (i.e,
> > > > after
> > > > the driver remove callback).
> > > >
> > > > Having said the above, the driver core has lots of subtleties so, again, I
> > > > can be
> > > > missing something. But at this point I'm still not seeing any deadlock...
> > > >
> > >
> > > I gave a wrong example.
> > > Based on Luca's sequence he gave in
> > >   https://lore.kernel.org/all/20231220181627.341e8789@booty/
> >
> > Regarding Luca's comments, my first approach was actually to just make the
> > devlink
> > removal synchronously... I'm still not sure what would be the issue of doing that
> > (other than potentially waiting some time for the srcu synchronization).
>
> It would allow forward progress to be made, but it would add potential
> delay for everybody, which is only really needed in the DT overlay
> case.  Sounds like something to avoid to me.

I mean, we could surely detect (or have a way to do it) if the devlink is being
removed as part of an overlay removal and only call device_link_release_fn()
synchronously in that case. It would minimize the effects I guess.

But yeah, we still need to make sure there's an actual deadlock... I'm still not
seeing it but Herve may very well be correct about it.

>
> > About the unlock, I'm just not sure what could happen if someone else (other than
> > us) sneaks in
> > and grabs the of_mutex while we are in the middle of removing an overlay...
>
> If that is possible at all, I'd call it a bug.

I think, in theory, it could happen as it looks to be a fairly coarse grained lock
for OF:

https://elixir.bootlin.com/linux/latest/source/drivers/of/of_private.h#L40

- Nuno Sá

2024-02-29 23:27:29

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

On Fri, Feb 23, 2024 at 2:41 AM Nuno Sá <[email protected]> wrote:
>
> On Fri, 2024-02-23 at 10:11 +0100, Herve Codina wrote:
> > Hi Saravana,
> >
> > On Tue, 20 Feb 2024 16:31:13 -0800
> > Saravana Kannan <[email protected]> wrote:
> >
> > ...
> >
> > > > +void device_link_wait_removal(void)
> > > > +{
> > > > + /*
> > > > + * devlink removal jobs are queued in the dedicated work queue.
> > > > + * To be sure that all removal jobs are terminated, ensure that
> > > > any
> > > > + * scheduled work has run to completion.
> > > > + */
> > > > + drain_workqueue(fw_devlink_wq);
> > >
> > > Is there a reason this needs to be drain_workqueu() instead of
> > > flush_workqueue(). Drain is a stronger guarantee than we need in this
> > > case. All we are trying to make sure is that all the device link
> > > remove work queued so far have completed.
> >
> > I used drain_workqueue() because drain_workqueue() allows for jobs already
> > present in a workqueue to re-queue a job and drain_workqueue() will wait
> > also for this new job completion.
> >
> > I think flush_workqueue() doesn't wait for this chain queueing.
> >
> > In our case, my understanding was that device_link_release_fn() calls
> > put_device() for the consumer and the supplier.
> > If refcounts reaches zero, devlink_dev_release() can be called again
> > and re-queue a job.
> >
>
> Looks sensible. The only doubt (that Saravana mays know better) is that I'm not
> sure put_device() on a supplier or consumer can actually lead to
> devlink_dev_release(). AFAIU, a consumer or a supplier should not be a device
> from the devlink class. Hence, looking at device_release(), I'm not sure it can
> happen unless for some odd reason someone is messing with devlinks in .remove()
> or .type->remove().

The case we are trying to fix here involves a supplier or a consumer
device (say Device-A) being device_del(). When that happens, all the
device links to/from the device are deleted by a call to
device_links_purge() since a device link can't exist without both the
supplier and consumer existing.

The problem you were hitting is that the device link deletion code
does the put_device(Device-A) in a workqueue. You change is to make
sure to wait until that has completed. To do that, you only need to
wait for the device link deletion work (already queued before
returning from device_del()) to finish. You don't need to wait for
anything more.

I read up on drain_workqueue() before I made my comments. The point I
was trying to make is that there could be some unrelated device link
deletions that you don't need to wait on.

But taking a closer look[1], it looks like drain_workqueue() might
actually cause bugs because while a workqueue is being drained, if
another unrelated device link deletion is trying to queue work, that
will get ignored.

Reply to rest of the emails in this thread here:

Nuno,

Sorry if I messed up who sent the first patch, but I did dig back to
your v1. But I could be wrong.

If devlink_dev_release() could have done the work synchronously, we'd
have done it in the first place. It's actually a bug because
devlink_dev_release() gets called in atomic context but the
put_device() on the supplier/consumer can do some sleeping work.

-Saravana

[1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/workqueue.c#n1727

2024-03-01 07:14:15

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH 1/2] driver core: Introduce device_link_wait_removal()

On Thu, 2024-02-29 at 15:26 -0800, Saravana Kannan wrote:
> On Fri, Feb 23, 2024 at 2:41 AM Nuno Sá <[email protected]> wrote:
> >
> > On Fri, 2024-02-23 at 10:11 +0100, Herve Codina wrote:
> > > Hi Saravana,
> > >
> > > On Tue, 20 Feb 2024 16:31:13 -0800
> > > Saravana Kannan <[email protected]> wrote:
> > >
> > > ...
> > >
> > > > > +void device_link_wait_removal(void)
> > > > > +{
> > > > > +       /*
> > > > > +        * devlink removal jobs are queued in the dedicated work queue.
> > > > > +        * To be sure that all removal jobs are terminated, ensure that
> > > > > any
> > > > > +        * scheduled work has run to completion.
> > > > > +        */
> > > > > +       drain_workqueue(fw_devlink_wq);
> > > >
> > > > Is there a reason this needs to be drain_workqueu() instead of
> > > > flush_workqueue(). Drain is a stronger guarantee than we need in this
> > > > case. All we are trying to make sure is that all the device link
> > > > remove work queued so far have completed.
> > >
> > > I used drain_workqueue() because drain_workqueue() allows for jobs already
> > > present in a workqueue to re-queue a job and drain_workqueue() will wait
> > > also for this new job completion.
> > >
> > > I think flush_workqueue() doesn't wait for this chain queueing.
> > >
> > > In our case, my understanding was that device_link_release_fn() calls
> > > put_device() for the consumer and the supplier.
> > > If refcounts reaches zero, devlink_dev_release() can be called again
> > > and re-queue a job.
> > >
> >
> > Looks sensible. The only doubt (that Saravana mays know better) is that I'm not
> > sure put_device() on a supplier or consumer can actually lead to
> > devlink_dev_release(). AFAIU, a consumer or a supplier should not be a device
> > from the devlink class. Hence, looking at device_release(), I'm not sure it can
> > happen unless for some odd reason someone is messing with devlinks in .remove()
> > or .type->remove().
>
> The case we are trying to fix here involves a supplier or a consumer
> device (say Device-A) being device_del(). When that happens, all the
> device links to/from the device are deleted by a call to
> device_links_purge() since a device link can't exist without both the
> supplier and consumer existing.
>
> The problem you were hitting is that the device link deletion code
> does the put_device(Device-A) in a workqueue. You change is to make
> sure to wait until that has completed. To do that, you only need to
> wait for the device link deletion work (already queued before
> returning from device_del()) to finish. You don't need to wait for
> anything more.
>
> I read up on drain_workqueue() before I made my comments. The point I
> was trying to make is that there could be some unrelated device link
> deletions that you don't need to wait on.
>
> But taking a closer look[1], it looks like drain_workqueue() might
> actually cause bugs because while a workqueue is being drained, if
> another unrelated device link deletion is trying to queue work, that
> will get ignored.
>

Oh, even worst then... please also take a look at the new v3 Herve sent. Herve is
already convinced about flush_workqueue(). The other sensible discussion is about
releasing the of_mutex in patch 2. I'm not convinced we need it but you may know
better.

> Reply to rest of the emails in this thread here:
>
> Nuno,
>
> Sorry if I messed up who sent the first patch, but I did dig back to
> your v1. But I could be wrong.
>

I did sent first a RFC [1] (which should also count :)). And it actually took a lot
of "pushing" with resends to get some attention on this. And if follow the RFC you'll
even see that I first reported the issue in May or something (but did not really put
too much effort on it at the time).

I have to admit it's a bit frustrating given how much I pushed and insisted in fixing
this (and not have my own patches in :P). But that's life and in the end of day I
just care about this being fixed. So, no hard feelings :).

> If devlink_dev_release() could have done the work synchronously, we'd
> have done it in the first place. It's actually a bug because
> devlink_dev_release() gets called in atomic context but the
> put_device() on the supplier/consumer can do some sleeping work.
>

Not sure I'm following the above. I may be missing something but looking at the code
paths it actually looks like devlink_dev_release() is always called with the
device_links_lock held. Therefore we need to be already in a sleeping context or we
already have a problem...

Looking at git history, the problem we had before was that we were using call_srcu()
and the srcu callback cannot sleep which could happen in a device release function.

Anyways, Rafael already said he's fine in erroring out in case the queue fails to
allocate (as you said, if that happens the system is already likely screwed). My only
complain now is in the place we're allocating the queue.

[1]: https://lore.kernel.org/lkml/[email protected]/

- Nuno Sá

> -Saravana
>
> [1] -
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/workqueue.c#n1727