2024-02-05 12:16:11

by Nuno Sa via B4 Relay

[permalink] [raw]
Subject: [PATCH v2 0/2] fix DT overlays when device links are released

Link to RFC:
* https://lore.kernel.org/lkml/[email protected]/

v1:
* https://lore.kernel.org/r/[email protected]

Changes in v2:
* Don't error out in case alloc_workqueue() fails. Devlinks can still
work and we'll then release them synchronously.

I'm pasting again the link of the first time I exposed the issue where
one can see the resulps (big splat) of failing DT assumption:

https://lore.kernel.org/linux-devicetree/[email protected]/

---
Nuno Sa (2):
driver: core: add dedicated workqueue for devlink removal
of: dynamic: flush devlinks workqueue before destroying the changeset

drivers/base/core.c | 32 ++++++++++++++++++++++++++++----
drivers/of/dynamic.c | 8 ++++++++
include/linux/fwnode.h | 1 +
3 files changed, 37 insertions(+), 4 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240123-fix-device-links-overlays-5422e033a09b
--

Thanks!
- Nuno Sá



2024-02-05 12:16:16

by Nuno Sa via B4 Relay

[permalink] [raw]
Subject: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset

From: Nuno Sa <[email protected]>

Device links will drop their supplier + consumer refcounts
asynchronously. That means that the refcount of the of_node attached to
these devices will also be dropped asynchronously and so we cannot
guarantee the DT overlay assumption that the of_node refcount must be 1 in
__of_changeset_entry_destroy().

Given the above, call the new fwnode_links_flush_queue() helper to flush
the devlink workqueue so we can be sure that all links are dropped before
doing the proper checks.

Signed-off-by: Nuno Sa <[email protected]>
---
drivers/of/dynamic.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 3bf27052832f..b7153c72c9c9 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -14,6 +14,7 @@
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/proc_fs.h>
+#include <linux/fwnode.h>

#include "of_private.h"

@@ -518,6 +519,13 @@ EXPORT_SYMBOL(of_changeset_create_node);

static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
{
+ /*
+ * device links drop their device references (and hence their of_node
+ * references) asynchronously on a dedicated workqueue. Hence we need
+ * to flush it to make sure everything is done before doing the below
+ * checks.
+ */
+ fwnode_links_flush_queue();
if (ce->action == OF_RECONFIG_ATTACH_NODE &&
of_node_check_flag(ce->np, OF_OVERLAY)) {
if (kref_read(&ce->np->kobj.kref) > 1) {

--
2.43.0


2024-02-05 12:16:17

by Nuno Sa via B4 Relay

[permalink] [raw]
Subject: [PATCH v2 1/2] driver: core: add dedicated workqueue for devlink removal

From: Nuno Sa <[email protected]>

Let's use a dedicated queue for devlinks since releasing a link happens
asynchronously but some code paths, like DT overlays, have some
expectations regarding the of_node when being removed (the refcount must
be 1). Given how devlinks are released that cannot be assured. Hence, add a
dedicated queue so that it's easy to sync against devlinks removal.

While at it, make sure to explicitly include <linux/workqueue.h>.

Signed-off-by: Nuno Sa <[email protected]>
---
drivers/base/core.c | 32 ++++++++++++++++++++++++++++----
include/linux/fwnode.h | 1 +
2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 14d46af40f9a..4bb9c10489ed 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -31,6 +31,7 @@
#include <linux/swiotlb.h>
#include <linux/sysfs.h>
#include <linux/dma-map-ops.h> /* for dma_default_coherent */
+#include <linux/workqueue.h>

#include "base.h"
#include "physical_location.h"
@@ -44,6 +45,7 @@ static bool fw_devlink_is_permissive(void);
static void __fw_devlink_link_to_consumers(struct device *dev);
static bool fw_devlink_drv_reg_done;
static bool fw_devlink_best_effort;
+static struct workqueue_struct *devlink_release_queue __ro_after_init;

/**
* __fwnode_link_add - Create a link between two fwnode_handles.
@@ -235,6 +237,12 @@ static void __fw_devlink_pickup_dangling_consumers(struct fwnode_handle *fwnode,
__fw_devlink_pickup_dangling_consumers(child, new_sup);
}

+void fwnode_links_flush_queue(void)
+{
+ if (devlink_release_queue)
+ flush_workqueue(devlink_release_queue);
+}
+
static DEFINE_MUTEX(device_links_lock);
DEFINE_STATIC_SRCU(device_links_srcu);

@@ -531,9 +539,13 @@ static void devlink_dev_release(struct device *dev)
* It may take a while to complete this work because of the SRCU
* synchronization in device_link_release_fn() and if the consumer or
* supplier devices get deleted when it runs, so put it into the "long"
- * workqueue.
+ * devlink workqueue (in case we could allocate one).
+ *
*/
- queue_work(system_long_wq, &link->rm_work);
+ if (devlink_release_queue)
+ queue_work(devlink_release_queue, &link->rm_work);
+ else
+ device_link_release_fn(&link->rm_work);
}

static struct class devlink_class = {
@@ -636,10 +648,22 @@ static int __init devlink_class_init(void)
return ret;

ret = class_interface_register(&devlink_class_intf);
- if (ret)
+ if (ret) {
class_unregister(&devlink_class);
+ return ret;
+ }

- return ret;
+ /*
+ * Using a dedicated queue for devlinks since releasing a link happens
+ * asynchronously but some code paths, like DT overlays, have some
+ * expectations regarding the of_node when being removed (the refcount
+ * must be 1). Given how devlinks are released that cannot be assured.
+ * Hence, add a dedicated queue so that it's easy to sync against
+ * devlinks removal.
+ */
+ devlink_release_queue = alloc_workqueue("devlink_release", 0, 0);
+
+ return 0;
}
postcore_initcall(devlink_class_init);

diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 2a72f55d26eb..017b170e9903 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -213,5 +213,6 @@ extern bool fw_devlink_is_strict(void);
int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup);
void fwnode_links_purge(struct fwnode_handle *fwnode);
void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode);
+void fwnode_links_flush_queue(void);

#endif

--
2.43.0


2024-02-05 12:38:23

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset

On Mon, Feb 05, 2024 at 01:09:33PM +0100, Nuno Sa via B4 Relay wrote:
> From: Nuno Sa <[email protected]>
>
> Device links will drop their supplier + consumer refcounts
> asynchronously. That means that the refcount of the of_node attached to
> these devices will also be dropped asynchronously and so we cannot
> guarantee the DT overlay assumption that the of_node refcount must be 1 in
> __of_changeset_entry_destroy().
>
> Given the above, call the new fwnode_links_flush_queue() helper to flush
> the devlink workqueue so we can be sure that all links are dropped before
> doing the proper checks.

Have you seen my comments against v1?

> +++ b/drivers/of/dynamic.c
> @@ -14,6 +14,7 @@
> #include <linux/slab.h>
> #include <linux/string.h>
> #include <linux/proc_fs.h>
> +#include <linux/fwnode.h>

Try to squeeze this to make it ordered (with given context it may go before
linux/s* ones, but maybe you may find a better spot).

..

> + /*
> + * device links drop their device references (and hence their of_node

Device links...

> + * references) asynchronously on a dedicated workqueue. Hence we need
> + * to flush it to make sure everything is done before doing the below
> + * checks.
> + */

--
With Best Regards,
Andy Shevchenko



2024-02-05 13:11:15

by Nuno Sa

[permalink] [raw]
Subject: RE: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset



> -----Original Message-----
> From: Andy Shevchenko <[email protected]>
> Sent: Monday, February 5, 2024 1:36 PM
> To: Sa, Nuno <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>; Rafael J. Wysocki
> <[email protected]>; Frank Rowand <[email protected]>; Rob Herring
> <[email protected]>; Daniel Scally <[email protected]>; Heikki Krogerus
> <[email protected]>; Sakari Ailus <[email protected]>;
> Len Brown <[email protected]>; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before
> destroying the changeset
>
> [External]
>
> On Mon, Feb 05, 2024 at 01:09:33PM +0100, Nuno Sa via B4 Relay wrote:
> > From: Nuno Sa <[email protected]>
> >
> > Device links will drop their supplier + consumer refcounts
> > asynchronously. That means that the refcount of the of_node attached to
> > these devices will also be dropped asynchronously and so we cannot
> > guarantee the DT overlay assumption that the of_node refcount must be 1 in
> > __of_changeset_entry_destroy().
> >
> > Given the above, call the new fwnode_links_flush_queue() helper to flush
> > the devlink workqueue so we can be sure that all links are dropped before
> > doing the proper checks.
>
> Have you seen my comments against v1?

Hmm, not really. lei is not fetching any email (even this one - that's why I'm using my work email + awesome
email client to reply).
And looking directly in lore... I'm also not seeing any reply from you. Maybe there's an issue.

- Nuno S?


2024-02-05 13:38:51

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] driver: core: add dedicated workqueue for devlink removal

On Mon, Feb 5, 2024 at 1:09 PM Nuno Sa via B4 Relay
<[email protected]> wrote:
>
> From: Nuno Sa <[email protected]>
>
> Let's use a dedicated queue for devlinks since releasing a link happens
> asynchronously but some code paths, like DT overlays, have some
> expectations regarding the of_node when being removed (the refcount must
> be 1). Given how devlinks are released that cannot be assured. Hence, add a
> dedicated queue so that it's easy to sync against devlinks removal.
>
> While at it, make sure to explicitly include <linux/workqueue.h>.
>
> Signed-off-by: Nuno Sa <[email protected]>

Reviewed-by: Rafael J. Wysocki <[email protected]>

> ---
> drivers/base/core.c | 32 ++++++++++++++++++++++++++++----
> include/linux/fwnode.h | 1 +
> 2 files changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 14d46af40f9a..4bb9c10489ed 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -31,6 +31,7 @@
> #include <linux/swiotlb.h>
> #include <linux/sysfs.h>
> #include <linux/dma-map-ops.h> /* for dma_default_coherent */
> +#include <linux/workqueue.h>
>
> #include "base.h"
> #include "physical_location.h"
> @@ -44,6 +45,7 @@ static bool fw_devlink_is_permissive(void);
> static void __fw_devlink_link_to_consumers(struct device *dev);
> static bool fw_devlink_drv_reg_done;
> static bool fw_devlink_best_effort;
> +static struct workqueue_struct *devlink_release_queue __ro_after_init;
>
> /**
> * __fwnode_link_add - Create a link between two fwnode_handles.
> @@ -235,6 +237,12 @@ static void __fw_devlink_pickup_dangling_consumers(struct fwnode_handle *fwnode,
> __fw_devlink_pickup_dangling_consumers(child, new_sup);
> }
>
> +void fwnode_links_flush_queue(void)
> +{
> + if (devlink_release_queue)
> + flush_workqueue(devlink_release_queue);
> +}
> +
> static DEFINE_MUTEX(device_links_lock);
> DEFINE_STATIC_SRCU(device_links_srcu);
>
> @@ -531,9 +539,13 @@ static void devlink_dev_release(struct device *dev)
> * It may take a while to complete this work because of the SRCU
> * synchronization in device_link_release_fn() and if the consumer or
> * supplier devices get deleted when it runs, so put it into the "long"
> - * workqueue.
> + * devlink workqueue (in case we could allocate one).
> + *
> */
> - queue_work(system_long_wq, &link->rm_work);
> + if (devlink_release_queue)
> + queue_work(devlink_release_queue, &link->rm_work);
> + else
> + device_link_release_fn(&link->rm_work);
> }
>
> static struct class devlink_class = {
> @@ -636,10 +648,22 @@ static int __init devlink_class_init(void)
> return ret;
>
> ret = class_interface_register(&devlink_class_intf);
> - if (ret)
> + if (ret) {
> class_unregister(&devlink_class);
> + return ret;
> + }
>
> - return ret;
> + /*
> + * Using a dedicated queue for devlinks since releasing a link happens
> + * asynchronously but some code paths, like DT overlays, have some
> + * expectations regarding the of_node when being removed (the refcount
> + * must be 1). Given how devlinks are released that cannot be assured.
> + * Hence, add a dedicated queue so that it's easy to sync against
> + * devlinks removal.
> + */
> + devlink_release_queue = alloc_workqueue("devlink_release", 0, 0);
> +
> + return 0;
> }
> postcore_initcall(devlink_class_init);
>
> diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> index 2a72f55d26eb..017b170e9903 100644
> --- a/include/linux/fwnode.h
> +++ b/include/linux/fwnode.h
> @@ -213,5 +213,6 @@ extern bool fw_devlink_is_strict(void);
> int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup);
> void fwnode_links_purge(struct fwnode_handle *fwnode);
> void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode);
> +void fwnode_links_flush_queue(void);
>
> #endif
>
> --
> 2.43.0
>

2024-02-05 13:49:30

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] driver: core: add dedicated workqueue for devlink removal

On Mon, Feb 05, 2024 at 01:09:32PM +0100, Nuno Sa via B4 Relay wrote:
> From: Nuno Sa <[email protected]>
>
> Let's use a dedicated queue for devlinks since releasing a link happens
> asynchronously but some code paths, like DT overlays, have some
> expectations regarding the of_node when being removed (the refcount must
> be 1). Given how devlinks are released that cannot be assured. Hence, add a
> dedicated queue so that it's easy to sync against devlinks removal.
>
> While at it, make sure to explicitly include <linux/workqueue.h>.

..

> +++ b/include/linux/fwnode.h
> @@ -213,5 +213,6 @@ extern bool fw_devlink_is_strict(void);
> int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup);
> void fwnode_links_purge(struct fwnode_handle *fwnode);
> void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode);
> +void fwnode_links_flush_queue(void);

I am not sure if you have seen my comment against v1.

I find the namespace a bit messy for devlinks. And to me seems the best place
for this line is to be before fwnode_links_purge().

--
With Best Regards,
Andy Shevchenko



2024-02-05 14:32:19

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] driver: core: add dedicated workqueue for devlink removal

On Mon, 2024-02-05 at 14:35 +0200, Andy Shevchenko wrote:
> On Mon, Feb 05, 2024 at 01:09:32PM +0100, Nuno Sa via B4 Relay wrote:
> > From: Nuno Sa <[email protected]>
> >
> > Let's use a dedicated queue for devlinks since releasing a link happens
> > asynchronously but some code paths, like DT overlays, have some
> > expectations regarding the of_node when being removed (the refcount must
> > be 1). Given how devlinks are released that cannot be assured. Hence, add a
> > dedicated queue so that it's easy to sync against devlinks removal.
> >
> > While at it, make sure to explicitly include <linux/workqueue.h>.
>
> ...
>
> > +++ b/include/linux/fwnode.h
> > @@ -213,5 +213,6 @@ extern bool fw_devlink_is_strict(void);
> >  int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup);
> >  void fwnode_links_purge(struct fwnode_handle *fwnode);
> >  void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode);
> > +void fwnode_links_flush_queue(void);
>
> I am not sure if you have seen my comment against v1.
>

I did received it like 30min ago...

> I find the namespace a bit messy for devlinks. And to me seems the best place
> for this line is to be before fwnode_links_purge().
>

TBH, I'm not really keen on sending a v3 just for that (unless I'm asked otherwise).
But If I have (still missing DT guys feedback), I'll do as you suggested.

- Nuno Sá

2024-02-12 12:16:18

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset

On Mon, 2024-02-05 at 13:09 +0100, Nuno Sa wrote:
> Device links will drop their supplier + consumer refcounts
> asynchronously. That means that the refcount of the of_node attached to
> these devices will also be dropped asynchronously and so we cannot
> guarantee the DT overlay assumption that the of_node refcount must be 1 in
> __of_changeset_entry_destroy().
>
> Given the above, call the new fwnode_links_flush_queue() helper to flush
> the devlink workqueue so we can be sure that all links are dropped before
> doing the proper checks.
>
> Signed-off-by: Nuno Sa <[email protected]>
> ---
>  drivers/of/dynamic.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 3bf27052832f..b7153c72c9c9 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -14,6 +14,7 @@
>  #include <linux/slab.h>
>  #include <linux/string.h>
>  #include <linux/proc_fs.h>
> +#include <linux/fwnode.h>
>  
>  #include "of_private.h"
>  
> @@ -518,6 +519,13 @@ EXPORT_SYMBOL(of_changeset_create_node);
>  
>  static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
>  {
> + /*
> + * device links drop their device references (and hence their of_node
> + * references) asynchronously on a dedicated workqueue. Hence we need
> + * to flush it to make sure everything is done before doing the below
> + * checks.
> + */
> + fwnode_links_flush_queue();
>   if (ce->action == OF_RECONFIG_ATTACH_NODE &&
>       of_node_check_flag(ce->np, OF_OVERLAY)) {
>   if (kref_read(&ce->np->kobj.kref) > 1) {
>

Hi Rob and Frank,

Any way you could take a look at this and see if you're ok with the change in the
overlay code?

On the devlink side , we already got the ok from Rafael.

Thanks!
- Nuno Sá

2024-02-13 14:52:05

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset

On Mon, Feb 12, 2024 at 01:10:27PM +0100, Nuno S? wrote:
> On Mon, 2024-02-05 at 13:09 +0100, Nuno Sa wrote:
> > Device links will drop their supplier + consumer refcounts
> > asynchronously. That means that the refcount of the of_node attached to
> > these devices will also be dropped asynchronously and so we cannot
> > guarantee the DT overlay assumption that the of_node refcount must be 1 in
> > __of_changeset_entry_destroy().
> >
> > Given the above, call the new fwnode_links_flush_queue() helper to flush
> > the devlink workqueue so we can be sure that all links are dropped before
> > doing the proper checks.
> >
> > Signed-off-by: Nuno Sa <[email protected]>
> > ---
> > ?drivers/of/dynamic.c | 8 ++++++++
> > ?1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > index 3bf27052832f..b7153c72c9c9 100644
> > --- a/drivers/of/dynamic.c
> > +++ b/drivers/of/dynamic.c
> > @@ -14,6 +14,7 @@
> > ?#include <linux/slab.h>
> > ?#include <linux/string.h>
> > ?#include <linux/proc_fs.h>
> > +#include <linux/fwnode.h>
> > ?
> > ?#include "of_private.h"
> > ?
> > @@ -518,6 +519,13 @@ EXPORT_SYMBOL(of_changeset_create_node);
> > ?
> > ?static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
> > ?{
> > + /*
> > + * device links drop their device references (and hence their of_node
> > + * references) asynchronously on a dedicated workqueue. Hence we need
> > + * to flush it to make sure everything is done before doing the below
> > + * checks.
> > + */
> > + fwnode_links_flush_queue();
> > ? if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> > ? ??? of_node_check_flag(ce->np, OF_OVERLAY)) {
> > ? if (kref_read(&ce->np->kobj.kref) > 1) {
> >
>
> Hi Rob and Frank,
>
> Any way you could take a look at this and see if you're ok with the change in the
> overlay code?
>
> On the devlink side , we already got the ok from Rafael.

Didn't Saravana say he was going to look at this? As of yesterday, he's
also a DT maintainer so deferring to him.

Rob

2024-02-13 15:03:21

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset

On Tue, 2024-02-13 at 08:51 -0600, Rob Herring wrote:
> On Mon, Feb 12, 2024 at 01:10:27PM +0100, Nuno Sá wrote:
> > On Mon, 2024-02-05 at 13:09 +0100, Nuno Sa wrote:
> > > Device links will drop their supplier + consumer refcounts
> > > asynchronously. That means that the refcount of the of_node attached to
> > > these devices will also be dropped asynchronously and so we cannot
> > > guarantee the DT overlay assumption that the of_node refcount must be 1 in
> > > __of_changeset_entry_destroy().
> > >
> > > Given the above, call the new fwnode_links_flush_queue() helper to flush
> > > the devlink workqueue so we can be sure that all links are dropped before
> > > doing the proper checks.
> > >
> > > Signed-off-by: Nuno Sa <[email protected]>
> > > ---
> > >  drivers/of/dynamic.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > index 3bf27052832f..b7153c72c9c9 100644
> > > --- a/drivers/of/dynamic.c
> > > +++ b/drivers/of/dynamic.c
> > > @@ -14,6 +14,7 @@
> > >  #include <linux/slab.h>
> > >  #include <linux/string.h>
> > >  #include <linux/proc_fs.h>
> > > +#include <linux/fwnode.h>
> > >  
> > >  #include "of_private.h"
> > >  
> > > @@ -518,6 +519,13 @@ EXPORT_SYMBOL(of_changeset_create_node);
> > >  
> > >  static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
> > >  {
> > > + /*
> > > + * device links drop their device references (and hence their
> > > of_node
> > > + * references) asynchronously on a dedicated workqueue. Hence we
> > > need
> > > + * to flush it to make sure everything is done before doing the
> > > below
> > > + * checks.
> > > + */
> > > + fwnode_links_flush_queue();
> > >   if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> > >       of_node_check_flag(ce->np, OF_OVERLAY)) {
> > >   if (kref_read(&ce->np->kobj.kref) > 1) {
> > >
> >
> > Hi Rob and Frank,
> >
> > Any way you could take a look at this and see if you're ok with the change
> > in the
> > overlay code?
> >
> > On the devlink side , we already got the ok from Rafael.
>
> Didn't Saravana say he was going to look at this? As of yesterday, he's
> also a DT maintainer so deferring to him.
>

Yeah, I did asked him but I guess he never had the time for it... Saravana,
could you please give some feedback on this? I think the most sensible part is
on the devlink side but I assume this is not going to be merged without an ack
from a DT maintainer...

- Nuno Sá

2024-02-14 03:45:08

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset

On Tue, Feb 13, 2024 at 6:57 AM Nuno Sá <[email protected]> wrote:
>
> On Tue, 2024-02-13 at 08:51 -0600, Rob Herring wrote:
> > On Mon, Feb 12, 2024 at 01:10:27PM +0100, Nuno Sá wrote:
> > > On Mon, 2024-02-05 at 13:09 +0100, Nuno Sa wrote:
> > > > Device links will drop their supplier + consumer refcounts
> > > > asynchronously. That means that the refcount of the of_node attached to
> > > > these devices will also be dropped asynchronously and so we cannot
> > > > guarantee the DT overlay assumption that the of_node refcount must be 1 in
> > > > __of_changeset_entry_destroy().
> > > >
> > > > Given the above, call the new fwnode_links_flush_queue() helper to flush
> > > > the devlink workqueue so we can be sure that all links are dropped before
> > > > doing the proper checks.
> > > >
> > > > Signed-off-by: Nuno Sa <[email protected]>
> > > > ---
> > > > drivers/of/dynamic.c | 8 ++++++++
> > > > 1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > > index 3bf27052832f..b7153c72c9c9 100644
> > > > --- a/drivers/of/dynamic.c
> > > > +++ b/drivers/of/dynamic.c
> > > > @@ -14,6 +14,7 @@
> > > > #include <linux/slab.h>
> > > > #include <linux/string.h>
> > > > #include <linux/proc_fs.h>
> > > > +#include <linux/fwnode.h>
> > > >
> > > > #include "of_private.h"
> > > >
> > > > @@ -518,6 +519,13 @@ EXPORT_SYMBOL(of_changeset_create_node);
> > > >
> > > > static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
> > > > {
> > > > + /*
> > > > + * device links drop their device references (and hence their
> > > > of_node
> > > > + * references) asynchronously on a dedicated workqueue. Hence we
> > > > need
> > > > + * to flush it to make sure everything is done before doing the
> > > > below
> > > > + * checks.
> > > > + */
> > > > + fwnode_links_flush_queue();
> > > > if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> > > > of_node_check_flag(ce->np, OF_OVERLAY)) {
> > > > if (kref_read(&ce->np->kobj.kref) > 1) {
> > > >
> > >
> > > Hi Rob and Frank,
> > >
> > > Any way you could take a look at this and see if you're ok with the change
> > > in the
> > > overlay code?
> > >
> > > On the devlink side , we already got the ok from Rafael.
> >
> > Didn't Saravana say he was going to look at this? As of yesterday, he's
> > also a DT maintainer so deferring to him.
> >
>
> Yeah, I did asked him but I guess he never had the time for it... Saravana,
> could you please give some feedback on this? I think the most sensible part is
> on the devlink side but I assume this is not going to be merged without an ack
> from a DT maintainer...

Sorry for the delay Nuno. I'll get to this. I promise. This week is a bit busy.

-Saravana

2024-02-14 12:56:36

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset

On Tue, 2024-02-13 at 19:44 -0800, Saravana Kannan wrote:
> On Tue, Feb 13, 2024 at 6:57 AM Nuno Sá <[email protected]> wrote:
> >
> > On Tue, 2024-02-13 at 08:51 -0600, Rob Herring wrote:
> > > On Mon, Feb 12, 2024 at 01:10:27PM +0100, Nuno Sá wrote:
> > > > On Mon, 2024-02-05 at 13:09 +0100, Nuno Sa wrote:
> > > > > Device links will drop their supplier + consumer refcounts
> > > > > asynchronously. That means that the refcount of the of_node attached
> > > > > to
> > > > > these devices will also be dropped asynchronously and so we cannot
> > > > > guarantee the DT overlay assumption that the of_node refcount must be
> > > > > 1 in
> > > > > __of_changeset_entry_destroy().
> > > > >
> > > > > Given the above, call the new fwnode_links_flush_queue() helper to
> > > > > flush
> > > > > the devlink workqueue so we can be sure that all links are dropped
> > > > > before
> > > > > doing the proper checks.
> > > > >
> > > > > Signed-off-by: Nuno Sa <[email protected]>
> > > > > ---
> > > > >  drivers/of/dynamic.c | 8 ++++++++
> > > > >  1 file changed, 8 insertions(+)
> > > > >
> > > > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > > > index 3bf27052832f..b7153c72c9c9 100644
> > > > > --- a/drivers/of/dynamic.c
> > > > > +++ b/drivers/of/dynamic.c
> > > > > @@ -14,6 +14,7 @@
> > > > >  #include <linux/slab.h>
> > > > >  #include <linux/string.h>
> > > > >  #include <linux/proc_fs.h>
> > > > > +#include <linux/fwnode.h>
> > > > >
> > > > >  #include "of_private.h"
> > > > >
> > > > > @@ -518,6 +519,13 @@ EXPORT_SYMBOL(of_changeset_create_node);
> > > > >
> > > > >  static void __of_changeset_entry_destroy(struct of_changeset_entry
> > > > > *ce)
> > > > >  {
> > > > > + /*
> > > > > +  * device links drop their device references (and hence their
> > > > > of_node
> > > > > +  * references) asynchronously on a dedicated workqueue. Hence we
> > > > > need
> > > > > +  * to flush it to make sure everything is done before doing the
> > > > > below
> > > > > +  * checks.
> > > > > +  */
> > > > > + fwnode_links_flush_queue();
> > > > >   if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> > > > >       of_node_check_flag(ce->np, OF_OVERLAY)) {
> > > > >           if (kref_read(&ce->np->kobj.kref) > 1) {
> > > > >
> > > >
> > > > Hi Rob and Frank,
> > > >
> > > > Any way you could take a look at this and see if you're ok with the
> > > > change
> > > > in the
> > > > overlay code?
> > > >
> > > > On the devlink side , we already got the ok from Rafael.
> > >
> > > Didn't Saravana say he was going to look at this? As of yesterday, he's
> > > also a DT maintainer so deferring to him.
> > >
> >
> > Yeah, I did asked him but I guess he never had the time for it... Saravana,
> > could you please give some feedback on this? I think the most sensible part
> > is
> > on the devlink side but I assume this is not going to be merged without an
> > ack
> > from a DT maintainer...
>
> Sorry for the delay Nuno. I'll get to this. I promise. This week is a bit
> busy.
>

No worries. Just making sure it's not forgotten :)

- Nuno Sá


2024-02-21 06:58:49

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset

On Tue, 2024-02-20 at 16:39 -0800, Saravana Kannan wrote:
> On Wed, Feb 14, 2024 at 4:48 AM Nuno Sá <[email protected]> wrote:
> >
> > On Tue, 2024-02-13 at 19:44 -0800, Saravana Kannan wrote:
> > > On Tue, Feb 13, 2024 at 6:57 AM Nuno Sá <[email protected]> wrote:
> > > >
> > > > On Tue, 2024-02-13 at 08:51 -0600, Rob Herring wrote:
> > > > > On Mon, Feb 12, 2024 at 01:10:27PM +0100, Nuno Sá wrote:
> > > > > > On Mon, 2024-02-05 at 13:09 +0100, Nuno Sa wrote:
> > > > > > > Device links will drop their supplier + consumer refcounts
> > > > > > > asynchronously. That means that the refcount of the of_node attached
> > > > > > > to
> > > > > > > these devices will also be dropped asynchronously and so we cannot
> > > > > > > guarantee the DT overlay assumption that the of_node refcount must be
> > > > > > > 1 in
> > > > > > > __of_changeset_entry_destroy().
> > > > > > >
> > > > > > > Given the above, call the new fwnode_links_flush_queue() helper to
> > > > > > > flush
> > > > > > > the devlink workqueue so we can be sure that all links are dropped
> > > > > > > before
> > > > > > > doing the proper checks.
> > > > > > >
> > > > > > > Signed-off-by: Nuno Sa <[email protected]>
> > > > > > > ---
> > > > > > >  drivers/of/dynamic.c | 8 ++++++++
> > > > > > >  1 file changed, 8 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > > > > > index 3bf27052832f..b7153c72c9c9 100644
> > > > > > > --- a/drivers/of/dynamic.c
> > > > > > > +++ b/drivers/of/dynamic.c
> > > > > > > @@ -14,6 +14,7 @@
> > > > > > >  #include <linux/slab.h>
> > > > > > >  #include <linux/string.h>
> > > > > > >  #include <linux/proc_fs.h>
> > > > > > > +#include <linux/fwnode.h>
> > > > > > >
> > > > > > >  #include "of_private.h"
> > > > > > >
> > > > > > > @@ -518,6 +519,13 @@ EXPORT_SYMBOL(of_changeset_create_node);
> > > > > > >
> > > > > > >  static void __of_changeset_entry_destroy(struct of_changeset_entry
> > > > > > > *ce)
> > > > > > >  {
> > > > > > > + /*
> > > > > > > +  * device links drop their device references (and hence their
> > > > > > > of_node
> > > > > > > +  * references) asynchronously on a dedicated workqueue Hence we
> > > > > > > need
> > > > > > > +  * to flush it to make sure everything is done before doing the
> > > > > > > below
> > > > > > > +  * checks.
> > > > > > > +  */
> > > > > > > + fwnode_links_flush_queue();
> > > > > > >   if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> > > > > > >       of_node_check_flag(ce->np, OF_OVERLAY)) {
> > > > > > >           if (kref_read(&ce->np->kobj.kref) > 1) {
> > > > > > >
> > > > > >
> > > > > > Hi Rob and Frank,
> > > > > >
> > > > > > Any way you could take a look at this and see if you're ok with the
> > > > > > change
> > > > > > in the
> > > > > > overlay code?
> > > > > >
> > > > > > On the devlink side , we already got the ok from Rafael.
> > > > >
> > > > > Didn't Saravana say he was going to look at this? As of yesterday, he's
> > > > > also a DT maintainer so deferring to him.
> > > > >
> > > >
> > > > Yeah, I did asked him but I guess he never had the time for it... Saravana,
> > > > could you please give some feedback on this? I think the most sensible part
> > > > is
> > > > on the devlink side but I assume this is not going to be merged without an
> > > > ack
> > > > from a DT maintainer...
> > >
> > > Sorry for the delay Nuno. I'll get to this. I promise. This week is a bit
> > > busy.
> > >
> >
> > No worries. Just making sure it's not forgotten :)
>
> Hi Nuno,
>
> Thanks for nudging me about this issue.
>

Hi Saravana,

> I replied to a similar patch series that Herve sent out last year.
> Chose to reply to that because it had fewer issues to fix and Herve
> sent it out a while ago.

I think it's fixing the same issues but as he sent first, fair enough :)

> https://lore.kernel.org/all/[email protected]/
>
> Can you please chime in there?
>

Already did... Please look at my first patch. It already has an ack from Rafael and I
think it's fairly close with what you want (it might need some naming improvements
though).

- Nuno Sá


2024-02-21 07:22:34

by Nuno Sá

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset

On Tue, 2024-02-20 at 16:39 -0800, Saravana Kannan wrote:
> On Wed, Feb 14, 2024 at 4:48 AM Nuno Sá <[email protected]> wrote:
> >
> > On Tue, 2024-02-13 at 19:44 -0800, Saravana Kannan wrote:
> > > On Tue, Feb 13, 2024 at 6:57 AM Nuno Sá <[email protected]> wrote:
> > > >
> > > > On Tue, 2024-02-13 at 08:51 -0600, Rob Herring wrote:
> > > > > On Mon, Feb 12, 2024 at 01:10:27PM +0100, Nuno Sá wrote:
> > > > > > On Mon, 2024-02-05 at 13:09 +0100, Nuno Sa wrote:
> > > > > > > Device links will drop their supplier + consumer refcounts
> > > > > > > asynchronously. That means that the refcount of the of_node attached
> > > > > > > to
> > > > > > > these devices will also be dropped asynchronously and so we cannot
> > > > > > > guarantee the DT overlay assumption that the of_node refcount must be
> > > > > > > 1 in
> > > > > > > __of_changeset_entry_destroy().
> > > > > > >
> > > > > > > Given the above, call the new fwnode_links_flush_queue() helper to
> > > > > > > flush
> > > > > > > the devlink workqueue so we can be sure that all links are dropped
> > > > > > > before
> > > > > > > doing the proper checks.
> > > > > > >
> > > > > > > Signed-off-by: Nuno Sa <[email protected]>
> > > > > > > ---
> > > > > > >  drivers/of/dynamic.c | 8 ++++++++
> > > > > > >  1 file changed, 8 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > > > > > index 3bf27052832f..b7153c72c9c9 100644
> > > > > > > --- a/drivers/of/dynamic.c
> > > > > > > +++ b/drivers/of/dynamic.c
> > > > > > > @@ -14,6 +14,7 @@
> > > > > > >  #include <linux/slab.h>
> > > > > > >  #include <linux/string.h>
> > > > > > >  #include <linux/proc_fs.h>
> > > > > > > +#include <linux/fwnode.h>
> > > > > > >
> > > > > > >  #include "of_private.h"
> > > > > > >
> > > > > > > @@ -518,6 +519,13 @@ EXPORT_SYMBOL(of_changeset_create_node);
> > > > > > >
> > > > > > >  static void __of_changeset_entry_destroy(struct of_changeset_entry
> > > > > > > *ce)
> > > > > > >  {
> > > > > > > + /*
> > > > > > > +  * device links drop their device references (and hence their
> > > > > > > of_node
> > > > > > > +  * references) asynchronously on a dedicated workqueue Hence we
> > > > > > > need
> > > > > > > +  * to flush it to make sure everything is done before doing the
> > > > > > > below
> > > > > > > +  * checks.
> > > > > > > +  */
> > > > > > > + fwnode_links_flush_queue();
> > > > > > >   if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> > > > > > >       of_node_check_flag(ce->np, OF_OVERLAY)) {
> > > > > > >           if (kref_read(&ce->np->kobj.kref) > 1) {
> > > > > > >
> > > > > >
> > > > > > Hi Rob and Frank,
> > > > > >
> > > > > > Any way you could take a look at this and see if you're ok with the
> > > > > > change
> > > > > > in the
> > > > > > overlay code?
> > > > > >
> > > > > > On the devlink side , we already got the ok from Rafael.
> > > > >
> > > > > Didn't Saravana say he was going to look at this? As of yesterday, he's
> > > > > also a DT maintainer so deferring to him.
> > > > >
> > > >
> > > > Yeah, I did asked him but I guess he never had the time for it... Saravana,
> > > > could you please give some feedback on this? I think the most sensible part
> > > > is
> > > > on the devlink side but I assume this is not going to be merged without an
> > > > ack
> > > > from a DT maintainer...
> > >
> > > Sorry for the delay Nuno. I'll get to this. I promise. This week is a bit
> > > busy.
> > >
> >
> > No worries. Just making sure it's not forgotten :)
>
> Hi Nuno,
>
> Thanks for nudging me about this issue.
>
> I replied to a similar patch series that Herve sent out last year.
> Chose to reply to that because it had fewer issues to fix and Herve
> sent it out a while ago.

Ehehe, FWIW, I did sent it out before I believe:

https://lore.kernel.org/lkml/[email protected]/

I just got no attention and it took some time until I got some feedback (I also
pushed for it with resends). If you follow the links in the cover, you'll see I first
started (and spotted the issue) the effort in May last year.

That said, I'm more than fine with whatever series is taken. I just care about the
problem being solved :)

- Nuno Sá


2024-02-21 00:40:14

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] of: dynamic: flush devlinks workqueue before destroying the changeset

On Wed, Feb 14, 2024 at 4:48 AM Nuno Sá <[email protected]> wrote:
>
> On Tue, 2024-02-13 at 19:44 -0800, Saravana Kannan wrote:
> > On Tue, Feb 13, 2024 at 6:57 AM Nuno Sá <noname.nuno@gmailcom> wrote:
> > >
> > > On Tue, 2024-02-13 at 08:51 -0600, Rob Herring wrote:
> > > > On Mon, Feb 12, 2024 at 01:10:27PM +0100, Nuno Sá wrote:
> > > > > On Mon, 2024-02-05 at 13:09 +0100, Nuno Sa wrote:
> > > > > > Device links will drop their supplier + consumer refcounts
> > > > > > asynchronously. That means that the refcount of the of_node attached
> > > > > > to
> > > > > > these devices will also be dropped asynchronously and so we cannot
> > > > > > guarantee the DT overlay assumption that the of_node refcount must be
> > > > > > 1 in
> > > > > > __of_changeset_entry_destroy().
> > > > > >
> > > > > > Given the above, call the new fwnode_links_flush_queue() helper to
> > > > > > flush
> > > > > > the devlink workqueue so we can be sure that all links are dropped
> > > > > > before
> > > > > > doing the proper checks.
> > > > > >
> > > > > > Signed-off-by: Nuno Sa <[email protected]>
> > > > > > ---
> > > > > > drivers/of/dynamic.c | 8 ++++++++
> > > > > > 1 file changed, 8 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > > > > index 3bf27052832f..b7153c72c9c9 100644
> > > > > > --- a/drivers/of/dynamic.c
> > > > > > +++ b/drivers/of/dynamic.c
> > > > > > @@ -14,6 +14,7 @@
> > > > > > #include <linux/slab.h>
> > > > > > #include <linux/string.h>
> > > > > > #include <linux/proc_fs.h>
> > > > > > +#include <linux/fwnode.h>
> > > > > >
> > > > > > #include "of_private.h"
> > > > > >
> > > > > > @@ -518,6 +519,13 @@ EXPORT_SYMBOL(of_changeset_create_node);
> > > > > >
> > > > > > static void __of_changeset_entry_destroy(struct of_changeset_entry
> > > > > > *ce)
> > > > > > {
> > > > > > + /*
> > > > > > + * device links drop their device references (and hence their
> > > > > > of_node
> > > > > > + * references) asynchronously on a dedicated workqueue. Hence we
> > > > > > need
> > > > > > + * to flush it to make sure everything is done before doing the
> > > > > > below
> > > > > > + * checks.
> > > > > > + */
> > > > > > + fwnode_links_flush_queue();
> > > > > > if (ce->action == OF_RECONFIG_ATTACH_NODE &&
> > > > > > of_node_check_flag(ce->np, OF_OVERLAY)) {
> > > > > > if (kref_read(&ce->np->kobj.kref) > 1) {
> > > > > >
> > > > >
> > > > > Hi Rob and Frank,
> > > > >
> > > > > Any way you could take a look at this and see if you're ok with the
> > > > > change
> > > > > in the
> > > > > overlay code?
> > > > >
> > > > > On the devlink side , we already got the ok from Rafael.
> > > >
> > > > Didn't Saravana say he was going to look at this? As of yesterday, he's
> > > > also a DT maintainer so deferring to him.
> > > >
> > >
> > > Yeah, I did asked him but I guess he never had the time for it... Saravana,
> > > could you please give some feedback on this? I think the most sensible part
> > > is
> > > on the devlink side but I assume this is not going to be merged without an
> > > ack
> > > from a DT maintainer...
> >
> > Sorry for the delay Nuno. I'll get to this. I promise. This week is a bit
> > busy.
> >
>
> No worries. Just making sure it's not forgotten :)

Hi Nuno,

Thanks for nudging me about this issue.

I replied to a similar patch series that Herve sent out last year.
Chose to reply to that because it had fewer issues to fix and Herve
sent it out a while ago.
https://lore.kernel.org/all/[email protected]/

Can you please chime in there?

Thanks,
Saravana