2023-09-27 17:43:15

by Zhenhua Huang

[permalink] [raw]
Subject: [RESEND PATCH] driver core: Clear FWNODE_FLAG_LINKS_ADDED in device_links_purge()

Flag FWNODE_FLAG_LINKS_ADDED stops fwnode links creation. Current kernel
only adds it once after fwnode links creation in fw_devlink_parse_fwnode().
After that even device links being purged, the flag will not be cleared.

Fwnode links are converted to device links and will not be added back
forever in normal case. Essentially if a device is registered and
unregisted (also deleted) before it is probed (due to missing fwlink
dependencies, abort in device_links_check_suppliers), the fwlink is not
setup next when device is newly created again. This means the probe gets
called without meeting all dependencies.

It usuallly happens in the case of a glue driver. Of_platform_populate()
allows us to populate subnodes. We may do it in ancestor node probing
function, then check subnode's probing status because there may be chances
that suppliers of subnode are not ready. We may further need to do
of_platform_depopulate(which purges device links) and in some time
of_platform_populate() again. Such case we miss fwnode links(so that device
links) during second time of populating subnodes.

Fix it by Clearing FWNODE_FLAG_LINKS_ADDED flag in purging device link
func, indicates both fwnode links and device links are absent.

Signed-off-by: Zhenhua Huang <[email protected]>
---
drivers/base/core.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b7d7f41..2a1975d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1630,6 +1630,10 @@ static void device_links_purge(struct device *dev)
__device_link_del(&link->kref);
}

+ /* Clear flags in fwnode. Give a chance to create fwnode link again */
+ if (dev->fwnode)
+ dev->fwnode->flags &= ~FWNODE_FLAG_LINKS_ADDED;
+
device_links_write_unlock();
}

--
2.7.4


2023-10-02 22:11:10

by Saravana Kannan

[permalink] [raw]
Subject: Re: [RESEND PATCH] driver core: Clear FWNODE_FLAG_LINKS_ADDED in device_links_purge()

On Tue, Sep 26, 2023 at 7:31 PM Zhenhua Huang <[email protected]> wrote:
>
> Flag FWNODE_FLAG_LINKS_ADDED stops fwnode links creation. Current kernel
> only adds it once after fwnode links creation in fw_devlink_parse_fwnode().
> After that even device links being purged, the flag will not be cleared.
>
> Fwnode links are converted to device links and will not be added back
> forever in normal case. Essentially if a device is registered and
> unregisted (also deleted) before it is probed (due to missing fwlink
> dependencies, abort in device_links_check_suppliers), the fwlink is not
> setup next when device is newly created again. This means the probe gets
> called without meeting all dependencies.

I know exactly what problem you are explaining, but it's low in my
list of TODOs because in almost all cases, it's a poorly written
driver.

> It usuallly happens in the case of a glue driver. Of_platform_populate()
> allows us to populate subnodes. We may do it in ancestor node probing
> function, then check subnode's probing status because there may be chances
> that suppliers of subnode are not ready. We may further need to do
> of_platform_depopulate(which purges device links) and in some time
> of_platform_populate() again. Such case we miss fwnode links(so that device
> links) during second time of populating subnodes.

Why is the device driver for the parent device adding the child
devices before it knows it can finish probing? Adding the child
devices should be the last thing you do as part of your probe.

> Fix it by Clearing FWNODE_FLAG_LINKS_ADDED flag in purging device link
> func, indicates both fwnode links and device links are absent.
>

NACK for this patch.

While the intent is good, the change is very wrong because it's
clearing the flags too soon and can cause a lot of busy work.

Also, this issue hasn't been fixed yet because fixing this will cause
issues in some other corner cases related to network devices. I need a
bit more elaborate set of fixes before I can fix this issue you are
raising. But as I said before, in 90% of the cases it's bad driver
behavior.

Thanks,
Saravana


> Signed-off-by: Zhenhua Huang <[email protected]>
> ---
> drivers/base/core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index b7d7f41..2a1975d 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1630,6 +1630,10 @@ static void device_links_purge(struct device *dev)
> __device_link_del(&link->kref);
> }
>
> + /* Clear flags in fwnode. Give a chance to create fwnode link again */
> + if (dev->fwnode)
> + dev->fwnode->flags &= ~FWNODE_FLAG_LINKS_ADDED;
> +
> device_links_write_unlock();
> }
>
> --
> 2.7.4
>

2023-10-03 03:34:56

by Pavan Kondeti

[permalink] [raw]
Subject: Re: [RESEND PATCH] driver core: Clear FWNODE_FLAG_LINKS_ADDED in device_links_purge()

On Mon, Oct 02, 2023 at 02:40:52PM -0700, Saravana Kannan wrote:
>
> > It usuallly happens in the case of a glue driver. Of_platform_populate()
> > allows us to populate subnodes. We may do it in ancestor node probing
> > function, then check subnode's probing status because there may be chances
> > that suppliers of subnode are not ready. We may further need to do
> > of_platform_depopulate(which purges device links) and in some time
> > of_platform_populate() again. Such case we miss fwnode links(so that device
> > links) during second time of populating subnodes.
>
> Why is the device driver for the parent device adding the child
> devices before it knows it can finish probing? Adding the child
> devices should be the last thing you do as part of your probe.
>

Is there a way how a parent device driver can know if the dependencies
of its child are met before registering the device?

Does it make sense to treat the child dependencies as parent (if it
opts) dependencies in the firmware device links? Not every child may be needed
for the parent to operate but some devices which are glue devices (SoC
specific) for children (say 3rd party IP or generic devices) would
definitely want to know if their children are ready for probe.

Thanks,
Pavan

2023-10-05 14:48:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RESEND PATCH] driver core: Clear FWNODE_FLAG_LINKS_ADDED in device_links_purge()

On Wed, Sep 27, 2023 at 10:30:10AM +0800, Zhenhua Huang wrote:
> Flag FWNODE_FLAG_LINKS_ADDED stops fwnode links creation. Current kernel
> only adds it once after fwnode links creation in fw_devlink_parse_fwnode().
> After that even device links being purged, the flag will not be cleared.
>
> Fwnode links are converted to device links and will not be added back
> forever in normal case. Essentially if a device is registered and
> unregisted (also deleted) before it is probed (due to missing fwlink
> dependencies, abort in device_links_check_suppliers), the fwlink is not
> setup next when device is newly created again. This means the probe gets
> called without meeting all dependencies.
>
> It usuallly happens in the case of a glue driver.

What exact glue driver is causing this to happen? Why can't we fix that
up instead?

thanks,

greg k-h

2023-10-05 16:04:20

by Pavan Kondeti

[permalink] [raw]
Subject: Re: [RESEND PATCH] driver core: Clear FWNODE_FLAG_LINKS_ADDED in device_links_purge()

On Thu, Oct 05, 2023 at 11:32:22AM +0200, Greg KH wrote:
> On Wed, Sep 27, 2023 at 10:30:10AM +0800, Zhenhua Huang wrote:
> > Flag FWNODE_FLAG_LINKS_ADDED stops fwnode links creation. Current kernel
> > only adds it once after fwnode links creation in fw_devlink_parse_fwnode().
> > After that even device links being purged, the flag will not be cleared.
> >
> > Fwnode links are converted to device links and will not be added back
> > forever in normal case. Essentially if a device is registered and
> > unregisted (also deleted) before it is probed (due to missing fwlink
> > dependencies, abort in device_links_check_suppliers), the fwlink is not
> > setup next when device is newly created again. This means the probe gets
> > called without meeting all dependencies.
> >
> > It usuallly happens in the case of a glue driver.
>
> What exact glue driver is causing this to happen? Why can't we fix that
> up instead?
>

Yes, we are looking at fixing the glue driver and/or the probe order of drivers
involed by controlling modules load order. To answer your question,

It is observed with dwc3-qcom on a downstream kernel based on
Android GKI 6.1 kernel.

usb_1: usb@a6f8800 {
...
usb_1_dwc3: usb@a600000 {
compatible = "snps,dwc3";
iommus = <&apps_smmu 0x40 0x0>;
...
};
};

The parent device has no dependency on IOMMU but the child has
dependency. The parent probe gets called before IOMMU is probed.
The parent glue driver adds the child device and removes it since its
probe is not completed. Once the child is unregistered, all the fwlinks
are lost. Next time when the child is added, its probe gets called
before IOMMU is ready (since the fwlinks are lost).

Thanks,
Pavan