2021-01-30 04:57:42

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <[email protected]> wrote:
>
> This patch series solves two general issues with fw_devlink=on
>
> Patch 1/2 addresses the issue of firmware nodes that look like they'll
> have struct devices created for them, but will never actually have
> struct devices added for them. For example, DT nodes with a compatible
> property that don't have devices added for them.
>
> Patch 2/2 address (for static kernels) the issue of optional suppliers
> that'll never have a driver registered for them. So, if the device could
> have probed with fw_devlink=permissive with a static kernel, this patch
> should allow those devices to probe with a fw_devlink=on. This doesn't
> solve it for the case where modules are enabled because there's no way
> to tell if a driver will never be registered or it's just about to be
> registered. I have some other ideas for that, but it'll have to come
> later thinking about it a bit.
>
> These two patches might remove the need for several other patches that
> went in as fixes for commit e590474768f1 ("driver core: Set
> fw_devlink=on by default"), but I think all those fixes are good
> changes. So I think we should leave those in.
>
> Marek, Geert,
>
> Can you try this series on a static kernel with your OF_POPULATED
> changes reverted? I just want to make sure these patches can identify
> and fix those cases.
>
> Tudor,
>
> You should still make the clock driver fix (because it's a bug), but I
> think this series will fix your issue too (even without the clock driver
> fix). Can you please give this a shot?

Marek, Geert, Tudor,

Forgot to say that this will probably fix your issues only in a static
kernel. So please try this with a static kernel. If you can also try
and confirm that this does not fix the issue for a modular kernel,
that'd be good too.

-Saravana

>
> Marc,
>
> Can you try this series with the gpiolib fix reverted please? I'm pretty
> sure this will fix that case.
>
> Linus,
>
> This series very likely removes the need for the gpiolib patch (we can
> wait for Marc to confirm). I'm split on whether we should leave it in or
> not. Thoughts?
>
> Thanks,
> Saravana
>
> Saravana Kannan (2):
> driver core: fw_devlink: Detect supplier devices that will never be
> added
> driver core: fw_devlink: Handle missing drivers for optional suppliers
>
> drivers/base/base.h | 2 +
> drivers/base/core.c | 134 +++++++++++++++++++++++++++++++++++++-------
> drivers/base/dd.c | 5 ++
> 3 files changed, 121 insertions(+), 20 deletions(-)
>
> --
> 2.30.0.365.g02bc693789-goog
>


2021-02-01 08:07:55

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

Hi Saravana,

On 30.01.2021 05:08, Saravana Kannan wrote:
> On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <[email protected]> wrote:
>> This patch series solves two general issues with fw_devlink=on
>>
>> Patch 1/2 addresses the issue of firmware nodes that look like they'll
>> have struct devices created for them, but will never actually have
>> struct devices added for them. For example, DT nodes with a compatible
>> property that don't have devices added for them.
>>
>> Patch 2/2 address (for static kernels) the issue of optional suppliers
>> that'll never have a driver registered for them. So, if the device could
>> have probed with fw_devlink=permissive with a static kernel, this patch
>> should allow those devices to probe with a fw_devlink=on. This doesn't
>> solve it for the case where modules are enabled because there's no way
>> to tell if a driver will never be registered or it's just about to be
>> registered. I have some other ideas for that, but it'll have to come
>> later thinking about it a bit.
>>
>> These two patches might remove the need for several other patches that
>> went in as fixes for commit e590474768f1 ("driver core: Set
>> fw_devlink=on by default"), but I think all those fixes are good
>> changes. So I think we should leave those in.
>>
>> Marek, Geert,
>>
>> Can you try this series on a static kernel with your OF_POPULATED
>> changes reverted? I just want to make sure these patches can identify
>> and fix those cases.
>>
>> Tudor,
>>
>> You should still make the clock driver fix (because it's a bug), but I
>> think this series will fix your issue too (even without the clock driver
>> fix). Can you please give this a shot?
> Marek, Geert, Tudor,
>
> Forgot to say that this will probably fix your issues only in a static
> kernel. So please try this with a static kernel. If you can also try
> and confirm that this does not fix the issue for a modular kernel,
> that'd be good too.

I've checked those patches on top of linux next-20210129 with
c09a3e6c97f0 ("soc: samsung: pm_domains: Convert to regular platform
driver") commit reverted. Sadly it doesn't help. All devices that belong
to the Exynos power domains are never probed and stay endlessly on the
deferred devices list. I've used static kernel build - the one from
exynos_defconfig.

Best regards

--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-02-01 09:05:08

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

On Mon, Feb 1, 2021 at 12:05 AM Marek Szyprowski
<[email protected]> wrote:
>
> Hi Saravana,
>
> On 30.01.2021 05:08, Saravana Kannan wrote:
> > On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <[email protected]> wrote:
> >> This patch series solves two general issues with fw_devlink=on
> >>
> >> Patch 1/2 addresses the issue of firmware nodes that look like they'll
> >> have struct devices created for them, but will never actually have
> >> struct devices added for them. For example, DT nodes with a compatible
> >> property that don't have devices added for them.
> >>
> >> Patch 2/2 address (for static kernels) the issue of optional suppliers
> >> that'll never have a driver registered for them. So, if the device could
> >> have probed with fw_devlink=permissive with a static kernel, this patch
> >> should allow those devices to probe with a fw_devlink=on. This doesn't
> >> solve it for the case where modules are enabled because there's no way
> >> to tell if a driver will never be registered or it's just about to be
> >> registered. I have some other ideas for that, but it'll have to come
> >> later thinking about it a bit.
> >>
> >> These two patches might remove the need for several other patches that
> >> went in as fixes for commit e590474768f1 ("driver core: Set
> >> fw_devlink=on by default"), but I think all those fixes are good
> >> changes. So I think we should leave those in.
> >>
> >> Marek, Geert,
> >>
> >> Can you try this series on a static kernel with your OF_POPULATED
> >> changes reverted? I just want to make sure these patches can identify
> >> and fix those cases.
> >>
> >> Tudor,
> >>
> >> You should still make the clock driver fix (because it's a bug), but I
> >> think this series will fix your issue too (even without the clock driver
> >> fix). Can you please give this a shot?
> > Marek, Geert, Tudor,
> >
> > Forgot to say that this will probably fix your issues only in a static
> > kernel. So please try this with a static kernel. If you can also try
> > and confirm that this does not fix the issue for a modular kernel,
> > that'd be good too.
>
> I've checked those patches on top of linux next-20210129 with
> c09a3e6c97f0 ("soc: samsung: pm_domains: Convert to regular platform
> driver") commit reverted.

Hi Marek,

Thanks for testing!

> Sadly it doesn't help.

That sucks. I even partly "tested" it out on my platform (that needs
CONFIG_MODULES) by commenting out the CONFIG_MODULES check. And I saw
some device links getting dropped.

> All devices that belong

By belong, I assume you meant "are consumers"?

> to the Exynos power domains are never probed and stay endlessly on the
> deferred devices list. I've used static kernel build - the one from
> exynos_defconfig.

Can you enable the dev_dbg in __device_link_del() (the SRCU variant)?
Hopefully at least some of the device links would be dropped?

If the PD device link is not dropped, I wonder why this condition is
not hitting for consumers of the PD.

if (fw_devlink_def_probe_retry &&
link->flags & DL_FLAG_INFERRED &&
!device_links_probe_blocked_by(link->supplier)) {
device_link_drop_managed(link);
continue;
}

Could you try logging dev, link->supplier and
device_links_probe_blocked_by() return value. That should tell when a
consumer is waiting on a PD, why the PD might appear as waiting on
something else. I can't imagine the DL_FLAG_INFERRED being cleared
(it'll only happen when a driver/framework explicitly creates a device
link). Remind me again where the DT for this board is? Does the PD
depend on something else?

One other possibility is that some of the consumers of the PD could be
using the *_platform_driver_probe() macro/function that never
reattempts a probe. So even though this patch might drop the device
links, the consumer never tries again.

-Saravana

2021-02-01 10:44:30

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

Hi Saravana,

On Sat, Jan 30, 2021 at 5:09 AM Saravana Kannan <[email protected]> wrote:
> On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <[email protected]> wrote:
> > This patch series solves two general issues with fw_devlink=on
> >
> > Patch 1/2 addresses the issue of firmware nodes that look like they'll
> > have struct devices created for them, but will never actually have
> > struct devices added for them. For example, DT nodes with a compatible
> > property that don't have devices added for them.
> >
> > Patch 2/2 address (for static kernels) the issue of optional suppliers
> > that'll never have a driver registered for them. So, if the device could
> > have probed with fw_devlink=permissive with a static kernel, this patch
> > should allow those devices to probe with a fw_devlink=on. This doesn't
> > solve it for the case where modules are enabled because there's no way
> > to tell if a driver will never be registered or it's just about to be
> > registered. I have some other ideas for that, but it'll have to come
> > later thinking about it a bit.
> >
> > These two patches might remove the need for several other patches that
> > went in as fixes for commit e590474768f1 ("driver core: Set
> > fw_devlink=on by default"), but I think all those fixes are good
> > changes. So I think we should leave those in.
> >
> > Marek, Geert,
> >
> > Can you try this series on a static kernel with your OF_POPULATED
> > changes reverted? I just want to make sure these patches can identify
> > and fix those cases.
> >
> > Tudor,
> >
> > You should still make the clock driver fix (because it's a bug), but I
> > think this series will fix your issue too (even without the clock driver
> > fix). Can you please give this a shot?
>
> Marek, Geert, Tudor,
>
> Forgot to say that this will probably fix your issues only in a static
> kernel. So please try this with a static kernel. If you can also try
> and confirm that this does not fix the issue for a modular kernel,
> that'd be good too.

Thanks for your series!

For the modular case, this series has no impact, as expected (i.e. fails
to boot, no I/O devices probed).
With modules disabled, both r8a7791/koelsch and r8a77951/salvator-xs
seem to boot fine, except for one issue on koelsch:

dmesg:

+i2c-demux-pinctrl i2c-12: failed to setup demux-adapter 0 (-19)
+i2c-demux-pinctrl i2c-13: failed to setup demux-adapter 0 (-19)
+i2c-demux-pinctrl i2c-14: failed to setup demux-adapter 0 (-19)

- #0: rsnd-dai.0-ak4642-hifi
+ No soundcards found.

regulator_summary:

-13-0050-vcc 0 0mA 0mV 0mV
-13-0039-dvdd-3v 1 0mA 0mV 0mV
-13-0039-bgvdd 1 0mA 0mV 0mV
-13-0039-pvdd 1 0mA 0mV 0mV
-13-0039-dvdd 1 0mA 0mV 0mV
-13-0039-avdd 1 0mA 0mV 0mV

pm_genpd_summary:

-/devices/platform/soc/e6518000.i2c suspended 0
-/devices/platform/soc/e6530000.i2c suspended 0
-/devices/platform/soc/e6520000.i2c suspended 0

These are all symptoms of the same issue: i2c buses and devices are not
probed, due to the use of the i2c demuxer.
I guess the fw_devlink tracker doesn't consider "i2c-parent" links?

Note that I only tested this on R-Car Gen2 and Gen3.
I did not test this on Renesas SH/R-Mobile or RZ/A SoCs.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-02 03:03:48

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

On Mon, Feb 1, 2021 at 2:40 AM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Sat, Jan 30, 2021 at 5:09 AM Saravana Kannan <[email protected]> wrote:
> > On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <[email protected]> wrote:
> > > This patch series solves two general issues with fw_devlink=on
> > >
> > > Patch 1/2 addresses the issue of firmware nodes that look like they'll
> > > have struct devices created for them, but will never actually have
> > > struct devices added for them. For example, DT nodes with a compatible
> > > property that don't have devices added for them.
> > >
> > > Patch 2/2 address (for static kernels) the issue of optional suppliers
> > > that'll never have a driver registered for them. So, if the device could
> > > have probed with fw_devlink=permissive with a static kernel, this patch
> > > should allow those devices to probe with a fw_devlink=on. This doesn't
> > > solve it for the case where modules are enabled because there's no way
> > > to tell if a driver will never be registered or it's just about to be
> > > registered. I have some other ideas for that, but it'll have to come
> > > later thinking about it a bit.
> > >
> > > These two patches might remove the need for several other patches that
> > > went in as fixes for commit e590474768f1 ("driver core: Set
> > > fw_devlink=on by default"), but I think all those fixes are good
> > > changes. So I think we should leave those in.
> > >
> > > Marek, Geert,
> > >
> > > Can you try this series on a static kernel with your OF_POPULATED
> > > changes reverted? I just want to make sure these patches can identify
> > > and fix those cases.
> > >
> > > Tudor,
> > >
> > > You should still make the clock driver fix (because it's a bug), but I
> > > think this series will fix your issue too (even without the clock driver
> > > fix). Can you please give this a shot?
> >
> > Marek, Geert, Tudor,
> >
> > Forgot to say that this will probably fix your issues only in a static
> > kernel. So please try this with a static kernel. If you can also try
> > and confirm that this does not fix the issue for a modular kernel,
> > that'd be good too.
>
> Thanks for your series!
>
> For the modular case, this series has no impact, as expected (i.e. fails
> to boot, no I/O devices probed).
> With modules disabled, both r8a7791/koelsch and r8a77951/salvator-xs
> seem to boot fine, except for one issue on koelsch:

Thanks a lot for testing the series!

Regarding the koelsch issue, do you not see it with your OF_POPULATED
fix for rcar-sysc driver? But only see if you revert it and use this
series?

>
> dmesg:
>
> +i2c-demux-pinctrl i2c-12: failed to setup demux-adapter 0 (-19)
> +i2c-demux-pinctrl i2c-13: failed to setup demux-adapter 0 (-19)
> +i2c-demux-pinctrl i2c-14: failed to setup demux-adapter 0 (-19)
>
> - #0: rsnd-dai.0-ak4642-hifi
> + No soundcards found.
>
> regulator_summary:
>
> -13-0050-vcc 0 0mA 0mV 0mV
> -13-0039-dvdd-3v 1 0mA 0mV 0mV
> -13-0039-bgvdd 1 0mA 0mV 0mV
> -13-0039-pvdd 1 0mA 0mV 0mV
> -13-0039-dvdd 1 0mA 0mV 0mV
> -13-0039-avdd 1 0mA 0mV 0mV
>
> pm_genpd_summary:
>
> -/devices/platform/soc/e6518000.i2c suspended 0
> -/devices/platform/soc/e6530000.i2c suspended 0
> -/devices/platform/soc/e6520000.i2c suspended 0
>
> These are all symptoms of the same issue: i2c buses and devices are not
> probed, due to the use of the i2c demuxer.
> I guess the fw_devlink tracker doesn't consider "i2c-parent" links?

No, it doesn't parse "i2c-parent". Ugh... looked at it. It's going to
be a problem to parse because it requires the parents to be disbled in
DT and then fixes them up during run time. fw_devlink can handle DT
overlay changing a specific node, but the problem is that the consumer
DT node doesn't get changed. So the i2c-parent will first be parsed,
fw_devlink will notice they are disabled, so it'll ignore them. Then
those nodes are enabled, but the i2c-parent isn't reparsed because the
consumer isn't updated.

> Note that I only tested this on R-Car Gen2 and Gen3.
> I did not test this on Renesas SH/R-Mobile or RZ/A SoCs.

Thanks for any testing you can do :)

So overall, this series seems to be helping, but doesn't cover 100% of
the cases. So I suppose this is still a useful series. I'll be happy
to take any Tested-by or Reviewed-by.

-Saravana

2021-02-02 08:00:17

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

Hi Saravana,

On Tue, Feb 2, 2021 at 4:01 AM Saravana Kannan <[email protected]> wrote:
> On Mon, Feb 1, 2021 at 2:40 AM Geert Uytterhoeven <[email protected]> wrote:
> > On Sat, Jan 30, 2021 at 5:09 AM Saravana Kannan <[email protected]> wrote:
> > > On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <[email protected]> wrote:
> > > > This patch series solves two general issues with fw_devlink=on
> > > >
> > > > Patch 1/2 addresses the issue of firmware nodes that look like they'll
> > > > have struct devices created for them, but will never actually have
> > > > struct devices added for them. For example, DT nodes with a compatible
> > > > property that don't have devices added for them.
> > > >
> > > > Patch 2/2 address (for static kernels) the issue of optional suppliers
> > > > that'll never have a driver registered for them. So, if the device could
> > > > have probed with fw_devlink=permissive with a static kernel, this patch
> > > > should allow those devices to probe with a fw_devlink=on. This doesn't
> > > > solve it for the case where modules are enabled because there's no way
> > > > to tell if a driver will never be registered or it's just about to be
> > > > registered. I have some other ideas for that, but it'll have to come
> > > > later thinking about it a bit.
> > > >
> > > > These two patches might remove the need for several other patches that
> > > > went in as fixes for commit e590474768f1 ("driver core: Set
> > > > fw_devlink=on by default"), but I think all those fixes are good
> > > > changes. So I think we should leave those in.
> > > >
> > > > Marek, Geert,
> > > >
> > > > Can you try this series on a static kernel with your OF_POPULATED
> > > > changes reverted? I just want to make sure these patches can identify
> > > > and fix those cases.
> > > >
> > > > Tudor,
> > > >
> > > > You should still make the clock driver fix (because it's a bug), but I
> > > > think this series will fix your issue too (even without the clock driver
> > > > fix). Can you please give this a shot?
> > >
> > > Marek, Geert, Tudor,
> > >
> > > Forgot to say that this will probably fix your issues only in a static
> > > kernel. So please try this with a static kernel. If you can also try
> > > and confirm that this does not fix the issue for a modular kernel,
> > > that'd be good too.
> >
> > Thanks for your series!
> >
> > For the modular case, this series has no impact, as expected (i.e. fails
> > to boot, no I/O devices probed).
> > With modules disabled, both r8a7791/koelsch and r8a77951/salvator-xs
> > seem to boot fine, except for one issue on koelsch:
>
> Thanks a lot for testing the series!
>
> Regarding the koelsch issue, do you not see it with your OF_POPULATED
> fix for rcar-sysc driver? But only see if you revert it and use this
> series?

I've just rechecked, and with fw_devlink=on, and my OF_POPULATED
fir for rcar-sysc, i2c-demux-pinctrl works, both with modules enabled
and disabled.

> > dmesg:
> >
> > +i2c-demux-pinctrl i2c-12: failed to setup demux-adapter 0 (-19)
> > +i2c-demux-pinctrl i2c-13: failed to setup demux-adapter 0 (-19)
> > +i2c-demux-pinctrl i2c-14: failed to setup demux-adapter 0 (-19)
> >
> > - #0: rsnd-dai.0-ak4642-hifi
> > + No soundcards found.
> >
> > regulator_summary:
> >
> > -13-0050-vcc 0 0mA 0mV 0mV
> > -13-0039-dvdd-3v 1 0mA 0mV 0mV
> > -13-0039-bgvdd 1 0mA 0mV 0mV
> > -13-0039-pvdd 1 0mA 0mV 0mV
> > -13-0039-dvdd 1 0mA 0mV 0mV
> > -13-0039-avdd 1 0mA 0mV 0mV
> >
> > pm_genpd_summary:
> >
> > -/devices/platform/soc/e6518000.i2c suspended 0
> > -/devices/platform/soc/e6530000.i2c suspended 0
> > -/devices/platform/soc/e6520000.i2c suspended 0
> >
> > These are all symptoms of the same issue: i2c buses and devices are not
> > probed, due to the use of the i2c demuxer.
> > I guess the fw_devlink tracker doesn't consider "i2c-parent" links?
>
> No, it doesn't parse "i2c-parent". Ugh... looked at it. It's going to
> be a problem to parse because it requires the parents to be disbled in
> DT and then fixes them up during run time. fw_devlink can handle DT
> overlay changing a specific node, but the problem is that the consumer
> DT node doesn't get changed. So the i2c-parent will first be parsed,
> fw_devlink will notice they are disabled, so it'll ignore them. Then
> those nodes are enabled, but the i2c-parent isn't reparsed because the
> consumer isn't updated.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-02 08:14:33

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

Hi Saravana,

On 01.02.2021 10:02, Saravana Kannan wrote:
> On Mon, Feb 1, 2021 at 12:05 AM Marek Szyprowski
> <[email protected]> wrote:
>> On 30.01.2021 05:08, Saravana Kannan wrote:
>>> On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <[email protected]> wrote:
>>>> This patch series solves two general issues with fw_devlink=on
>>>>
>>>> Patch 1/2 addresses the issue of firmware nodes that look like they'll
>>>> have struct devices created for them, but will never actually have
>>>> struct devices added for them. For example, DT nodes with a compatible
>>>> property that don't have devices added for them.
>>>>
>>>> Patch 2/2 address (for static kernels) the issue of optional suppliers
>>>> that'll never have a driver registered for them. So, if the device could
>>>> have probed with fw_devlink=permissive with a static kernel, this patch
>>>> should allow those devices to probe with a fw_devlink=on. This doesn't
>>>> solve it for the case where modules are enabled because there's no way
>>>> to tell if a driver will never be registered or it's just about to be
>>>> registered. I have some other ideas for that, but it'll have to come
>>>> later thinking about it a bit.
>>>>
>>>> These two patches might remove the need for several other patches that
>>>> went in as fixes for commit e590474768f1 ("driver core: Set
>>>> fw_devlink=on by default"), but I think all those fixes are good
>>>> changes. So I think we should leave those in.
>>>>
>>>> Marek, Geert,
>>>>
>>>> Can you try this series on a static kernel with your OF_POPULATED
>>>> changes reverted? I just want to make sure these patches can identify
>>>> and fix those cases.
>>>>
>>>> Tudor,
>>>>
>>>> You should still make the clock driver fix (because it's a bug), but I
>>>> think this series will fix your issue too (even without the clock driver
>>>> fix). Can you please give this a shot?
>>> Marek, Geert, Tudor,
>>>
>>> Forgot to say that this will probably fix your issues only in a static
>>> kernel. So please try this with a static kernel. If you can also try
>>> and confirm that this does not fix the issue for a modular kernel,
>>> that'd be good too.
>> I've checked those patches on top of linux next-20210129 with
>> c09a3e6c97f0 ("soc: samsung: pm_domains: Convert to regular platform
>> driver") commit reverted.
> Hi Marek,
>
> Thanks for testing!
>
>> Sadly it doesn't help.
> That sucks. I even partly "tested" it out on my platform (that needs
> CONFIG_MODULES) by commenting out the CONFIG_MODULES check. And I saw
> some device links getting dropped.

Well, my fault. I've missed the fact that I have to disable
CONFIG_MODULES to let it work. This is not really a fix for my case,
because the exynos_defconfig has modules enabled (mainly for WiFi and
media drivers). However disabling the CONFIG_MODULES indeed helped a
bit. Most of the devices got finally probed. There are only 4 left in
the deferred_devices list:

sound
12e20000.sysmmu
12d00000.hdmi
12c10000.mixer

The last two (12c10000.mixer and 12d00000.hdmi) are consumers of the
12e20000.sysmmu, which is a consumer of the 10023c20.power-domain. That
power domain in turn is a consumer (child) of another power domain
(10023c80.power-domain):

# dmesg | grep 10023c20.power-domain
[    0.354435] platform 10023c20.power-domain: Linked as a consumer to
10023c80.power-domain
[    0.489573] platform 12d00000.hdmi: Linked as a consumer to
10023c20.power-domain
[    0.497143] platform 12c10000.mixer: Linked as a consumer to
10023c20.power-domain
[    0.580874] platform 12e20000.sysmmu: Linked as a consumer to
10023c20.power-domain
[    0.601655] platform 12e20000.sysmmu: probe deferral - supplier
10023c20.power-domain not ready
[    2.744884] platform 12c10000.mixer: probe deferral - supplier
10023c20.power-domain not ready
[    2.766726] platform 12d00000.hdmi: probe deferral - supplier
10023c20.power-domain not ready

...

So a dependency chain of 2 power domains is still not resolved properly.

I didn't have time to check what's wrong with the sound node. Simple
grepping of the messages for the 'sound' string don't give any results.
The above tests has been done on the Odroid U3 board
(arch/arm/boot/dts/exynos4412-odroidu3.dts).

Best regards

--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-02-02 08:23:08

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

On Mon, Feb 1, 2021 at 11:55 PM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Tue, Feb 2, 2021 at 4:01 AM Saravana Kannan <[email protected]> wrote:
> > On Mon, Feb 1, 2021 at 2:40 AM Geert Uytterhoeven <[email protected]> wrote:
> > > On Sat, Jan 30, 2021 at 5:09 AM Saravana Kannan <[email protected]> wrote:
> > > > On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <[email protected]> wrote:
> > > > > This patch series solves two general issues with fw_devlink=on
> > > > >
> > > > > Patch 1/2 addresses the issue of firmware nodes that look like they'll
> > > > > have struct devices created for them, but will never actually have
> > > > > struct devices added for them. For example, DT nodes with a compatible
> > > > > property that don't have devices added for them.
> > > > >
> > > > > Patch 2/2 address (for static kernels) the issue of optional suppliers
> > > > > that'll never have a driver registered for them. So, if the device could
> > > > > have probed with fw_devlink=permissive with a static kernel, this patch
> > > > > should allow those devices to probe with a fw_devlink=on. This doesn't
> > > > > solve it for the case where modules are enabled because there's no way
> > > > > to tell if a driver will never be registered or it's just about to be
> > > > > registered. I have some other ideas for that, but it'll have to come
> > > > > later thinking about it a bit.
> > > > >
> > > > > These two patches might remove the need for several other patches that
> > > > > went in as fixes for commit e590474768f1 ("driver core: Set
> > > > > fw_devlink=on by default"), but I think all those fixes are good
> > > > > changes. So I think we should leave those in.
> > > > >
> > > > > Marek, Geert,
> > > > >
> > > > > Can you try this series on a static kernel with your OF_POPULATED
> > > > > changes reverted? I just want to make sure these patches can identify
> > > > > and fix those cases.
> > > > >
> > > > > Tudor,
> > > > >
> > > > > You should still make the clock driver fix (because it's a bug), but I
> > > > > think this series will fix your issue too (even without the clock driver
> > > > > fix). Can you please give this a shot?
> > > >
> > > > Marek, Geert, Tudor,
> > > >
> > > > Forgot to say that this will probably fix your issues only in a static
> > > > kernel. So please try this with a static kernel. If you can also try
> > > > and confirm that this does not fix the issue for a modular kernel,
> > > > that'd be good too.
> > >
> > > Thanks for your series!
> > >
> > > For the modular case, this series has no impact, as expected (i.e. fails
> > > to boot, no I/O devices probed).
> > > With modules disabled, both r8a7791/koelsch and r8a77951/salvator-xs
> > > seem to boot fine, except for one issue on koelsch:
> >
> > Thanks a lot for testing the series!
> >
> > Regarding the koelsch issue, do you not see it with your OF_POPULATED
> > fix for rcar-sysc driver? But only see if you revert it and use this
> > series?
>
> I've just rechecked, and with fw_devlink=on, and my OF_POPULATED
> fir for rcar-sysc, i2c-demux-pinctrl works, both with modules enabled
> and disabled.

Thanks Geert! My guess is that with your OF_POPULATED changes the
"i2c-parents" of i2c-demux-pinctrl don't get probe deferred and
therefore i2c-demux-pinctrl probes after them and everything goes
well.

I guess that goes to show this series can't be the magic bullet even
with patch 2/3 -- especially for top level DT nodes that never have
devices created.

The other odd thing I noticed is that i2c-demux-pinctrl seems to
return -ENODEV when I think it should do -EPROBE_DEFER. In
i2c_demux_activate_master():

ret = of_changeset_apply(&priv->chan[new_chan].chgset);
if (ret)
goto err;

adap = of_find_i2c_adapter_by_node(priv->chan[new_chan].parent_np);
if (!adap) {
ret = -ENODEV;
goto err_with_revert;
}

If I understand the code correctly, it's assuming the selected parent
will probe successfully as soon as its status=ok change is done. Which
is not guaranteed for many reasons (driver not registered, async
probing, stuff like fw_devlink, etc).

Thanks,
Saravana

2021-02-02 08:37:22

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Make fw_devlink=on more forgiving

On Tue, Feb 2, 2021 at 12:12 AM Marek Szyprowski
<[email protected]> wrote:
>
> Hi Saravana,
>
> On 01.02.2021 10:02, Saravana Kannan wrote:
> > On Mon, Feb 1, 2021 at 12:05 AM Marek Szyprowski
> > <[email protected]> wrote:
> >> On 30.01.2021 05:08, Saravana Kannan wrote:
> >>> On Fri, Jan 29, 2021 at 8:03 PM Saravana Kannan <[email protected]> wrote:
> >>>> This patch series solves two general issues with fw_devlink=on
> >>>>
> >>>> Patch 1/2 addresses the issue of firmware nodes that look like they'll
> >>>> have struct devices created for them, but will never actually have
> >>>> struct devices added for them. For example, DT nodes with a compatible
> >>>> property that don't have devices added for them.
> >>>>
> >>>> Patch 2/2 address (for static kernels) the issue of optional suppliers
> >>>> that'll never have a driver registered for them. So, if the device could
> >>>> have probed with fw_devlink=permissive with a static kernel, this patch
> >>>> should allow those devices to probe with a fw_devlink=on. This doesn't
> >>>> solve it for the case where modules are enabled because there's no way
> >>>> to tell if a driver will never be registered or it's just about to be
> >>>> registered. I have some other ideas for that, but it'll have to come
> >>>> later thinking about it a bit.
> >>>>
> >>>> These two patches might remove the need for several other patches that
> >>>> went in as fixes for commit e590474768f1 ("driver core: Set
> >>>> fw_devlink=on by default"), but I think all those fixes are good
> >>>> changes. So I think we should leave those in.
> >>>>
> >>>> Marek, Geert,
> >>>>
> >>>> Can you try this series on a static kernel with your OF_POPULATED
> >>>> changes reverted? I just want to make sure these patches can identify
> >>>> and fix those cases.
> >>>>
> >>>> Tudor,
> >>>>
> >>>> You should still make the clock driver fix (because it's a bug), but I
> >>>> think this series will fix your issue too (even without the clock driver
> >>>> fix). Can you please give this a shot?
> >>> Marek, Geert, Tudor,
> >>>
> >>> Forgot to say that this will probably fix your issues only in a static
> >>> kernel. So please try this with a static kernel. If you can also try
> >>> and confirm that this does not fix the issue for a modular kernel,
> >>> that'd be good too.
> >> I've checked those patches on top of linux next-20210129 with
> >> c09a3e6c97f0 ("soc: samsung: pm_domains: Convert to regular platform
> >> driver") commit reverted.
> > Hi Marek,
> >
> > Thanks for testing!
> >
> >> Sadly it doesn't help.
> > That sucks. I even partly "tested" it out on my platform (that needs
> > CONFIG_MODULES) by commenting out the CONFIG_MODULES check. And I saw
> > some device links getting dropped.
>
> Well, my fault. I've missed the fact that I have to disable
> CONFIG_MODULES to let it work. This is not really a fix for my case,
> because the exynos_defconfig has modules enabled (mainly for WiFi and
> media drivers). However disabling the CONFIG_MODULES indeed helped a
> bit. Most of the devices got finally probed. There are only 4 left in
> the deferred_devices list:
>
> sound
> 12e20000.sysmmu
> 12d00000.hdmi
> 12c10000.mixer
>
> The last two (12c10000.mixer and 12d00000.hdmi) are consumers of the
> 12e20000.sysmmu, which is a consumer of the 10023c20.power-domain. That
> power domain in turn is a consumer (child) of another power domain
> (10023c80.power-domain):
>
> # dmesg | grep 10023c20.power-domain
> [ 0.354435] platform 10023c20.power-domain: Linked as a consumer to
> 10023c80.power-domain
> [ 0.489573] platform 12d00000.hdmi: Linked as a consumer to
> 10023c20.power-domain
> [ 0.497143] platform 12c10000.mixer: Linked as a consumer to
> 10023c20.power-domain
> [ 0.580874] platform 12e20000.sysmmu: Linked as a consumer to
> 10023c20.power-domain
> [ 0.601655] platform 12e20000.sysmmu: probe deferral - supplier
> 10023c20.power-domain not ready
> [ 2.744884] platform 12c10000.mixer: probe deferral - supplier
> 10023c20.power-domain not ready
> [ 2.766726] platform 12d00000.hdmi: probe deferral - supplier
> 10023c20.power-domain not ready
>
> ...
>
> So a dependency chain of 2 power domains is still not resolved properly.
>
> I didn't have time to check what's wrong with the sound node. Simple
> grepping of the messages for the 'sound' string don't give any results.
> The above tests has been done on the Odroid U3 board
> (arch/arm/boot/dts/exynos4412-odroidu3.dts).

Thanks for testing again! This actually gave me valuable info. The
problem is that 10023c20.power-domain (let's call it PD-B) never gets
added to the deferred probe list (because that only happens once a
driver is registered). So, it never gets to drop it's dependency on
10023c20.power-domain (let's call it PD-A). So, once all drivers are
registered and the SMMU checks if it needs to drop the device link to
PD-B, it sees that PD-B is still waiting on suppliers. So it could be
that the PD-B would probe in the future. But PD-B never probes in the
future.

Anyway, I was trying to avoid climbing the tree/graph with Patch 2/3.
But I might not be able to avoid it. Maybe all I need to check is
whether it's in the deferred probe list. Let me think about it. And
maybe it will solve Geert's issue too.

-Saravana