2024-02-28 21:56:25

by Rob Herring

[permalink] [raw]
Subject: Re: [REGRESSION] Re: [PATCH v2 2/3] of: property: Improve finding the supplier of a remote-endpoint property

On Mon, Feb 26, 2024 at 5:52 AM Luca Ceresoli <[email protected]> wrote:
>
> Hello Saravana,
>
> On Fri, 23 Feb 2024 17:35:24 -0800
> Saravana Kannan <[email protected]> wrote:
>
> > On Fri, Feb 23, 2024 at 8:18 AM Luca Ceresoli <[email protected]> wrote:
> > >
> > > Hello Saravana,
> > >
> > > [+cc Hervé Codina]
> > >
> > > On Tue, 6 Feb 2024 17:18:01 -0800
> > > Saravana Kannan <[email protected]> wrote:
> > >
> > > > After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
> > > > remote-endpoint properties created a fwnode link from the consumer device
> > > > to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
> > > > trying to create device links or detecting cycles. So, improve this the
> > > > same way we improved finding the consumer of a remote-endpoint property.
> > > >
> > > > Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
> > > > Signed-off-by: Saravana Kannan <[email protected]>
> > >
> > > After rebasing my own branch on v6.8-rc5 from v6.8-rc1 I started
> > > getting unexpected warnings during device tree overlay removal. After a
> > > somewhat painful bisection I identified this patch as the one that
> > > triggers it all.
> >
> > Thanks for the report.
> >
> > >
> > > > ---
> > > > --- a/drivers/of/property.c
> > > > +++ b/drivers/of/property.c
> > > > @@ -1232,7 +1232,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
> > > > DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
> > > > DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> > > > DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > > > -DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
> > > > DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
> > > > DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
> > > > DEFINE_SIMPLE_PROP(leds, "leds", NULL)
> > > > @@ -1298,6 +1297,17 @@ static struct device_node *parse_interrupts(struct device_node *np,
> > > > return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
> > > > }
> > > >
> > > > +static struct device_node *parse_remote_endpoint(struct device_node *np,
> > > > + const char *prop_name,
> > > > + int index)
> > > > +{
> > > > + /* Return NULL for index > 0 to signify end of remote-endpoints. */
> > > > + if (!index || strcmp(prop_name, "remote-endpoint"))
> > >
> > > There seem to be a bug here: "!index" should be "index > 0", as the
> > > comment suggests. Otherwise NULL is always returned.
> >
> > Ah crap, I think you are right. It should have been "index". Not
> > "!index". But I tested this! Sigh. I probably screwed up my testing.
> >
> > Please send out a Fix for this.
> >
> > Geert, we got excited too soon. :(
> >
> > > I am going to send a quick patch for that, but haven't done so yet
> > > because it still won't solve the problem, so I wanted to open the topic
> > > here without further delay.
> > >
> > > Even with the 'index > 0' fix I'm still getting pretty much the same:
> >
> > This part is confusing though. If I read your DT correctly, there's a
> > cycle between platform:panel-dsi-lvds and i2c:13-002c. And fw_devlink
> > should not be enforcing any ordering between those devices ever.
> >
> > I'm surprised that in your "working" case, fw_devlink didn't detect
> > any cycle. It should have. If there's any debugging to do, that's the
> > one we need to debug.
> >
> > >
> > > [ 34.836781] ------------[ cut here ]------------
> > > [ 34.841401] WARNING: CPU: 2 PID: 204 at drivers/base/devres.c:1064 devm_kfree+0x8c/0xfc
> > > ...
> > > [ 35.024751] Call trace:
> > > [ 35.027199] devm_kfree+0x8c/0xfc
> > > [ 35.030520] devm_drm_panel_bridge_release+0x54/0x64 [drm_kms_helper]
> > > [ 35.036990] devres_release_group+0xe0/0x164
> > > [ 35.041264] i2c_device_remove+0x38/0x9c
> > > [ 35.045196] device_remove+0x4c/0x80
> > > [ 35.048774] device_release_driver_internal+0x1d4/0x230
> > > [ 35.054003] device_release_driver+0x18/0x24
> > > [ 35.058279] bus_remove_device+0xcc/0x10c
> > > [ 35.062292] device_del+0x15c/0x41c
> > > [ 35.065786] device_unregister+0x18/0x34
> > > [ 35.069714] i2c_unregister_device+0x54/0x88
> > > [ 35.073988] of_i2c_notify+0x98/0x224
> > > [ 35.077656] blocking_notifier_call_chain+0x6c/0xa0
> > > [ 35.082543] __of_changeset_entry_notify+0x100/0x16c
> > > [ 35.087515] __of_changeset_revert_notify+0x44/0x78
> > > [ 35.092398] of_overlay_remove+0x114/0x1c4
> > > ...
> > >
> > > By comparing the two versions I found that before removing the overlay:
> > >
> > > * in the "working" case (with this patch reverted) I have:
> > >
> > > # ls /sys/class/devlink/ | grep 002c
> > > platform:hpbr--i2c:13-002c
> > > platform:panel-dsi-lvds--i2c:13-002c
> >
> > Can you check the "status" and "sync_state_only" file in this folder
> > and tell me what it says?
> >
> > Since these devices have a cyclic dependency between them, it should
> > have been something other than "not tracked" and "sync_state_only"
> > should be "1". But my guess is you'll see "active" and "0".
> >
> > > platform:regulator-sys-1v8--i2c:13-002c
> > > regulator:regulator.31--i2c:13-002c
> > > #
> > >
> > > * in the "broken" case (v6.8-rc5 + s/!index/index > 0/ as mentioned):
> > >
> > > # ls /sys/class/devlink/ | grep 002c
> > > platform:hpbr--i2c:13-002c
> > > platform:regulator-sys-1v8--i2c:13-002c
> > > regulator:regulator.30--i2c:13-002c
> > > #
> > >
> > > So in the latter case the panel-dsi-lvds--i2c:13-002c link is missing.
> > > I think it gets created but later on removed. Here's a snippet of the
> > > kernel log when that happens:
> > >
> > > [ 9.578279] ----- cycle: start -----
> > > [ 9.578283] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > [ 9.578308] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > [ 9.578329] ----- cycle: end -----
> > > [ 9.578334] platform panel-dsi-lvds: Fixed dependency cycle(s) with /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > ...
> >
> > Somewhere in this area, I'm thinking you'll also see "device:
> > 'i2c:13-002c--platform:panel-dsi-lvds': device_add" do you not? And if
> > you enabled device link logs, you'll see that it was "sync state only"
> > link.
> >
> > > [ 9.590620] /panel-dsi-lvds Dropping the fwnode link to /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > ...
> > > [ 9.597280] ----- cycle: start -----
> > > [ 9.597283] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > [ 9.602781] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > [ 9.607581] ----- cycle: end -----
> > > [ 9.607585] i2c 13-002c: Fixed dependency cycle(s) with /panel-dsi-lvds
> > > [ 9.614217] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_add
> > > ...
> > > [ 9.614277] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /panel-dsi-lvds
> > > [ 9.614369] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /regulator-dock-sys-1v8
> > > ...
> > > [ 9.739840] panel-simple panel-dsi-lvds: Dropping the link to 13-002c
> > > [ 9.739846] device: 'i2c:13-002c--platform:panel-dsi-lvds': device_unregister
> >
> > Oh yeah, see. The "device_add" I expected earlier is getting removed here.
> >
> > > [ 10.247037] sn65dsi83 13-002c: Dropping the link to panel-dsi-lvds
> > > [ 10.247049] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_unregister
> > >
> > > And here's the relevant portion of my device tree overlay:
> > >
> > > --------------------8<--------------------
> > >
> >
> > I think the eventual fix would be this series + adding a
> > "post-init-providers" property to the device that's supposed to probe
> > first and point it to the device that's supposed to probe next. Do
> > this at the device node level, not the endpoint level.
> > https://lore.kernel.org/lkml/[email protected]/
>
> I'm certainly going to look at this series in more detail and at the
> debugging you asked for, however I'm afraid I won't have access to the
> hardware this week and it's not going to be a quick task anyway.
>
> So in this moment I think it's quite clear that this specific patch
> creates a regression and there is no clear fix that is reasonably
> likely to get merged before 6.8.
>
> I propose reverting this patch immediately, unless you have a better
> short-term solution.

It's just this one of the 3 patches that needs reverting?

Rob


2024-02-28 23:59:00

by Saravana Kannan

[permalink] [raw]
Subject: Re: [REGRESSION] Re: [PATCH v2 2/3] of: property: Improve finding the supplier of a remote-endpoint property

On Wed, Feb 28, 2024 at 1:56 PM Rob Herring <[email protected]> wrote:
>
> On Mon, Feb 26, 2024 at 5:52 AM Luca Ceresoli <[email protected]> wrote:
> >
> > Hello Saravana,
> >
> > On Fri, 23 Feb 2024 17:35:24 -0800
> > Saravana Kannan <[email protected]> wrote:
> >
> > > On Fri, Feb 23, 2024 at 8:18 AM Luca Ceresoli <[email protected]> wrote:
> > > >
> > > > Hello Saravana,
> > > >
> > > > [+cc Hervé Codina]
> > > >
> > > > On Tue, 6 Feb 2024 17:18:01 -0800
> > > > Saravana Kannan <[email protected]> wrote:
> > > >
> > > > > After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
> > > > > remote-endpoint properties created a fwnode link from the consumer device
> > > > > to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
> > > > > trying to create device links or detecting cycles. So, improve this the
> > > > > same way we improved finding the consumer of a remote-endpoint property.
> > > > >
> > > > > Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
> > > > > Signed-off-by: Saravana Kannan <[email protected]>
> > > >
> > > > After rebasing my own branch on v6.8-rc5 from v6.8-rc1 I started
> > > > getting unexpected warnings during device tree overlay removal. After a
> > > > somewhat painful bisection I identified this patch as the one that
> > > > triggers it all.
> > >
> > > Thanks for the report.
> > >
> > > >
> > > > > ---
> > > > > --- a/drivers/of/property.c
> > > > > +++ b/drivers/of/property.c
> > > > > @@ -1232,7 +1232,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
> > > > > DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
> > > > > DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> > > > > DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > > > > -DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
> > > > > DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
> > > > > DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
> > > > > DEFINE_SIMPLE_PROP(leds, "leds", NULL)
> > > > > @@ -1298,6 +1297,17 @@ static struct device_node *parse_interrupts(struct device_node *np,
> > > > > return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
> > > > > }
> > > > >
> > > > > +static struct device_node *parse_remote_endpoint(struct device_node *np,
> > > > > + const char *prop_name,
> > > > > + int index)
> > > > > +{
> > > > > + /* Return NULL for index > 0 to signify end of remote-endpoints. */
> > > > > + if (!index || strcmp(prop_name, "remote-endpoint"))
> > > >
> > > > There seem to be a bug here: "!index" should be "index > 0", as the
> > > > comment suggests. Otherwise NULL is always returned.
> > >
> > > Ah crap, I think you are right. It should have been "index". Not
> > > "!index". But I tested this! Sigh. I probably screwed up my testing.
> > >
> > > Please send out a Fix for this.
> > >
> > > Geert, we got excited too soon. :(
> > >
> > > > I am going to send a quick patch for that, but haven't done so yet
> > > > because it still won't solve the problem, so I wanted to open the topic
> > > > here without further delay.
> > > >
> > > > Even with the 'index > 0' fix I'm still getting pretty much the same:
> > >
> > > This part is confusing though. If I read your DT correctly, there's a
> > > cycle between platform:panel-dsi-lvds and i2c:13-002c. And fw_devlink
> > > should not be enforcing any ordering between those devices ever.
> > >
> > > I'm surprised that in your "working" case, fw_devlink didn't detect
> > > any cycle. It should have. If there's any debugging to do, that's the
> > > one we need to debug.
> > >
> > > >
> > > > [ 34.836781] ------------[ cut here ]------------
> > > > [ 34.841401] WARNING: CPU: 2 PID: 204 at drivers/base/devres.c:1064 devm_kfree+0x8c/0xfc
> > > > ...
> > > > [ 35.024751] Call trace:
> > > > [ 35.027199] devm_kfree+0x8c/0xfc
> > > > [ 35.030520] devm_drm_panel_bridge_release+0x54/0x64 [drm_kms_helper]
> > > > [ 35.036990] devres_release_group+0xe0/0x164
> > > > [ 35.041264] i2c_device_remove+0x38/0x9c
> > > > [ 35.045196] device_remove+0x4c/0x80
> > > > [ 35.048774] device_release_driver_internal+0x1d4/0x230
> > > > [ 35.054003] device_release_driver+0x18/0x24
> > > > [ 35.058279] bus_remove_device+0xcc/0x10c
> > > > [ 35.062292] device_del+0x15c/0x41c
> > > > [ 35.065786] device_unregister+0x18/0x34
> > > > [ 35.069714] i2c_unregister_device+0x54/0x88
> > > > [ 35.073988] of_i2c_notify+0x98/0x224
> > > > [ 35.077656] blocking_notifier_call_chain+0x6c/0xa0
> > > > [ 35.082543] __of_changeset_entry_notify+0x100/0x16c
> > > > [ 35.087515] __of_changeset_revert_notify+0x44/0x78
> > > > [ 35.092398] of_overlay_remove+0x114/0x1c4
> > > > ...
> > > >
> > > > By comparing the two versions I found that before removing the overlay:
> > > >
> > > > * in the "working" case (with this patch reverted) I have:
> > > >
> > > > # ls /sys/class/devlink/ | grep 002c
> > > > platform:hpbr--i2c:13-002c
> > > > platform:panel-dsi-lvds--i2c:13-002c
> > >
> > > Can you check the "status" and "sync_state_only" file in this folder
> > > and tell me what it says?
> > >
> > > Since these devices have a cyclic dependency between them, it should
> > > have been something other than "not tracked" and "sync_state_only"
> > > should be "1". But my guess is you'll see "active" and "0".
> > >
> > > > platform:regulator-sys-1v8--i2c:13-002c
> > > > regulator:regulator.31--i2c:13-002c
> > > > #
> > > >
> > > > * in the "broken" case (v6.8-rc5 + s/!index/index > 0/ as mentioned):
> > > >
> > > > # ls /sys/class/devlink/ | grep 002c
> > > > platform:hpbr--i2c:13-002c
> > > > platform:regulator-sys-1v8--i2c:13-002c
> > > > regulator:regulator.30--i2c:13-002c
> > > > #
> > > >
> > > > So in the latter case the panel-dsi-lvds--i2c:13-002c link is missing.
> > > > I think it gets created but later on removed. Here's a snippet of the
> > > > kernel log when that happens:
> > > >
> > > > [ 9.578279] ----- cycle: start -----
> > > > [ 9.578283] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > > [ 9.578308] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > [ 9.578329] ----- cycle: end -----
> > > > [ 9.578334] platform panel-dsi-lvds: Fixed dependency cycle(s) with /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > ...
> > >
> > > Somewhere in this area, I'm thinking you'll also see "device:
> > > 'i2c:13-002c--platform:panel-dsi-lvds': device_add" do you not? And if
> > > you enabled device link logs, you'll see that it was "sync state only"
> > > link.
> > >
> > > > [ 9.590620] /panel-dsi-lvds Dropping the fwnode link to /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > ...
> > > > [ 9.597280] ----- cycle: start -----
> > > > [ 9.597283] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > [ 9.602781] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > > [ 9.607581] ----- cycle: end -----
> > > > [ 9.607585] i2c 13-002c: Fixed dependency cycle(s) with /panel-dsi-lvds
> > > > [ 9.614217] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_add
> > > > ...
> > > > [ 9.614277] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /panel-dsi-lvds
> > > > [ 9.614369] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /regulator-dock-sys-1v8
> > > > ...
> > > > [ 9.739840] panel-simple panel-dsi-lvds: Dropping the link to 13-002c
> > > > [ 9.739846] device: 'i2c:13-002c--platform:panel-dsi-lvds': device_unregister
> > >
> > > Oh yeah, see. The "device_add" I expected earlier is getting removed here.
> > >
> > > > [ 10.247037] sn65dsi83 13-002c: Dropping the link to panel-dsi-lvds
> > > > [ 10.247049] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_unregister
> > > >
> > > > And here's the relevant portion of my device tree overlay:
> > > >
> > > > --------------------8<--------------------
> > > >
> > >
> > > I think the eventual fix would be this series + adding a
> > > "post-init-providers" property to the device that's supposed to probe
> > > first and point it to the device that's supposed to probe next. Do
> > > this at the device node level, not the endpoint level.
> > > https://lore.kernel.org/lkml/[email protected]/
> >
> > I'm certainly going to look at this series in more detail and at the
> > debugging you asked for, however I'm afraid I won't have access to the
> > hardware this week and it's not going to be a quick task anyway.
> >
> > So in this moment I think it's quite clear that this specific patch
> > creates a regression and there is no clear fix that is reasonably
> > likely to get merged before 6.8.
> >
> > I propose reverting this patch immediately, unless you have a better
> > short-term solution.
>
> It's just this one of the 3 patches that needs reverting?

I sent a fix. With the fix, it's just exposing a bug elsewhere.

-Saravana

2024-02-29 00:27:07

by Rob Herring

[permalink] [raw]
Subject: Re: [REGRESSION] Re: [PATCH v2 2/3] of: property: Improve finding the supplier of a remote-endpoint property

On Wed, Feb 28, 2024 at 5:58 PM Saravana Kannan <[email protected]> wrote:
>
> On Wed, Feb 28, 2024 at 1:56 PM Rob Herring <[email protected]> wrote:
> >
> > On Mon, Feb 26, 2024 at 5:52 AM Luca Ceresoli <[email protected]> wrote:
> > >
> > > Hello Saravana,
> > >
> > > On Fri, 23 Feb 2024 17:35:24 -0800
> > > Saravana Kannan <[email protected]> wrote:
> > >
> > > > On Fri, Feb 23, 2024 at 8:18 AM Luca Ceresoli <[email protected]> wrote:
> > > > >
> > > > > Hello Saravana,
> > > > >
> > > > > [+cc Hervé Codina]
> > > > >
> > > > > On Tue, 6 Feb 2024 17:18:01 -0800
> > > > > Saravana Kannan <[email protected]> wrote:
> > > > >
> > > > > > After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"),
> > > > > > remote-endpoint properties created a fwnode link from the consumer device
> > > > > > to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
> > > > > > trying to create device links or detecting cycles. So, improve this the
> > > > > > same way we improved finding the consumer of a remote-endpoint property.
> > > > > >
> > > > > > Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
> > > > > > Signed-off-by: Saravana Kannan <[email protected]>
> > > > >
> > > > > After rebasing my own branch on v6.8-rc5 from v6.8-rc1 I started
> > > > > getting unexpected warnings during device tree overlay removal. After a
> > > > > somewhat painful bisection I identified this patch as the one that
> > > > > triggers it all.
> > > >
> > > > Thanks for the report.
> > > >
> > > > >
> > > > > > ---
> > > > > > --- a/drivers/of/property.c
> > > > > > +++ b/drivers/of/property.c
> > > > > > @@ -1232,7 +1232,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
> > > > > > DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
> > > > > > DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> > > > > > DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > > > > > -DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
> > > > > > DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
> > > > > > DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
> > > > > > DEFINE_SIMPLE_PROP(leds, "leds", NULL)
> > > > > > @@ -1298,6 +1297,17 @@ static struct device_node *parse_interrupts(struct device_node *np,
> > > > > > return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
> > > > > > }
> > > > > >
> > > > > > +static struct device_node *parse_remote_endpoint(struct device_node *np,
> > > > > > + const char *prop_name,
> > > > > > + int index)
> > > > > > +{
> > > > > > + /* Return NULL for index > 0 to signify end of remote-endpoints. */
> > > > > > + if (!index || strcmp(prop_name, "remote-endpoint"))
> > > > >
> > > > > There seem to be a bug here: "!index" should be "index > 0", as the
> > > > > comment suggests. Otherwise NULL is always returned.
> > > >
> > > > Ah crap, I think you are right. It should have been "index". Not
> > > > "!index". But I tested this! Sigh. I probably screwed up my testing.
> > > >
> > > > Please send out a Fix for this.
> > > >
> > > > Geert, we got excited too soon. :(
> > > >
> > > > > I am going to send a quick patch for that, but haven't done so yet
> > > > > because it still won't solve the problem, so I wanted to open the topic
> > > > > here without further delay.
> > > > >
> > > > > Even with the 'index > 0' fix I'm still getting pretty much the same:
> > > >
> > > > This part is confusing though. If I read your DT correctly, there's a
> > > > cycle between platform:panel-dsi-lvds and i2c:13-002c. And fw_devlink
> > > > should not be enforcing any ordering between those devices ever.
> > > >
> > > > I'm surprised that in your "working" case, fw_devlink didn't detect
> > > > any cycle. It should have. If there's any debugging to do, that's the
> > > > one we need to debug.
> > > >
> > > > >
> > > > > [ 34.836781] ------------[ cut here ]------------
> > > > > [ 34.841401] WARNING: CPU: 2 PID: 204 at drivers/base/devres.c:1064 devm_kfree+0x8c/0xfc
> > > > > ...
> > > > > [ 35.024751] Call trace:
> > > > > [ 35.027199] devm_kfree+0x8c/0xfc
> > > > > [ 35.030520] devm_drm_panel_bridge_release+0x54/0x64 [drm_kms_helper]
> > > > > [ 35.036990] devres_release_group+0xe0/0x164
> > > > > [ 35.041264] i2c_device_remove+0x38/0x9c
> > > > > [ 35.045196] device_remove+0x4c/0x80
> > > > > [ 35.048774] device_release_driver_internal+0x1d4/0x230
> > > > > [ 35.054003] device_release_driver+0x18/0x24
> > > > > [ 35.058279] bus_remove_device+0xcc/0x10c
> > > > > [ 35.062292] device_del+0x15c/0x41c
> > > > > [ 35.065786] device_unregister+0x18/0x34
> > > > > [ 35.069714] i2c_unregister_device+0x54/0x88
> > > > > [ 35.073988] of_i2c_notify+0x98/0x224
> > > > > [ 35.077656] blocking_notifier_call_chain+0x6c/0xa0
> > > > > [ 35.082543] __of_changeset_entry_notify+0x100/0x16c
> > > > > [ 35.087515] __of_changeset_revert_notify+0x44/0x78
> > > > > [ 35.092398] of_overlay_remove+0x114/0x1c4
> > > > > ...
> > > > >
> > > > > By comparing the two versions I found that before removing the overlay:
> > > > >
> > > > > * in the "working" case (with this patch reverted) I have:
> > > > >
> > > > > # ls /sys/class/devlink/ | grep 002c
> > > > > platform:hpbr--i2c:13-002c
> > > > > platform:panel-dsi-lvds--i2c:13-002c
> > > >
> > > > Can you check the "status" and "sync_state_only" file in this folder
> > > > and tell me what it says?
> > > >
> > > > Since these devices have a cyclic dependency between them, it should
> > > > have been something other than "not tracked" and "sync_state_only"
> > > > should be "1". But my guess is you'll see "active" and "0".
> > > >
> > > > > platform:regulator-sys-1v8--i2c:13-002c
> > > > > regulator:regulator.31--i2c:13-002c
> > > > > #
> > > > >
> > > > > * in the "broken" case (v6.8-rc5 + s/!index/index > 0/ as mentioned):
> > > > >
> > > > > # ls /sys/class/devlink/ | grep 002c
> > > > > platform:hpbr--i2c:13-002c
> > > > > platform:regulator-sys-1v8--i2c:13-002c
> > > > > regulator:regulator.30--i2c:13-002c
> > > > > #
> > > > >
> > > > > So in the latter case the panel-dsi-lvds--i2c:13-002c link is missing.
> > > > > I think it gets created but later on removed. Here's a snippet of the
> > > > > kernel log when that happens:
> > > > >
> > > > > [ 9.578279] ----- cycle: start -----
> > > > > [ 9.578283] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > > > [ 9.578308] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > [ 9.578329] ----- cycle: end -----
> > > > > [ 9.578334] platform panel-dsi-lvds: Fixed dependency cycle(s) with /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > ...
> > > >
> > > > Somewhere in this area, I'm thinking you'll also see "device:
> > > > 'i2c:13-002c--platform:panel-dsi-lvds': device_add" do you not? And if
> > > > you enabled device link logs, you'll see that it was "sync state only"
> > > > link.
> > > >
> > > > > [ 9.590620] /panel-dsi-lvds Dropping the fwnode link to /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > ...
> > > > > [ 9.597280] ----- cycle: start -----
> > > > > [ 9.597283] /panel-dsi-lvds: cycle: depends on /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c
> > > > > [ 9.602781] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c: cycle: depends on /panel-dsi-lvds
> > > > > [ 9.607581] ----- cycle: end -----
> > > > > [ 9.607585] i2c 13-002c: Fixed dependency cycle(s) with /panel-dsi-lvds
> > > > > [ 9.614217] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_add
> > > > > ...
> > > > > [ 9.614277] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /panel-dsi-lvds
> > > > > [ 9.614369] /soc@0/bus@30800000/i2c@30ad0000/i2cmux@70/i2c@3/dsi-lvds-bridge@2c Dropping the fwnode link to /regulator-dock-sys-1v8
> > > > > ...
> > > > > [ 9.739840] panel-simple panel-dsi-lvds: Dropping the link to 13-002c
> > > > > [ 9.739846] device: 'i2c:13-002c--platform:panel-dsi-lvds': device_unregister
> > > >
> > > > Oh yeah, see. The "device_add" I expected earlier is getting removed here.
> > > >
> > > > > [ 10.247037] sn65dsi83 13-002c: Dropping the link to panel-dsi-lvds
> > > > > [ 10.247049] device: 'platform:panel-dsi-lvds--i2c:13-002c': device_unregister
> > > > >
> > > > > And here's the relevant portion of my device tree overlay:
> > > > >
> > > > > --------------------8<--------------------
> > > > >
> > > >
> > > > I think the eventual fix would be this series + adding a
> > > > "post-init-providers" property to the device that's supposed to probe
> > > > first and point it to the device that's supposed to probe next. Do
> > > > this at the device node level, not the endpoint level.
> > > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > I'm certainly going to look at this series in more detail and at the
> > > debugging you asked for, however I'm afraid I won't have access to the
> > > hardware this week and it's not going to be a quick task anyway.
> > >
> > > So in this moment I think it's quite clear that this specific patch
> > > creates a regression and there is no clear fix that is reasonably
> > > likely to get merged before 6.8.
> > >
> > > I propose reverting this patch immediately, unless you have a better
> > > short-term solution.
> >
> > It's just this one of the 3 patches that needs reverting?
>
> I sent a fix. With the fix, it's just exposing a bug elsewhere.

That's not telling me what to do...

You say apply the fix. Luca says revert. I say I wish I made this 6.9
material. Which is it?

If the overlay applying depends on out of tree code (likely as there
are limited ways to apply an overlay in mainline), then I don't really
care if there is still a regression.

Rob