2023-11-21 04:54:05

by Saravana Kannan

[permalink] [raw]
Subject: Re: Device links between providers and consumers

Hi Bartosz,

Adding LKML so that others are aware of what the issue is and it'll be
easier when I get to my TODO patches and send them out. I'm hoping
that's okay with you because we didn't discuss anything confidential
here.

On Mon, Nov 20, 2023 at 2:21 PM Saravana Kannan <[email protected]> wrote:
>
> On Mon, Nov 20, 2023 at 12:38 PM Bartosz Golaszewski <[email protected]> wrote:
> >
> > Hi Saravana,
> >
> > As I suspected, I couldn't observe the behavior you described during
> > our discussion at the LPC event. I have a DT GPIO provider and a
> > consumer referencing it by phandle. I'm unbinding the provider and the
> > consumer keeps on living, if it tries to use the GPIO, then it will
> > enter the regular code path in GPIO for checking if the provider is
> > there or not.
> >
> > Could you point me in the right direction here?
>
> Thanks for trying it out! Based on the code it should unbind the
> consumers. I haven't ever tried it myself (no need for it).

I took a closer look to show you where the consumer unbind is supposed
to be done, but in doing so I think I know what issue you are hitting.
One of my TODO items for device links should fix your problem.

The force unbinding of consumers when the supplier is unbound is
supposed to happen here:
device_driver_detach()
-> device_release_driver_internal()
-> __device_release_driver()
-> device_links_unbind_consumers()
-> for all "active" consumer -> device_release_driver_internal()

However the problem is the "if (drv)" check in __device_release_driver().

This problem also exists for "class" device suppliers that don't have
a drv. Fixing managed device links for "class" suppliers (and now, bus
suppliers without drv) has been in my TODO list for a while.

The gpio device is one of the cases of a "bus" device probing without
a driver. A while ago, I implemented a gpio_bus_match() that'll probe
the gpio device (so consumer probing isn't blocked) and I was trying
to keep the boilerplate code minimalistic. So, for your test case, a
quick test hack would be to implement an actual stub driver instead of
using a stub bus match. That should fix your problem with the
consumers not unbinding. I'll put up a proper fix sometime soon
(hopefully over the holiday lulls).

Btw, when we were talking in person at the LPC dinner, you were asking
"what would you do if the supplier was an optional supplier but you
forcefully unbound the consumer?" I have a nice answer now:

After a force unbind, we need to add all these consumers to the
deferred probe list and trigger another deferred probe attempt. If the
supplier was optional, the consumer would probe again. This also has
the nice property that the consumer doesn't advertise something to
userspace that it can't deliver (because the supplier has gone
missing) and it makes the error handling easier for drivers. They
don't have to worry about suppliers vanishing in every part of their
code. Once they get the supplier and probe successfully, they
shouldn't have to worry about it vanishing underneath them.

Cheers,
Saravana


>
> Let's start with making sure the basic functionality is working in your case.
>
> Can you check /sys/class/devlink to see if you see a folder with the
> following name?
> <bus:supplier>--<bus:consumer>
>
> Once you find it, can you cat all the file contents and tell me what
> it says before you unbind it?
>
> The "status" should be "available". And "sync_state_only" should be false.
>
> Also, how are you unbinding the supplier? And does the board you are
> playing with something that's upstream? Should we take this discussion
> to LKML?
>
> -Saravana


2023-11-21 10:07:25

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: Device links between providers and consumers

On Tue, Nov 21, 2023 at 5:53 AM Saravana Kannan <[email protected]> wrote:
>
> Hi Bartosz,
>
> Adding LKML so that others are aware of what the issue is and it'll be
> easier when I get to my TODO patches and send them out. I'm hoping
> that's okay with you because we didn't discuss anything confidential
> here.
>

That's alright, thanks for starting the public conversation.

> On Mon, Nov 20, 2023 at 2:21 PM Saravana Kannan <[email protected]> wrote:
> >
> > On Mon, Nov 20, 2023 at 12:38 PM Bartosz Golaszewski <[email protected]> wrote:
> > >
> > > Hi Saravana,
> > >
> > > As I suspected, I couldn't observe the behavior you described during
> > > our discussion at the LPC event. I have a DT GPIO provider and a
> > > consumer referencing it by phandle. I'm unbinding the provider and the
> > > consumer keeps on living, if it tries to use the GPIO, then it will
> > > enter the regular code path in GPIO for checking if the provider is
> > > there or not.
> > >
> > > Could you point me in the right direction here?
> >
> > Thanks for trying it out! Based on the code it should unbind the
> > consumers. I haven't ever tried it myself (no need for it).
>
> I took a closer look to show you where the consumer unbind is supposed
> to be done, but in doing so I think I know what issue you are hitting.
> One of my TODO items for device links should fix your problem.
>
> The force unbinding of consumers when the supplier is unbound is
> supposed to happen here:
> device_driver_detach()
> -> device_release_driver_internal()
> -> __device_release_driver()
> -> device_links_unbind_consumers()
> -> for all "active" consumer -> device_release_driver_internal()
>
> However the problem is the "if (drv)" check in __device_release_driver().
>
> This problem also exists for "class" device suppliers that don't have
> a drv. Fixing managed device links for "class" suppliers (and now, bus
> suppliers without drv) has been in my TODO list for a while.
>
> The gpio device is one of the cases of a "bus" device probing without
> a driver. A while ago, I implemented a gpio_bus_match() that'll probe
> the gpio device (so consumer probing isn't blocked) and I was trying
> to keep the boilerplate code minimalistic. So, for your test case, a
> quick test hack would be to implement an actual stub driver instead of
> using a stub bus match. That should fix your problem with the
> consumers not unbinding. I'll put up a proper fix sometime soon
> (hopefully over the holiday lulls).
>

But I don't even see any code referring to device_link in
drivers/gpio/. I see that if you get a regulator, there is a link
created between the regulator device and the consumer device in
_regulator_get() but nothing like that in GPIO.

> Btw, when we were talking in person at the LPC dinner, you were asking
> "what would you do if the supplier was an optional supplier but you
> forcefully unbound the consumer?" I have a nice answer now:
>

Actually, my question was: "what if a resource is optional and the
provider of that resource gets unbound". But below you still did
answer this question. :)

> After a force unbind, we need to add all these consumers to the
> deferred probe list and trigger another deferred probe attempt. If the
> supplier was optional, the consumer would probe again. This also has
> the nice property that the consumer doesn't advertise something to
> userspace that it can't deliver (because the supplier has gone
> missing) and it makes the error handling easier for drivers. They
> don't have to worry about suppliers vanishing in every part of their
> code. Once they get the supplier and probe successfully, they
> shouldn't have to worry about it vanishing underneath them.
>

Let me rephrase it to see if I understand it:

1. Provider is probed.
2. Consumer is probed and gets the resource from provider.
3. Provider is unbound.
4. As a result, the consumer is unbound.
5. Consumer is put into the deferred probe list.
6. Consumer binds again to its driver but this time doesn't get the resource.

It makes some sense I guess but then you have to deal with the device
disappearing for a brief moment in whatever code uses it so it's not
like it has no price over handling the provider unbind in consumers.
If you're exposing anything to user-space, you're offloading that
handling to it.

There are also two approaches to handling the providers unbinding:
returning an error from API calls vs returning 0 and doing nothing in
which case most of the consumer code can remain the same. This is what
GPIO does ATM.

> Cheers,
> Saravana
>

Is there a way for a driver to alter the behavior? For instance tell
the device core that it should not unbind it if the provider is
detached?

The behavior in general makes sense but it only applies to platform
devices on DT systems and has some corner-cases that would need to be
ironed out. What I proposed is more generic as it also covers
resources exposed to drivers or user-space from discoverable devices.

Bart

>
> >
> > Let's start with making sure the basic functionality is working in your case.
> >
> > Can you check /sys/class/devlink to see if you see a folder with the
> > following name?
> > <bus:supplier>--<bus:consumer>
> >
> > Once you find it, can you cat all the file contents and tell me what
> > it says before you unbind it?
> >
> > The "status" should be "available". And "sync_state_only" should be false.
> >
> > Also, how are you unbinding the supplier? And does the board you are
> > playing with something that's upstream? Should we take this discussion
> > to LKML?
> >
> > -Saravana

2023-11-21 13:36:34

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: Device links between providers and consumers

On Tue, Nov 21, 2023 at 11:05 AM Bartosz Golaszewski <[email protected]> wrote:
>
> On Tue, Nov 21, 2023 at 5:53 AM Saravana Kannan <[email protected]> wrote:
> >
> > Hi Bartosz,
> >
> > Adding LKML so that others are aware of what the issue is and it'll be
> > easier when I get to my TODO patches and send them out. I'm hoping
> > that's okay with you because we didn't discuss anything confidential
> > here.
> >
>
> That's alright, thanks for starting the public conversation.
>
> > On Mon, Nov 20, 2023 at 2:21 PM Saravana Kannan <[email protected]> wrote:
> > >
> > > On Mon, Nov 20, 2023 at 12:38 PM Bartosz Golaszewski <[email protected]> wrote:
> > > >
> > > > Hi Saravana,
> > > >
> > > > As I suspected, I couldn't observe the behavior you described during
> > > > our discussion at the LPC event. I have a DT GPIO provider and a
> > > > consumer referencing it by phandle. I'm unbinding the provider and the
> > > > consumer keeps on living, if it tries to use the GPIO, then it will
> > > > enter the regular code path in GPIO for checking if the provider is
> > > > there or not.
> > > >
> > > > Could you point me in the right direction here?
> > >
> > > Thanks for trying it out! Based on the code it should unbind the
> > > consumers. I haven't ever tried it myself (no need for it).
> >
> > I took a closer look to show you where the consumer unbind is supposed
> > to be done, but in doing so I think I know what issue you are hitting.
> > One of my TODO items for device links should fix your problem.
> >
> > The force unbinding of consumers when the supplier is unbound is
> > supposed to happen here:
> > device_driver_detach()
> > -> device_release_driver_internal()
> > -> __device_release_driver()
> > -> device_links_unbind_consumers()
> > -> for all "active" consumer -> device_release_driver_internal()
> >
> > However the problem is the "if (drv)" check in __device_release_driver().
> >
> > This problem also exists for "class" device suppliers that don't have
> > a drv. Fixing managed device links for "class" suppliers (and now, bus
> > suppliers without drv) has been in my TODO list for a while.
> >
> > The gpio device is one of the cases of a "bus" device probing without
> > a driver. A while ago, I implemented a gpio_bus_match() that'll probe
> > the gpio device (so consumer probing isn't blocked) and I was trying
> > to keep the boilerplate code minimalistic. So, for your test case, a
> > quick test hack would be to implement an actual stub driver instead of
> > using a stub bus match. That should fix your problem with the
> > consumers not unbinding. I'll put up a proper fix sometime soon
> > (hopefully over the holiday lulls).
> >
>
> But I don't even see any code referring to device_link in
> drivers/gpio/. I see that if you get a regulator, there is a link
> created between the regulator device and the consumer device in
> _regulator_get() but nothing like that in GPIO.
>
> > Btw, when we were talking in person at the LPC dinner, you were asking
> > "what would you do if the supplier was an optional supplier but you
> > forcefully unbound the consumer?" I have a nice answer now:
> >
>
> Actually, my question was: "what if a resource is optional and the
> provider of that resource gets unbound". But below you still did
> answer this question. :)
>
> > After a force unbind, we need to add all these consumers to the
> > deferred probe list and trigger another deferred probe attempt. If the
> > supplier was optional, the consumer would probe again. This also has
> > the nice property that the consumer doesn't advertise something to
> > userspace that it can't deliver (because the supplier has gone
> > missing) and it makes the error handling easier for drivers. They
> > don't have to worry about suppliers vanishing in every part of their
> > code. Once they get the supplier and probe successfully, they
> > shouldn't have to worry about it vanishing underneath them.
> >
>
> Let me rephrase it to see if I understand it:
>
> 1. Provider is probed.
> 2. Consumer is probed and gets the resource from provider.
> 3. Provider is unbound.
> 4. As a result, the consumer is unbound.
> 5. Consumer is put into the deferred probe list.
> 6. Consumer binds again to its driver but this time doesn't get the resource.
>
> It makes some sense I guess but then you have to deal with the device
> disappearing for a brief moment in whatever code uses it so it's not
> like it has no price over handling the provider unbind in consumers.
> If you're exposing anything to user-space, you're offloading that
> handling to it.
>
> There are also two approaches to handling the providers unbinding:
> returning an error from API calls vs returning 0 and doing nothing in
> which case most of the consumer code can remain the same. This is what
> GPIO does ATM.
>
> > Cheers,
> > Saravana
> >
>
> Is there a way for a driver to alter the behavior? For instance tell
> the device core that it should not unbind it if the provider is
> detached?
>
> The behavior in general makes sense but it only applies to platform
> devices on DT systems and has some corner-cases that would need to be
> ironed out. What I proposed is more generic as it also covers
> resources exposed to drivers or user-space from discoverable devices.
>

Actually the two are largely orthogonal so can be developed independently.

One more thing: the device link mechanism will never work for
interrupts as the interrupt subsystem doesn't use struct device. Same
for clk but that's less of a problem as they are rarely detachable.

Bart

> Bart
>
> >
> > >
> > > Let's start with making sure the basic functionality is working in your case.
> > >
> > > Can you check /sys/class/devlink to see if you see a folder with the
> > > following name?
> > > <bus:supplier>--<bus:consumer>
> > >
> > > Once you find it, can you cat all the file contents and tell me what
> > > it says before you unbind it?
> > >
> > > The "status" should be "available". And "sync_state_only" should be false.
> > >
> > > Also, how are you unbinding the supplier? And does the board you are
> > > playing with something that's upstream? Should we take this discussion
> > > to LKML?
> > >
> > > -Saravana