2022-11-26 17:08:59

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH v4 2/6] can: etas_es58x: add devlink support

> @@ -2196,11 +2198,12 @@ static struct es58x_device *es58x_init_es58x_dev(struct usb_interface *intf,
> ops = &es581_4_ops;
> }
>
> - es58x_dev = devm_kzalloc(dev, es58x_sizeof_es58x_device(param),
> - GFP_KERNEL);
> - if (!es58x_dev)
> + devlink = devlink_alloc(&es58x_dl_ops, es58x_sizeof_es58x_device(param),
> + dev);
> + if (!devlink)
> return ERR_PTR(-ENOMEM);
>
> + es58x_dev = devlink_priv(devlink);

That is 'interesting'. Makes me wonder about lifetimes of different
objects. Previously your es58x_dev structure would disappear when the
driver is released, or an explicit call to devm_kfree(). Now it
disappears when devlink_free() is called. Any danger of use after free
here?

USB devices always make me wonder about life times rules since they
are probably the mode dynamic sort of device the kernel has the
handle, them just abruptly disappearing.

> es58x_dev->param = param;
> es58x_dev->ops = ops;
> es58x_dev->dev = dev;
> @@ -2247,6 +2250,8 @@ static int es58x_probe(struct usb_interface *intf,
> if (ret)
> return ret;
>
> + devlink_register(priv_to_devlink(es58x_dev));
> +
> for (ch_idx = 0; ch_idx < es58x_dev->num_can_ch; ch_idx++) {
> ret = es58x_init_netdev(es58x_dev, ch_idx);
> if (ret) {
> @@ -2272,8 +2277,10 @@ static void es58x_disconnect(struct usb_interface *intf)
> dev_info(&intf->dev, "Disconnecting %s %s\n",
> es58x_dev->udev->manufacturer, es58x_dev->udev->product);
>
> + devlink_unregister(priv_to_devlink(es58x_dev));
> es58x_free_netdevs(es58x_dev);
> es58x_free_urbs(es58x_dev);
> + devlink_free(priv_to_devlink(es58x_dev));
> usb_set_intfdata(intf, NULL);

Should devlink_free() be after usb_set_inftdata()?

Andrew


2022-11-27 06:15:32

by Vincent MAILHOL

[permalink] [raw]
Subject: Re: [PATCH v4 2/6] can: etas_es58x: add devlink support

On Tue. 27 Nov. 2022 at 01:51, Andrew Lunn <[email protected]> wrote:
> > @@ -2196,11 +2198,12 @@ static struct es58x_device *es58x_init_es58x_dev(struct usb_interface *intf,
> > ops = &es581_4_ops;
> > }
> >
> > - es58x_dev = devm_kzalloc(dev, es58x_sizeof_es58x_device(param),
> > - GFP_KERNEL);
> > - if (!es58x_dev)
> > + devlink = devlink_alloc(&es58x_dl_ops, es58x_sizeof_es58x_device(param),
> > + dev);
> > + if (!devlink)
> > return ERR_PTR(-ENOMEM);
> >
> > + es58x_dev = devlink_priv(devlink);
>
> That is 'interesting'.

Another interesting thing I found is:
https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/net/ethernet/intel/ice/ice_devlink.c#L866

Because devlink does not have an equivalent to devm_kzalloc(), that
driver uses devm_add_action_or_reset() instead. But any other drivers
will call devlink_free() in their disconnect function. So here, I just
followed the trend.

> Makes me wonder about lifetimes of different
> objects. Previously your es58x_dev structure would disappear when the
> driver is released, or an explicit call to devm_kfree(). Now it
> disappears when devlink_free() is called.

Even before that, this driver used to release es58x_dev in its
disconnect() function. I changed it to use devm_kzalloc() last year
after discovering its existence.
https://git.kernel.org/torvalds/linux/c/6bde4c7fd845

>Any danger of use after free here?

devlink_alloc() allocates one continuous block for both the devlink
and the device priv (struct es58x_dev here):
https://elixir.bootlin.com/linux/v6.1-rc6/source/net/core/devlink.c#L9629

So calling devlink_free() also releases struct es58x_dev.

> USB devices always make me wonder about life times rules since they
> are probably the mode dynamic sort of device the kernel has the
> handle, them just abruptly disappearing.
>
> > es58x_dev->param = param;
> > es58x_dev->ops = ops;
> > es58x_dev->dev = dev;
> > @@ -2247,6 +2250,8 @@ static int es58x_probe(struct usb_interface *intf,
> > if (ret)
> > return ret;
> >
> > + devlink_register(priv_to_devlink(es58x_dev));
> > +
> > for (ch_idx = 0; ch_idx < es58x_dev->num_can_ch; ch_idx++) {
> > ret = es58x_init_netdev(es58x_dev, ch_idx);
> > if (ret) {
> > @@ -2272,8 +2277,10 @@ static void es58x_disconnect(struct usb_interface *intf)
> > dev_info(&intf->dev, "Disconnecting %s %s\n",
> > es58x_dev->udev->manufacturer, es58x_dev->udev->product);
> >
> > + devlink_unregister(priv_to_devlink(es58x_dev));
> > es58x_free_netdevs(es58x_dev);
> > es58x_free_urbs(es58x_dev);
> > + devlink_free(priv_to_devlink(es58x_dev));
> > usb_set_intfdata(intf, NULL);
>
> Should devlink_free() be after usb_set_inftdata()?

A look at
$ git grep -W "usb_set_intfdata(.*NULL)"

shows that the two patterns (freeing before or after
usb_set_intfdata()) coexist.

You are raising an important question here. usb_set_intfdata() does
not have documentation that freeing before it is risky. And the
documentation of usb_driver::disconnect says that:
"@disconnect: Called when the interface is no longer accessible,
usually because its device has been (or is being) disconnected
or the driver module is being unloaded."
Ref: https://elixir.bootlin.com/linux/v6.1-rc6/source/include/linux/usb.h#L1130

So the interface no longer being accessible makes me assume that the
order does not matter. If it indeed matters, then this is a foot gun
and there is some clean-up work waiting for us on many drivers.

@Greg, any thoughts on whether or not the order of usb_set_intfdata()
and resource freeing matters or not?


Yours sincerely,
Vincent Mailhol

2022-11-27 16:17:46

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH v4 2/6] can: etas_es58x: add devlink support

On Sun, Nov 27, 2022 at 02:10:32PM +0900, Vincent MAILHOL wrote:
> > Should devlink_free() be after usb_set_inftdata()?
>
> A look at
> $ git grep -W "usb_set_intfdata(.*NULL)"
>
> shows that the two patterns (freeing before or after
> usb_set_intfdata()) coexist.
>
> You are raising an important question here. usb_set_intfdata() does
> not have documentation that freeing before it is risky. And the
> documentation of usb_driver::disconnect says that:
> "@disconnect: Called when the interface is no longer accessible,
> usually because its device has been (or is being) disconnected
> or the driver module is being unloaded."
> Ref: https://elixir.bootlin.com/linux/v6.1-rc6/source/include/linux/usb.h#L1130
>
> So the interface no longer being accessible makes me assume that the
> order does not matter. If it indeed matters, then this is a foot gun
> and there is some clean-up work waiting for us on many drivers.
>
> @Greg, any thoughts on whether or not the order of usb_set_intfdata()
> and resource freeing matters or not?

In fact, drivers don't have to call usb_set_intfdata(NULL) at all; the
USB core does it for them after the ->disconnect() callback returns.

But if a driver does make the call, it should be careful to ensure that
the call happens _after_ the driver is finished using the interface-data
pointer. For example, after all outstanding URBs have completed, if the
completion handlers will need to call usb_get_intfdata().

Remember, the interface-data pointer is owned by the driver. Nothing
else in the kernel uses it. So the driver merely has to be careful not
to clear the pointer while it is still using it.

Alan Stern

2022-11-28 02:07:54

by Vincent MAILHOL

[permalink] [raw]
Subject: Re: [PATCH v4 2/6] can: etas_es58x: add devlink support

On Mon. 28 Nov. 2022 at 00:41, Alan Stern <[email protected]> wrote:
> On Sun, Nov 27, 2022 at 02:10:32PM +0900, Vincent MAILHOL wrote:
> > > Should devlink_free() be after usb_set_inftdata()?
> >
> > A look at
> > $ git grep -W "usb_set_intfdata(.*NULL)"
> >
> > shows that the two patterns (freeing before or after
> > usb_set_intfdata()) coexist.
> >
> > You are raising an important question here. usb_set_intfdata() does
> > not have documentation that freeing before it is risky. And the
> > documentation of usb_driver::disconnect says that:
> > "@disconnect: Called when the interface is no longer accessible,
> > usually because its device has been (or is being) disconnected
> > or the driver module is being unloaded."
> > Ref: https://elixir.bootlin.com/linux/v6.1-rc6/source/include/linux/usb.h#L1130
> >
> > So the interface no longer being accessible makes me assume that the
> > order does not matter. If it indeed matters, then this is a foot gun
> > and there is some clean-up work waiting for us on many drivers.
> >
> > @Greg, any thoughts on whether or not the order of usb_set_intfdata()
> > and resource freeing matters or not?
>
> In fact, drivers don't have to call usb_set_intfdata(NULL) at all; the
> USB core does it for them after the ->disconnect() callback returns.

Interesting. This fact is widely unknown, cf:
$ git grep "usb_set_intfdata(.*NULL)" | wc -l
215

I will do some clean-up later on, at least for the CAN USB drivers.

> But if a driver does make the call, it should be careful to ensure that
> the call happens _after_ the driver is finished using the interface-data
> pointer. For example, after all outstanding URBs have completed, if the
> completion handlers will need to call usb_get_intfdata().

ACK. I understand that it should be called *after* the completion of
any ongoing task.

My question was more on:

devlink_free(priv_to_devlink(es58x_dev));
usb_set_intfdata(intf, NULL);

VS.

usb_set_intfdata(intf, NULL);
devlink_free(priv_to_devlink(es58x_dev));

From your comments, I understand that both are fine.

> Remember, the interface-data pointer is owned by the driver. Nothing
> else in the kernel uses it. So the driver merely has to be careful not
> to clear the pointer while it is still using it.

Thanks for your comments!


Yours sincerely,
Vincent Mailhol

2022-11-28 06:46:16

by Vincent MAILHOL

[permalink] [raw]
Subject: Re: [PATCH v4 2/6] can: etas_es58x: add devlink support

On Mon. 28 Nov. 2022 at 10:34, Vincent MAILHOL
<[email protected]> wrote:
> On Mon. 28 Nov. 2022 at 00:41, Alan Stern <[email protected]> wrote:
> > On Sun, Nov 27, 2022 at 02:10:32PM +0900, Vincent MAILHOL wrote:
> > > > Should devlink_free() be after usb_set_inftdata()?
> > >
> > > A look at
> > > $ git grep -W "usb_set_intfdata(.*NULL)"
> > >
> > > shows that the two patterns (freeing before or after
> > > usb_set_intfdata()) coexist.
> > >
> > > You are raising an important question here. usb_set_intfdata() does
> > > not have documentation that freeing before it is risky. And the
> > > documentation of usb_driver::disconnect says that:
> > > "@disconnect: Called when the interface is no longer accessible,
> > > usually because its device has been (or is being) disconnected
> > > or the driver module is being unloaded."
> > > Ref: https://elixir.bootlin.com/linux/v6.1-rc6/source/include/linux/usb.h#L1130
> > >
> > > So the interface no longer being accessible makes me assume that the
> > > order does not matter. If it indeed matters, then this is a foot gun
> > > and there is some clean-up work waiting for us on many drivers.
> > >
> > > @Greg, any thoughts on whether or not the order of usb_set_intfdata()
> > > and resource freeing matters or not?
> >
> > In fact, drivers don't have to call usb_set_intfdata(NULL) at all; the
> > USB core does it for them after the ->disconnect() callback returns.
>
> Interesting. This fact is widely unknown, cf:
> $ git grep "usb_set_intfdata(.*NULL)" | wc -l
> 215
>
> I will do some clean-up later on, at least for the CAN USB drivers.
>
> > But if a driver does make the call, it should be careful to ensure that
> > the call happens _after_ the driver is finished using the interface-data
> > pointer. For example, after all outstanding URBs have completed, if the
> > completion handlers will need to call usb_get_intfdata().
>
> ACK. I understand that it should be called *after* the completion of
> any ongoing task.
>
> My question was more on:
>
> devlink_free(priv_to_devlink(es58x_dev));
> usb_set_intfdata(intf, NULL);
>
> VS.
>
> usb_set_intfdata(intf, NULL);
> devlink_free(priv_to_devlink(es58x_dev));
>
> From your comments, I understand that both are fine.

Do we agree that the usb-skeleton is doing it wrong?
https://elixir.bootlin.com/linux/latest/source/drivers/usb/usb-skeleton.c#L567
usb_set_intfdata(interface, NULL) is called before deregistering the
interface and terminating the outstanding URBs!

> > Remember, the interface-data pointer is owned by the driver. Nothing
> > else in the kernel uses it. So the driver merely has to be careful not
> > to clear the pointer while it is still using it.
>
> Thanks for your comments!
>
>
> Yours sincerely,
> Vincent Mailhol

2022-11-28 14:45:03

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH v4 2/6] can: etas_es58x: add devlink support

> > But if a driver does make the call, it should be careful to ensure that
> > the call happens _after_ the driver is finished using the interface-data
> > pointer. For example, after all outstanding URBs have completed, if the
> > completion handlers will need to call usb_get_intfdata().
>
> ACK. I understand that it should be called *after* the completion of
> any ongoing task.

What sometimes gets people is /sys, /proc. etc. A process can have
such a file open when the device is unplugged. If the read needs to
make use of your private data structure, you need to guarantee it
still exists. Ideally the core needs to wait and not call the
disconnect until all such files are closed. Probably the USB core
does, it is such an obvious issue, but i have no knowledge of USB.

Andrew

2022-11-28 16:16:49

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH v4 2/6] can: etas_es58x: add devlink support

On Mon, Nov 28, 2022 at 02:32:23PM +0900, Vincent MAILHOL wrote:
> On Mon. 28 Nov. 2022 at 10:34, Vincent MAILHOL
> <[email protected]> wrote:
> > On Mon. 28 Nov. 2022 at 00:41, Alan Stern <[email protected]> wrote:
> > > On Sun, Nov 27, 2022 at 02:10:32PM +0900, Vincent MAILHOL wrote:
> > > > > Should devlink_free() be after usb_set_inftdata()?
> > > >
> > > > A look at
> > > > $ git grep -W "usb_set_intfdata(.*NULL)"
> > > >
> > > > shows that the two patterns (freeing before or after
> > > > usb_set_intfdata()) coexist.
> > > >
> > > > You are raising an important question here. usb_set_intfdata() does
> > > > not have documentation that freeing before it is risky. And the
> > > > documentation of usb_driver::disconnect says that:
> > > > "@disconnect: Called when the interface is no longer accessible,
> > > > usually because its device has been (or is being) disconnected
> > > > or the driver module is being unloaded."
> > > > Ref: https://elixir.bootlin.com/linux/v6.1-rc6/source/include/linux/usb.h#L1130
> > > >
> > > > So the interface no longer being accessible makes me assume that the
> > > > order does not matter. If it indeed matters, then this is a foot gun
> > > > and there is some clean-up work waiting for us on many drivers.
> > > >
> > > > @Greg, any thoughts on whether or not the order of usb_set_intfdata()
> > > > and resource freeing matters or not?
> > >
> > > In fact, drivers don't have to call usb_set_intfdata(NULL) at all; the
> > > USB core does it for them after the ->disconnect() callback returns.
> >
> > Interesting. This fact is widely unknown, cf:
> > $ git grep "usb_set_intfdata(.*NULL)" | wc -l
> > 215
> >
> > I will do some clean-up later on, at least for the CAN USB drivers.
> >
> > > But if a driver does make the call, it should be careful to ensure that
> > > the call happens _after_ the driver is finished using the interface-data
> > > pointer. For example, after all outstanding URBs have completed, if the
> > > completion handlers will need to call usb_get_intfdata().
> >
> > ACK. I understand that it should be called *after* the completion of
> > any ongoing task.
> >
> > My question was more on:
> >
> > devlink_free(priv_to_devlink(es58x_dev));
> > usb_set_intfdata(intf, NULL);
> >
> > VS.
> >
> > usb_set_intfdata(intf, NULL);
> > devlink_free(priv_to_devlink(es58x_dev));
> >
> > From your comments, I understand that both are fine.
>
> Do we agree that the usb-skeleton is doing it wrong?
> https://elixir.bootlin.com/linux/latest/source/drivers/usb/usb-skeleton.c#L567
> usb_set_intfdata(interface, NULL) is called before deregistering the
> interface and terminating the outstanding URBs!

Going through the usb-skeleton.c source code, you will find that
usb_get_intfdata() is called from only a few routines:

skel_open()
skel_disconnect()
skel_suspend()
skel_pre_reset()
skel_post_reset()

Of those, all but the first are called only by the USB core and they are
mutually exclusive with disconnect processing (except for
skel_disconnect() itself, of course). So they don't matter.

The first, skel_open(), can be called as a result of actions by the
user, so the driver needs to ensure that this can't happen after it
clears the interface-data pointer. The user can open the device file at
any time before the minor number is given back, so it is not proper to
call usb_set_intfdata(interface, NULL) before usb_deregister_dev() --
but the driver does exactly this!

(Well, it's not quite that bad. skel_open() does check whether the
interface-data pointer value it gets from usb_get_intfdata() is NULL.
But it's still a race.)

So yes, the current code is wrong. And in fact, it will still be wrong
even after the usb_set_intfdata(interface, NULL) line is removed,
because there is no synchronization between skel_open() and
skel_disconnect(). It is possible for skel_disconnect() to run to
completion and the USB core to clear the interface-data pointer all
while skel_open() is running. The driver needs a static private mutex
to synchronize opens with unregistrations. (This is a general
phenomenon, true of all drivers that have a user interface such as a
device file.)

The driver _does_ have a per-instance mutex, dev->io_mutex, to
synchronize I/O with disconnects. But that's separate from
synchronizing opens with unregistrations, because at open time the
driver doesn't yet know the address of the private data structure or
even if the structure is still allocated. So obviously it can't use a
mutex that is embedded within the private data structure for this
purpose.

Alan Stern

2022-11-29 00:34:44

by Vincent MAILHOL

[permalink] [raw]
Subject: Re: [PATCH v4 2/6] can: etas_es58x: add devlink support

On Tue. 29 Nov. 2022 at 00:50, Alan Stern <[email protected]> wrote:
> On Mon, Nov 28, 2022 at 02:32:23PM +0900, Vincent MAILHOL wrote:
> > On Mon. 28 Nov. 2022 at 10:34, Vincent MAILHOL
> > <[email protected]> wrote:
> > > On Mon. 28 Nov. 2022 at 00:41, Alan Stern <[email protected]> wrote:
> > > > On Sun, Nov 27, 2022 at 02:10:32PM +0900, Vincent MAILHOL wrote:
> > > > > > Should devlink_free() be after usb_set_inftdata()?
> > > > >
> > > > > A look at
> > > > > $ git grep -W "usb_set_intfdata(.*NULL)"
> > > > >
> > > > > shows that the two patterns (freeing before or after
> > > > > usb_set_intfdata()) coexist.
> > > > >
> > > > > You are raising an important question here. usb_set_intfdata() does
> > > > > not have documentation that freeing before it is risky. And the
> > > > > documentation of usb_driver::disconnect says that:
> > > > > "@disconnect: Called when the interface is no longer accessible,
> > > > > usually because its device has been (or is being) disconnected
> > > > > or the driver module is being unloaded."
> > > > > Ref: https://elixir.bootlin.com/linux/v6.1-rc6/source/include/linux/usb.h#L1130
> > > > >
> > > > > So the interface no longer being accessible makes me assume that the
> > > > > order does not matter. If it indeed matters, then this is a foot gun
> > > > > and there is some clean-up work waiting for us on many drivers.
> > > > >
> > > > > @Greg, any thoughts on whether or not the order of usb_set_intfdata()
> > > > > and resource freeing matters or not?
> > > >
> > > > In fact, drivers don't have to call usb_set_intfdata(NULL) at all; the
> > > > USB core does it for them after the ->disconnect() callback returns.
> > >
> > > Interesting. This fact is widely unknown, cf:
> > > $ git grep "usb_set_intfdata(.*NULL)" | wc -l
> > > 215
> > >
> > > I will do some clean-up later on, at least for the CAN USB drivers.
> > >
> > > > But if a driver does make the call, it should be careful to ensure that
> > > > the call happens _after_ the driver is finished using the interface-data
> > > > pointer. For example, after all outstanding URBs have completed, if the
> > > > completion handlers will need to call usb_get_intfdata().
> > >
> > > ACK. I understand that it should be called *after* the completion of
> > > any ongoing task.
> > >
> > > My question was more on:
> > >
> > > devlink_free(priv_to_devlink(es58x_dev));
> > > usb_set_intfdata(intf, NULL);
> > >
> > > VS.
> > >
> > > usb_set_intfdata(intf, NULL);
> > > devlink_free(priv_to_devlink(es58x_dev));
> > >
> > > From your comments, I understand that both are fine.
> >
> > Do we agree that the usb-skeleton is doing it wrong?
> > https://elixir.bootlin.com/linux/latest/source/drivers/usb/usb-skeleton.c#L567
> > usb_set_intfdata(interface, NULL) is called before deregistering the
> > interface and terminating the outstanding URBs!
>
> Going through the usb-skeleton.c source code, you will find that
> usb_get_intfdata() is called from only a few routines:
>
> skel_open()
> skel_disconnect()
> skel_suspend()
> skel_pre_reset()
> skel_post_reset()
>
> Of those, all but the first are called only by the USB core and they are
> mutually exclusive with disconnect processing (except for
> skel_disconnect() itself, of course). So they don't matter.
>
> The first, skel_open(), can be called as a result of actions by the
> user, so the driver needs to ensure that this can't happen after it
> clears the interface-data pointer. The user can open the device file at
> any time before the minor number is given back, so it is not proper to
> call usb_set_intfdata(interface, NULL) before usb_deregister_dev() --
> but the driver does exactly this!
>
> (Well, it's not quite that bad. skel_open() does check whether the
> interface-data pointer value it gets from usb_get_intfdata() is NULL.
> But it's still a race.)
>
> So yes, the current code is wrong. And in fact, it will still be wrong
> even after the usb_set_intfdata(interface, NULL) line is removed,
> because there is no synchronization between skel_open() and
> skel_disconnect().

ACK. I did not look outside of skel_disconnect(). Regardless, I think
that removing the usb_set_intdata(interface, NULL) is still one step
in the good direction despite the other synchronisation issues. I sent
a patch for that which Greg already pick-up:
https://git.kernel.org/gregkh/usb/c/c568f8bb41a4

>It is possible for skel_disconnect() to run to
> completion and the USB core to clear the interface-data pointer all
> while skel_open() is running. The driver needs a static private mutex
> to synchronize opens with unregistrations. (This is a general
> phenomenon, true of all drivers that have a user interface such as a
> device file.)
>
> The driver _does_ have a per-instance mutex, dev->io_mutex, to
> synchronize I/O with disconnects. But that's separate from
> synchronizing opens with unregistrations, because at open time the
> driver doesn't yet know the address of the private data structure or
> even if the structure is still allocated. So obviously it can't use a
> mutex that is embedded within the private data structure for this
> purpose.

ACK. However, I have other priorities, I will not invest more time to
dig in the usb-skeleton.c

Thank you for the answer! That was a long but interesting diversion
from the initial topic :)


Yours sincerely,
Vincent Mailhol