2009-12-04 16:06:59

by Alan Stern

[permalink] [raw]
Subject: [PATCH] Driver core: fix race in dev_driver_string

This patch (as1310) works around a race in dev_driver_string(). If
the device is unbound while the function is running, dev->driver might
become NULL after we test it and before we dereference it.

Signed-off-by: Alan Stern <[email protected]>
CC: [email protected]

---

Oliver:

We don't have to worry about the device structure being deallocated
while the routine is running. If that happens it's a bug in the
caller: improper refcounting.

Alan Stern


Index: usb-2.6/drivers/base/core.c
===================================================================
--- usb-2.6.orig/drivers/base/core.c
+++ usb-2.6/drivers/base/core.c
@@ -56,7 +56,14 @@ static inline int device_is_not_partitio
*/
const char *dev_driver_string(const struct device *dev)
{
- return dev->driver ? dev->driver->name :
+ struct device_driver *drv;
+
+ /* dev->driver can change to NULL underneath us because of unbinding,
+ * so be careful about accessing it. dev->bus and dev->class should
+ * never change once they are set, so they don't need special care.
+ */
+ drv = ACCESS_ONCE(dev->driver);
+ return drv ? drv->name :
(dev->bus ? dev->bus->name :
(dev->class ? dev->class->name : ""));
}


2009-12-04 16:16:09

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

Am Freitag, 4. Dezember 2009 17:06:57 schrieb Alan Stern:
> Oliver:
>
> We don't have to worry about the device structure being deallocated
> while the routine is running. If that happens it's a bug in the
> caller: improper refcounting.

That raises two points

1. am I supposed to get a reference just so that I can use dev_err?
2. what happens if this is a soft disconnect and the device is reconnected?
It seems to me that you'd print the wrong driver's name.

Regards
Oliver

2009-12-04 16:50:52

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

On Fri, Dec 04, 2009 at 05:16:19PM +0100, Oliver Neukum wrote:
> Am Freitag, 4. Dezember 2009 17:06:57 schrieb Alan Stern:
> > Oliver:
> >
> > We don't have to worry about the device structure being deallocated
> > while the routine is running. If that happens it's a bug in the
> > caller: improper refcounting.
>
> That raises two points
>
> 1. am I supposed to get a reference just so that I can use dev_err?

No, you should already have a reference on the device when doing the
call, right?

> 2. what happens if this is a soft disconnect and the device is reconnected?
> It seems to me that you'd print the wrong driver's name.

Then we can live with that, it's not that big of a deal.

thanks,

greg k-h

2009-12-04 16:57:53

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

On Fri, 4 Dec 2009, Oliver Neukum wrote:

> Am Freitag, 4. Dezember 2009 17:06:57 schrieb Alan Stern:
> > Oliver:
> >
> > We don't have to worry about the device structure being deallocated
> > while the routine is running. If that happens it's a bug in the
> > caller: improper refcounting.
>
> That raises two points
>
> 1. am I supposed to get a reference just so that I can use dev_err?

You're supposed to hold a reference before using dev at all. If the
only use you make of dev is to call dev_err(), then yes.

That's how refcounting is meant to work.

> 2. what happens if this is a soft disconnect and the device is reconnected?
> It seems to me that you'd print the wrong driver's name.

You mean the device is unbound from driver A and then bound to driver
B, after which a thread running in driver A calls dev_err()? Then yes,
the "wrong" name will be printed.

This is part of the idea behind dev_err() and friends. They print the
name of the currently bound driver, not the name of the calling source
file. That's why, for example, the name "usb-storage" shows up in a
message printed from within usbcore rather than "hub.c".

Alan Stern

2009-12-04 19:55:47

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

Am Freitag, 4. Dezember 2009 17:50:40 schrieb Greg KH:
> On Fri, Dec 04, 2009 at 05:16:19PM +0100, Oliver Neukum wrote:
> > Am Freitag, 4. Dezember 2009 17:06:57 schrieb Alan Stern:
> > > Oliver:
> > >
> > > We don't have to worry about the device structure being deallocated
> > > while the routine is running. If that happens it's a bug in the
> > > caller: improper refcounting.
> >
> >
> > That raises two points
> >
> > 1. am I supposed to get a reference just so that I can use dev_err?
>
> No, you should already have a reference on the device when doing the
> call, right?

No, why? Consider this:

int write(...)
{
...
mutex_lock(&instance->lock);
if (instance->disconnected) {
dev_dbg(instance->dev,"writing to disconnected device");
rv = -ENODEV;
} else {
res = usb_submit_urb(...);
rv = res < 0 ? -EIO : count;
}
mutex_unlock(&instance->lock);
return rv;
}

void disconnect(...)
{
...
mutex_lock(&instance->lock);
instance->disconnected = 1;
usb_kill_urb(...);
usb_kill_urb(...);
mutex_unlock(&instance->lock);
}

This would be perfectly valid code without any references taken save
for the pesky dev_dbg()

Regards
Oliver

2009-12-04 20:57:46

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

On Fri, 4 Dec 2009, Oliver Neukum wrote:

> > > 1. am I supposed to get a reference just so that I can use dev_err?
> >
> > No, you should already have a reference on the device when doing the
> > call, right?
>
> No, why? Consider this:
>
> int write(...)
> {
> ...
> mutex_lock(&instance->lock);
> if (instance->disconnected) {
> dev_dbg(instance->dev,"writing to disconnected device");
> rv = -ENODEV;
> } else {
> res = usb_submit_urb(...);
> rv = res < 0 ? -EIO : count;
> }
> mutex_unlock(&instance->lock);
> return rv;
> }
>
> void disconnect(...)
> {
> ...
> mutex_lock(&instance->lock);
> instance->disconnected = 1;
> usb_kill_urb(...);
> usb_kill_urb(...);
> mutex_unlock(&instance->lock);
> }
>
> This would be perfectly valid code without any references taken save
> for the pesky dev_dbg()

Whoever calls write() must possess a valid reference. Otherwise
instance might already be deallocated when write() starts, causing an
oops well before the call to dev_dbg().

Typically the driver would take a reference during open() and drop it
during close().

Alan Stern

2009-12-04 21:18:05

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

Am Freitag, 4. Dezember 2009 21:57:50 schrieb Alan Stern:
> On Fri, 4 Dec 2009, Oliver Neukum wrote:
> > > > 1. am I supposed to get a reference just so that I can use dev_err?
> > >
> > > No, you should already have a reference on the device when doing the
> > > call, right?
> >
> > No, why? Consider this:
> >
> > int write(...)
> > {
> > ...
> > mutex_lock(&instance->lock);
> > if (instance->disconnected) {
> > dev_dbg(instance->dev,"writing to disconnected device");
> > rv = -ENODEV;
> > } else {
> > res = usb_submit_urb(...);
> > rv = res < 0 ? -EIO : count;
> > }
> > mutex_unlock(&instance->lock);
> > return rv;
> > }
> >
> > void disconnect(...)
> > {
> > ...
> > mutex_lock(&instance->lock);
> > instance->disconnected = 1;
> > usb_kill_urb(...);
> > usb_kill_urb(...);
> > mutex_unlock(&instance->lock);
> > }
> >
> > This would be perfectly valid code without any references taken save
> > for the pesky dev_dbg()
>
> Whoever calls write() must possess a valid reference. Otherwise
> instance might already be deallocated when write() starts, causing an
> oops well before the call to dev_dbg().

He needs a valid reference to "instance", not to the device. In fact
he may do IO to the device only if he knows it hasn't been disconnected.

> Typically the driver would take a reference during open() and drop it
> during close().

You can do that but then you must not do IO prior to open() or after
close(). That is you must actually wait for IO to finish in close() and
cannot prefill your buffers before open().

Regards
Oliver

2009-12-04 21:36:19

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

On Fri, 4 Dec 2009, Oliver Neukum wrote:

> Am Freitag, 4. Dezember 2009 21:57:50 schrieb Alan Stern:
> > On Fri, 4 Dec 2009, Oliver Neukum wrote:
> > > > > 1. am I supposed to get a reference just so that I can use dev_err?
> > > >
> > > > No, you should already have a reference on the device when doing the
> > > > call, right?
> > >
> > > No, why? Consider this:
> > >
> > > int write(...)
> > > {
> > > ...
> > > mutex_lock(&instance->lock);
> > > if (instance->disconnected) {
> > > dev_dbg(instance->dev,"writing to disconnected device");
> > > rv = -ENODEV;
> > > } else {
> > > res = usb_submit_urb(...);
> > > rv = res < 0 ? -EIO : count;
> > > }
> > > mutex_unlock(&instance->lock);
> > > return rv;
> > > }
> > >
> > > void disconnect(...)
> > > {
> > > ...
> > > mutex_lock(&instance->lock);
> > > instance->disconnected = 1;
> > > usb_kill_urb(...);
> > > usb_kill_urb(...);
> > > mutex_unlock(&instance->lock);
> > > }
> > >
> > > This would be perfectly valid code without any references taken save
> > > for the pesky dev_dbg()
> >
> > Whoever calls write() must possess a valid reference. Otherwise
> > instance might already be deallocated when write() starts, causing an
> > oops well before the call to dev_dbg().
>
> He needs a valid reference to "instance", not to the device. In fact
> he may do IO to the device only if he knows it hasn't been disconnected.

Okay, yes. In fact, that is what your write() routine does -- it
checks the disconnected flag.

> > Typically the driver would take a reference during open() and drop it
> > during close().
>
> You can do that but then you must not do IO prior to open() or after
> close(). That is you must actually wait for IO to finish in close() and
> cannot prefill your buffers before open().

If open() or close() is called before disconnect() then you don't have
to worry.

If close() is called after disconnect() there's nothing to wait for,
because disconnect() should call usb_kill_urb() on all outstanding
transfers (actually usbcore will do that for you). Likewise with
open().

The problem in this example stems from the fact that you are using
instance->dev at a time when you don't know that it is valid -- in
fact, you have good reason to believe it _isn't_ valid because
instance->disconnected is set.

One approach is to set instance->dev to NULL in disconnect(). That
wouldn't do much good for your dev_dbg(), though. A better solution is
to refcount the instance->dev pointer: Take a reference to the device
when setting instance->dev and drop it when clearing instance->dev (or
when instance is freed).

Alan Stern

2009-12-04 21:58:38

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

Am Freitag, 4. Dezember 2009 22:36:22 schrieb Alan Stern:
> > > Typically the driver would take a reference during open() and drop it
> > > during close().
> >
> >
> > You can do that but then you must not do IO prior to open() or after
> > close(). That is you must actually wait for IO to finish in close() and
> > cannot prefill your buffers before open().
>
> If open() or close() is called before disconnect() then you don't have
> to worry.
>
> If close() is called after disconnect() there's nothing to wait for,
> because disconnect() should call usb_kill_urb() on all outstanding
> transfers (actually usbcore will do that for you). Likewise with
> open().
>
> The problem in this example stems from the fact that you are using
> instance->dev at a time when you don't know that it is valid -- in
> fact, you have good reason to believe it _isn't_ valid because
> instance->disconnected is set.

OK, yes. It's a bad example. However this is tricky.

This is a bug then:

mutex_lock(...);

if (instance->error) {
rv = instance->error;
instance->error = 0;
dev_dbg(instance->dev,...);
goto err_out;
}

rv = -ENODEV;
if (instance->disconnected)
goto err_out;

> One approach is to set instance->dev to NULL in disconnect(). That
> wouldn't do much good for your dev_dbg(), though. A better solution is
> to refcount the instance->dev pointer: Take a reference to the device
> when setting instance->dev and drop it when clearing instance->dev (or
> when instance is freed).

That would mean that I am forced to adopt refcounting just to print
something. This seems very inelegant.

Regards
Oliver

2009-12-04 22:07:09

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

On Fri, 4 Dec 2009, Oliver Neukum wrote:

> OK, yes. It's a bad example. However this is tricky.
>
> This is a bug then:
>
> mutex_lock(...);
>
> if (instance->error) {
> rv = instance->error;
> instance->error = 0;
> dev_dbg(instance->dev,...);

Unless you can guarantee at this point that instance->dev isn't stale,
it is indeed a bug.

> goto err_out;
> }
>
> rv = -ENODEV;
> if (instance->disconnected)
> goto err_out;
>
> > One approach is to set instance->dev to NULL in disconnect(). That
> > wouldn't do much good for your dev_dbg(), though. A better solution is
> > to refcount the instance->dev pointer: Take a reference to the device
> > when setting instance->dev and drop it when clearing instance->dev (or
> > when instance is freed).
>
> That would mean that I am forced to adopt refcounting just to print
> something. This seems very inelegant.

What can I say? When the something you want to print can be
deallocated at any time, there isn't much choice.

Maybe reference counting is inelegant; it depends on your point of
view. Can you think of a more elegant way to make sure that a pointer
isn't stale?

Alan Stern

2009-12-04 22:29:22

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

On Friday 04 December 2009 02:07:11 pm Alan Stern wrote:
> On Fri, 4 Dec 2009, Oliver Neukum wrote:
> > OK, yes. It's a bad example. However this is tricky.
> >
> > This is a bug then:
> >
> > mutex_lock(...);
> >
> > if (instance->error) {
> > rv = instance->error;
> > instance->error = 0;
> > dev_dbg(instance->dev,...);
>
> Unless you can guarantee at this point that instance->dev isn't stale,
> it is indeed a bug.
>
> > goto err_out;
> > }
> >
> > rv = -ENODEV;
> > if (instance->disconnected)
> > goto err_out;
> >
> > > One approach is to set instance->dev to NULL in disconnect(). That
> > > wouldn't do much good for your dev_dbg(), though. A better solution is
> > > to refcount the instance->dev pointer: Take a reference to the device
> > > when setting instance->dev and drop it when clearing instance->dev (or
> > > when instance is freed).
> >
> > That would mean that I am forced to adopt refcounting just to print
> > something. This seems very inelegant.
>
> What can I say? When the something you want to print can be
> deallocated at any time, there isn't much choice.
>
> Maybe reference counting is inelegant; it depends on your point of
> view. Can you think of a more elegant way to make sure that a pointer
> isn't stale?

Yes, just say "no" to device_create() and friends. Embed device structure in
yours, be mindful of lifetime rules and only use "your" device (i.e device
bound to your driver). This way, as long as your refcount your instance you
can rest assured the device structure is there as well.

--
Dmitry

2009-12-04 23:50:34

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

On Fri, 4 Dec 2009, Dmitry Torokhov wrote:

> > Maybe reference counting is inelegant; it depends on your point of
> > view. Can you think of a more elegant way to make sure that a pointer
> > isn't stale?
>
> Yes, just say "no" to device_create() and friends.

device_create() wasn't used in the case Oliver is discussing.

> Embed device structure in
> yours,

You can't do that when the device structure wasn't created by your
driver.

> be mindful of lifetime rules and only use "your" device (i.e device
> bound to your driver).

What do you mean by "use"? In Oliver's case he wasn't using the
device, he was using the device structure. (Maybe that's what you
meant.) And he wanted to use it at a time when it wasn't bound to his
driver, because userspace still had an open file reference to it.
There isn't really any way around this.

> This way, as long as your refcount your instance you
> can rest assured the device structure is there as well.

I rather think that a simple device_get() and device_put() is easier
than trying to follow a bunch of rules, especially in cases where they
don't apply! :-)

Alan Stern

2009-12-05 00:39:21

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

On Fri, Dec 04, 2009 at 10:58:48PM +0100, Oliver Neukum wrote:
> Am Freitag, 4. Dezember 2009 22:36:22 schrieb Alan Stern:
> > > > Typically the driver would take a reference during open() and drop it
> > > > during close().
> > >
> > >
> > > You can do that but then you must not do IO prior to open() or after
> > > close(). That is you must actually wait for IO to finish in close() and
> > > cannot prefill your buffers before open().
> >
> > If open() or close() is called before disconnect() then you don't have
> > to worry.
> >
> > If close() is called after disconnect() there's nothing to wait for,
> > because disconnect() should call usb_kill_urb() on all outstanding
> > transfers (actually usbcore will do that for you). Likewise with
> > open().
> >
> > The problem in this example stems from the fact that you are using
> > instance->dev at a time when you don't know that it is valid -- in
> > fact, you have good reason to believe it _isn't_ valid because
> > instance->disconnected is set.
>
> OK, yes. It's a bad example. However this is tricky.
>
> This is a bug then:
>
> mutex_lock(...);
>
> if (instance->error) {
> rv = instance->error;
> instance->error = 0;
> dev_dbg(instance->dev,...);
> goto err_out;
> }
>
> rv = -ENODEV;
> if (instance->disconnected)
> goto err_out;
>
> > One approach is to set instance->dev to NULL in disconnect(). That
> > wouldn't do much good for your dev_dbg(), though. A better solution is
> > to refcount the instance->dev pointer: Take a reference to the device
> > when setting instance->dev and drop it when clearing instance->dev (or
> > when instance is freed).
>
> That would mean that I am forced to adopt refcounting just to print
> something. This seems very inelegant.

Don't print anything if you are disconnecting :)

thanks,

greg k-h

2009-12-05 00:39:28

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

On Fri, Dec 04, 2009 at 06:50:35PM -0500, Alan Stern wrote:
> On Fri, 4 Dec 2009, Dmitry Torokhov wrote:
>
> > > Maybe reference counting is inelegant; it depends on your point of
> > > view. Can you think of a more elegant way to make sure that a pointer
> > > isn't stale?
> >
> > Yes, just say "no" to device_create() and friends.
>
> device_create() wasn't used in the case Oliver is discussing.

It was implied, as you had a pointer to the device, not the device
itself.

> > Embed device structure in
> > yours,
>
> You can't do that when the device structure wasn't created by your
> driver.

But for USB devices, it is part of the device you are handed. Same goes
for PCI devices, and most other types of drivers, right?

> > be mindful of lifetime rules and only use "your" device (i.e device
> > bound to your driver).
>
> What do you mean by "use"? In Oliver's case he wasn't using the
> device, he was using the device structure. (Maybe that's what you
> meant.)

I think that is what is meant here.

> And he wanted to use it at a time when it wasn't bound to his
> driver, because userspace still had an open file reference to it.
> There isn't really any way around this.

But you still have a valid device, just not maybe a driver bound to it.

> > This way, as long as your refcount your instance you
> > can rest assured the device structure is there as well.
>
> I rather think that a simple device_get() and device_put() is easier
> than trying to follow a bunch of rules, especially in cases where they
> don't apply! :-)

Like here :)

thanks,

greg k-h

2009-12-05 02:37:42

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] Driver core: fix race in dev_driver_string

On Fri, 4 Dec 2009, Greg KH wrote:

> On Fri, Dec 04, 2009 at 06:50:35PM -0500, Alan Stern wrote:
> > On Fri, 4 Dec 2009, Dmitry Torokhov wrote:
> >
> > > > Maybe reference counting is inelegant; it depends on your point of
> > > > view. Can you think of a more elegant way to make sure that a pointer
> > > > isn't stale?
> > >
> > > Yes, just say "no" to device_create() and friends.
> >
> > device_create() wasn't used in the case Oliver is discussing.
>
> It was implied, as you had a pointer to the device, not the device
> itself.

Not necessarily. For example, the serial drivers have pointers to
struct tty, not the tty structures themselves. That doesn't imply the
tty structures were constructed using device_create().

> > > Embed device structure in
> > > yours,
> >
> > You can't do that when the device structure wasn't created by your
> > driver.
>
> But for USB devices, it is part of the device you are handed. Same goes
> for PCI devices, and most other types of drivers, right?

Yes. Dmitry's word "yours" is ambiguous here. It's true that struct
pci_device contains an embedded struct device. But for example, struct
ehci_hcd doesn't -- even when the EHCI controller is a PCI device. So
if you are the ehci-hcd driver, which structure is "yours": the struct
pci_device or the struct ehci_hcd?

> > > be mindful of lifetime rules and only use "your" device (i.e device
> > > bound to your driver).
> >
> > What do you mean by "use"? In Oliver's case he wasn't using the
> > device, he was using the device structure. (Maybe that's what you
> > meant.)
>
> I think that is what is meant here.
>
> > And he wanted to use it at a time when it wasn't bound to his
> > driver, because userspace still had an open file reference to it.
> > There isn't really any way around this.
>
> But you still have a valid device, just not maybe a driver bound to it.

If a driver isn't bound to it then you don't know whether the device
structure is valid or not. It could have been deallocated. Unless
you have taken a reference to it -- then you know.

Alan Stern