LinuxLists.cc - klists and struct device semaphores

2005-03-26 16:53:48

[permalink] [raw]

Subject: klists and struct device semaphores

Pat:

Your recent series of driver model patches naturally divides up into
two parts: those involved with implementing and using klists, and
those involved with adding a semaphore to struct device. Let's
consider these parts separately.

The klist stuff embodies a good idea: safe traversal of lists while
nodes are added and removed. Your klist library is intended for
generic use and hence it necessarily is not tightly integrated with
the data type of the list objects. This can lead to problems, as
discussed below. I'll start with the least important issues and work
up to some big ones.

(1) Your structures contain more fields than I would have used.
klist_node.n_klist isn't really needed; it only gets used
when removing a klist_node from a klist, and at such times
the caller could easily pass the klist directly. Also,
klist_iter.i_head isn't needed because it's easily derivable
from klist_iter.i_klist. (Doesn't really matter because
there never are very many iterators in existence.)

(2) Likewise I'm not so sure about klist_node.n_removed. Waiting
for removal operations to complete is generally a bad idea;
in the driver-model core it's important only when deregistering
a driver. The only other scenario I can think of where you
would need to wait is when you remove an object from a klist
and then want to put it on another (see also (5)). While this
might be necessary for general-purpose usage, the driver-model
core doesn't do it. It also means that moving an object from
one klist to another can't be carried out in_interrupt. (Not
to mention you're adding a lot of struct completions all over
the place, most of which shouldn't need to be used.)

(3) Your iteratators don't allow for some simple optimizations.
For example, a bus's driver<->device matching routine should
be able to run while retaining the klist's spinlock.

(4) You don't have any way of marking klist_nodes that have been
removed from their klist. So an iterator will return these
nodes instead of skipping right past them, as one would expect.
This can have unpleasant consequences, such as probing a new
device against a driver that has been unregistered. The
klist_node's container will be forced to have its own "deleted"
marker, and callers will have to skip deleted entries by hand.

(5) Most importantly, klist_nodes aren't protected against their
containers being deallocated. Or rather, the way in which
you've set up the protection is inappropriate. Right now
you force a caller to wait until list-removal is complete,
and only later is it allowed to deallocate the container
object. This is a violation of the principles underlying the
reference-counting approach; instead a klist_node should take
a reference to its container. But your generic library-based
approach doesn't allow that, at least, not without some
awkward coding. This also means that iterating through a
klist requires acquiring and releasing two references at each
step: one for the klist_node and one for the container.

Some of these weaknesses are unavoidable (they're also present in the
outline Dmitry proposed).

Let's move on to consider the new struct device.sem. You've
recognized, like other people, that such a thing is necessary to
protect drivers against simultaneous callbacks. But things aren't as
simple as just sticking the semaphore into the structure and acquiring
it for each callback! It requires much more careful thought.

(6) Your code doesn't solve the race between driver_unregister and
device_add. What happens if a driver is unregistered at the
same time as it's being probed for a new device? Maybe I
missed something, but it looks like you might end up with the
device bound to a non-existent driver. (And what about the
other way 'round, where a device is unregistered at the same
time as it's being probed by a new driver? That's easier to
get right but I haven't checked it.)

(7) Adding lots of semaphores also adds lots of new possibilities
for deadlock. You haven't provided any rules for locking
multiple semaphores. Here are a few potentially troublesome
scenarios:

A driver being probed for newly-discovered hardware
registers a child device (e.g., a SCSI adapter driver
registers its SCSI host).

Autoresume of child device requires the parent to be
resumed as well.

During remove a driver wants to suspend or resume its
device.

There's also a possibility for deadlock with respect to some
lock private to a subsystem. Of course such things can't be
solved in the core; the subsystem has to take care of them.

(8) A subsystem driver might want to retain control over a new
device around the device_add call -- it might want to hold the
device's semaphore itself the whole time. There need to be
entry points in which the caller holds the necessary
semaphores.

(9) Your scheme doesn't allow any possibility for complete
enumeration of all children of a device; new children can be
added at any time. So for example, checking that all the
children are suspended (and preventing any from being
resumed!) while suspending a device becomes very difficult.

To solve this last problem, my thought has always been that adding a
device to the list of its parent's children should require holding the
parent's lock. There's room for disagreement. But note that there's
code in the kernel which does tree-oriented device traversals; by
locking each node in turn the traversal could be protected against
unwelcome changes in the device tree.

The final issue I have is more complex; it has to do with the peculiar
requirements of the USB subsystem. In case you're not familiar with
the details of how USB works (and for the benefit of anyone else
reading this message), here's a capsule description:

A USB device can have multiple functional units, called "interfaces"
for some strange unknown reason. It's the interfaces that do the
actual work; each one gets its own struct device (in addition to the
struct device allocated for the USB device itself) and its own driver.
While most actions are local to a single interface, there are a few
that affect the entire device including all the interfaces. The most
obvious ones are suspend, resume, and device-reset. More important
and harder to deal with is the Set-Configuration command, which
destroys all the current interfaces and creates a whole set of new
ones.

For proper protection, the USB subsystem requires that the overall
device be locked during suspend, resume, reset, and Set-Config. This
also involves locking the device during any call to a driver callback
-- but now the struct device being locked is the _parent_ of the one
bound to the driver (i.e., the interface's parent).

At the moment this locking is handled internally by the subsystem.
But in one respect it conflicts badly with the operation of the
driver-model core: when a driver is registered or unregistered. At
such times the subsystem isn't in control of which devices are probed
or unbound. I ended up "solving" this by adding a second layer of
locking, which effectively permits _all_ the USB devices to be locked
during usb_register and usb_deregister. It's awkward and it would
benefit from better support from the driver-model core.

Such support would have to take the form of locking a device's parent
as well as the device itself, minimally when probing a new driver and
unbinding a deregistered driver, possibly at other times as well. As
far as I know, USB is the only subsystem to require this and it's
probably something you don't want to do if you don't have to. I don't
know what the best answer is. It was a struggle to get where we are
now and we only had to worry about locking USB devices; locking the
interfaces too adds a whole new dimension.

Alan Stern

2005-03-28 16:56:39

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Sat, 26 Mar 2005, Alan Stern wrote:

> Pat:
>
> Your recent series of driver model patches naturally divides up into
> two parts: those involved with implementing and using klists, and
> those involved with adding a semaphore to struct device. Let's
> consider these parts separately.

I've split them into two replies to reflect their nature of separateness.

> (1) Your structures contain more fields than I would have used.
> klist_node.n_klist isn't really needed; it only gets used
> when removing a klist_node from a klist, and at such times
> the caller could easily pass the klist directly. Also,
> klist_iter.i_head isn't needed because it's easily derivable
> from klist_iter.i_klist. (Doesn't really matter because
> there never are very many iterators in existence.)

Those are good suggestions. Thanks.

> (2) Likewise I'm not so sure about klist_node.n_removed. Waiting
> for removal operations to complete is generally a bad idea;
> in the driver-model core it's important only when deregistering
> a driver. The only other scenario I can think of where you
> would need to wait is when you remove an object from a klist
> and then want to put it on another (see also (5)). While this
> might be necessary for general-purpose usage, the driver-model
> core doesn't do it. It also means that moving an object from
> one klist to another can't be carried out in_interrupt. (Not
> to mention you're adding a lot of struct completions all over
> the place, most of which shouldn't need to be used.)

It's important when removing a containing object's knode from the list
when that object is about to be freed. This happens during both device and
driver unregistration. In most cases, the removal operation will return
immediately. When it doesn't, it means another thread is using that
particular knode, which means its imperative that the containing object
not be freed.

Do you have suggestions about an alternative (with code)?

> (3) Your iteratators don't allow for some simple optimizations.
> For example, a bus's driver<->device matching routine should
> be able to run while retaining the klist's spinlock.

It's trivial to add a helper that holds a lock across an entire iteration.

However, we currently don't separate the bus->match() and the
driver->probe() operations. We must not hold a spinlock across ->probe(),
which means we drop the lock before both ->match() and ->probe(). However,
it might be interesting to split those up and do a locked iteration just
to find a match, grab a reference for the driver, break out of the
iteration, and call ->probe().

> (4) You don't have any way of marking klist_nodes that have been
> removed from their klist. So an iterator will return these
> nodes instead of skipping right past them, as one would expect.
> This can have unpleasant consequences, such as probing a new
> device against a driver that has been unregistered. The
> klist_node's container will be forced to have its own "deleted"
> marker, and callers will have to skip deleted entries by hand.

Good point. It's trivial to add an atomic flag (.n_attached) which is
checked during an iteration. This can also be used for the
klist_node_attached() function that I posted a few days ago (and you may
have missed).

> (5) Most importantly, klist_nodes aren't protected against their
> containers being deallocated. Or rather, the way in which
> you've set up the protection is inappropriate. Right now
> you force a caller to wait until list-removal is complete,
> and only later is it allowed to deallocate the container
> object. This is a violation of the principles underlying the
> reference-counting approach; instead a klist_node should take
> a reference to its container. But your generic library-based
> approach doesn't allow that, at least, not without some
> awkward coding. This also means that iterating through a
> klist requires acquiring and releasing two references at each
> step: one for the klist_node and one for the container.

It's assumed that the controlling subsystem will handle lifetime-based
reference counting for the containing objects. If you can point me to a
potential usage where this assumption would get us into trouble, I'd be
interested in trying to work arond this.

[ device sem questions issues addressed separately. ]

Pat

2005-03-28 17:58:54

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Sat, 26 Mar 2005, Alan Stern wrote:

> Let's move on to consider the new struct device.sem. You've
> recognized, like other people, that such a thing is necessary to
> protect drivers against simultaneous callbacks. But things aren't as
> simple as just sticking the semaphore into the structure and acquiring
> it for each callback! It requires much more careful thought.
>
> (6) Your code doesn't solve the race between driver_unregister and
> device_add. What happens if a driver is unregistered at the
> same time as it's being probed for a new device? Maybe I
> missed something, but it looks like you might end up with the
> device bound to a non-existent driver. (And what about the
> other way 'round, where a device is unregistered at the same
> time as it's being probed by a new driver? That's easier to
> get right but I haven't checked it.)

The only race I see is the klist_remove() in bus_remove_driver() racing
with the iteration of the klist in device_attach(). The former will block
until the driver.knode_bus reference count reaches 0, which will happen
when the ->probe() is over and the iteration stops. The klist_remove()
will finish, then each device attached to the driver will be removed.
That's less than ideal, but it should work.

To help that a bit, we could add a get_driver()/put_driver() pair to
__device_attach(), which would prevent the driver from being removed while
we're calling ->probe().

> (7) Adding lots of semaphores also adds lots of new possibilities
> for deadlock. You haven't provided any rules for locking
> multiple semaphores. Here are a few potentially troublesome
> scenarios:
>
> A driver being probed for newly-discovered hardware
> registers a child device (e.g., a SCSI adapter driver
> registers its SCSI host).
>
> Autoresume of child device requires the parent to be
> resumed as well.
>
> During remove a driver wants to suspend or resume its
> device.
>
> There's also a possibility for deadlock with respect to some
> lock private to a subsystem. Of course such things can't be
> solved in the core; the subsystem has to take care of them.

For now, I'm willing to punt on those and consider them subsystem-specific
until more is known about those situations' characteristics. As it
currently stands, the core will not lock more than 1 device at a time.
The subsystems can know that and lock devices appropriately.

> (8) A subsystem driver might want to retain control over a new
> device around the device_add call -- it might want to hold the
> device's semaphore itself the whole time. There need to be
> entry points in which the caller holds the necessary
> semaphores.

Out of curiosity, why would a subsystem want to do this? Would it be
something like this:

create device
lock device
device_add(dev);
do other stuff
unlock device (and let other things happen to it)

? If so, what do you want to protect against, suspend/resume? In cases
like this, do you still want to do driver probing, or do you know a priori
what the driver is?

> (9) Your scheme doesn't allow any possibility for complete
> enumeration of all children of a device; new children can be
> added at any time. So for example, checking that all the
> children are suspended (and preventing any from being
> resumed!) while suspending a device becomes very difficult.

Are you talking about two different things here? First, what is wrong with
children being adding at any time? We can't prevent that.

Secondly, I don't understand the suspend/resume requirement. Perhaps you
could elaborate more.

> To solve this last problem, my thought has always been that adding a
> device to the list of its parent's children should require holding the
> parent's lock. There's room for disagreement. But note that there's
> code in the kernel which does tree-oriented device traversals; by
> locking each node in turn the traversal could be protected against
> unwelcome changes in the device tree.

Holding the lock over the lifetime of the device? Or just for an
operation?

> For proper protection, the USB subsystem requires that the overall
> device be locked during suspend, resume, reset, and Set-Config. This
> also involves locking the device during any call to a driver callback
> -- but now the struct device being locked is the _parent_ of the one
> bound to the driver (i.e., the interface's parent).
>
> At the moment this locking is handled internally by the subsystem.
> But in one respect it conflicts badly with the operation of the
> driver-model core: when a driver is registered or unregistered. At
> such times the subsystem isn't in control of which devices are probed
> or unbound. I ended up "solving" this by adding a second layer of
> locking, which effectively permits _all_ the USB devices to be locked
> during usb_register and usb_deregister. It's awkward and it would
> benefit from better support from the driver-model core.

> Such support would have to take the form of locking a device's parent
> as well as the device itself, minimally when probing a new driver and
> unbinding a deregistered driver, possibly at other times as well. As
> far as I know, USB is the only subsystem to require this and it's
> probably something you don't want to do if you don't have to. I don't
> know what the best answer is. It was a struggle to get where we are
> now and we only had to worry about locking USB devices; locking the
> interfaces too adds a whole new dimension.

How is this related to (8) above? Do you need some sort of protected,
short path through the core to add the device, but not bind it or add it
to the PM core?

Thanks,

Pat

2005-03-28 18:16:21

by David Brownell

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Monday 28 March 2005 9:44 am, Patrick Mochel wrote:

> How is this related to (8) above? Do you need some sort of protected,
> short path through the core to add the device, but not bind it or add it
> to the PM core?

Erm, why is there a distinction between "adding device" and "adding it
to the PM core"? That's a conceptual problem right there. There
should be no distinctio. (But it does make eminent sense to be able
to add a device without necessarily binding it to a driver, since
the "unbound driver" state is all over the place.)

- Dave

2005-03-28 18:43:52

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Mon, 28 Mar 2005, David Brownell wrote:

> On Monday 28 March 2005 9:44 am, Patrick Mochel wrote:
>
> > How is this related to (8) above? Do you need some sort of protected,
> > short path through the core to add the device, but not bind it or add it
> > to the PM core?
>
> Erm, why is there a distinction between "adding device" and "adding it
> to the PM core"? That's a conceptual problem right there. There
> should be no distinctio. (But it does make eminent sense to be able
> to add a device without necessarily binding it to a driver, since
> the "unbound driver" state is all over the place.)

Don't get too excited; there is no distinction.

He seemed to imply that it would be useful for interfaces to be added
without having the possibility of being suspended until all the interfaces
of a device were added. I'm simply trying to understand what he thinks is
necessary.

Pat

2005-03-28 19:48:35

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Mon, 28 Mar 2005, Patrick Mochel wrote:

> It's important when removing a containing object's knode from the list
> when that object is about to be freed. This happens during both device and
> driver unregistration. In most cases, the removal operation will return
> immediately. When it doesn't, it means another thread is using that
> particular knode, which means its imperative that the containing object
> not be freed.
>
> Do you have suggestions about an alternative (with code)?

Here's something a little better than pseudocode but not as good as a
patch... :-)

Consider adding to struct klist two new fields:

int k_offset_to_containers_kref;
void (*k_containers_kref_release)(struct kref *);

To fill the first field in correctly requires that klist creation use a
macro; the details are unimportant. What is important is that during
klist_node_init you add:

struct kref *containers_kref = (struct kref *) ((void *) n +
k->k_offset_to_containers_kref);

kref_get(containers_kref);

and in klist_release you add:

struct kref *containers_kref = (struct kref *) ((void *) n +
n->n_klist->k_offset_to_containers_kref);

kref_put(containers_kref, n->n_klist->k_containers_kref_release);

(Actually this conflicts with my earlier suggestion about removing
n->n_klist. Oh well... nothing's perfect.)

In fact the kref_put() should take the place of the call to complete().
This scheme assumes that the container object does contain a kref, but
this is true for all the containers in the driver model.

> Good point. It's trivial to add an atomic flag (.n_attached) which is
> checked during an iteration. This can also be used for the
> klist_node_attached() function that I posted a few days ago (and you may
> have missed).

There's no need for the flag to be atomic, since it's only altered while
the klist's lock is held.

> It's assumed that the controlling subsystem will handle lifetime-based
> reference counting for the containing objects. If you can point me to a
> potential usage where this assumption would get us into trouble, I'd be
> interested in trying to work arond this.

It's not that you get into trouble; it's that you're forced to wait for
klist_node.n_removed when you shouldn't have to. To put it another way,
one of the big advantages of the refcounting approach is that it allows
you to avoid blocking on deallocations -- the deallocation happens
automatically when the last reference is dropped. Your code loses this
advantage; it's not the refcounting way.

If you replace the struct completion with the offset to the container's
kref and make the klist_node hold a reference to its container, as
described above, then this unpleasantness can go away.

Alan Stern

2005-03-28 21:58:59

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Mon, 28 Mar 2005, Patrick Mochel wrote:

> The only race I see is the klist_remove() in bus_remove_driver() racing
> with the iteration of the klist in device_attach(). The former will block
> until the driver.knode_bus reference count reaches 0, which will happen
> when the ->probe() is over and the iteration stops. The klist_remove()
> will finish, then each device attached to the driver will be removed.
> That's less than ideal, but it should work.

You're right. Instead of a race that needs to be resolved, you have a
potential for an extra sleep in bus_remove_driver(). It's not a problem.

> > (7) Adding lots of semaphores also adds lots of new possibilities
> > for deadlock. You haven't provided any rules for locking
> > multiple semaphores. Here are a few potentially troublesome
> > scenarios:

> For now, I'm willing to punt on those and consider them subsystem-specific
> until more is known about those situations' characteristics. As it
> currently stands, the core will not lock more than 1 device at a time.

That's absolutely not true. Whenever a probe() routine registers a new
device, the core will acquire nested locks. This happens in a number of
places. Likewise when a remove() routine unregisters a child device.

You need to formalize the locking rule: Never lock a device while holding
one of its descendants' locks.

> > (8) A subsystem driver might want to retain control over a new
> > device around the device_add call -- it might want to hold the
> > device's semaphore itself the whole time. There need to be
> > entry points in which the caller holds the necessary
> > semaphores.
>
> Out of curiosity, why would a subsystem want to do this? Would it be
> something like this:
>
> create device
> lock device
> device_add(dev);
> do other stuff
> unlock device (and let other things happen to it)
>
> ?

Yes, that's what I meant.

> If so, what do you want to protect against, suspend/resume? In cases
> like this, do you still want to do driver probing, or do you know a priori
> what the driver is?

The case I had in mind was adding a new USB device. The USB core wants to
retain control of the device at least through the point where it chooses
and sets a new configuration -- otherwise userspace might do so first.
We ought to be able to work around this by locking the device after
calling device_add() and before usb_create_sysfs_dev_files(). In this
case the driver is known a priori.

> > (9) Your scheme doesn't allow any possibility for complete
> > enumeration of all children of a device; new children can be
> > added at any time. So for example, checking that all the
> > children are suspended (and preventing any from being
> > resumed!) while suspending a device becomes very difficult.
>
> Are you talking about two different things here? First, what is wrong with
> children being adding at any time? We can't prevent that.

That's right; your scheme can't prevent it.

> Secondly, I don't understand the suspend/resume requirement. Perhaps you
> could elaborate more.

Look at what happens when a driver wants to suspend a device. If there
are any unsuspended children it will lead to trouble. (Note this
concerns runtime PM only; for system PM we already know that all the
children are suspended.) So the driver loops through all the children
first, making sure each one is already suspended (if not then the suspend
request must fail). At the end it knows it can safely suspend the device.

But! What if another child is added in the interim, so the loop misses
it? And there's a related problem: What if one of the existing children
gets resumed after it was checked but before the parent can be suspended?

The first problem could be solved at the driver level, by using an
additional private semaphore to block attempts at adding new children.
On the other hand, if the core always locked the parent while adding a
child then a separate private semaphore wouldn't be needed. The driver
could simply use the pre-existing device->sem.

The second problem can be solved in a couple of ways. The most obvious is
for the driver to lock all the children while checking that they are
already suspended, then unlock all of them after suspending the parent.
Alternatively the resume pathway could be changed, so that to resume a
device both it and its parent have to be locked. (The alternative might
not work as well in practice, because drivers are likely to resume devices
on demand directly, without detouring through the core routines. Even
if the core was careful always to lock the parent before a resume, drivers
might not be so careful when bypassing the core.)

> > To solve this last problem, my thought has always been that adding a
> > device to the list of its parent's children should require holding the
> > parent's lock. There's room for disagreement. But note that there's
> > code in the kernel which does tree-oriented device traversals; by
> > locking each node in turn the traversal could be protected against
> > unwelcome changes in the device tree.
>
> Holding the lock over the lifetime of the device? Or just for an
> operation?

Just for the time it takes to iterate through the traversal.

> > For proper protection, the USB subsystem requires that the overall
> > device be locked during suspend, resume, reset, and Set-Config. This
> > also involves locking the device during any call to a driver callback
> > -- but now the struct device being locked is the _parent_ of the one
> > bound to the driver (i.e., the interface's parent).
> >
> > At the moment this locking is handled internally by the subsystem.
> > But in one respect it conflicts badly with the operation of the
> > driver-model core: when a driver is registered or unregistered. At
> > such times the subsystem isn't in control of which devices are probed
> > or unbound. I ended up "solving" this by adding a second layer of
> > locking, which effectively permits _all_ the USB devices to be locked
> > during usb_register and usb_deregister. It's awkward and it would
> > benefit from better support from the driver-model core.
>
> > Such support would have to take the form of locking a device's parent
> > as well as the device itself, minimally when probing a new driver and
> > unbinding a deregistered driver, possibly at other times as well. As
> > far as I know, USB is the only subsystem to require this and it's
> > probably something you don't want to do if you don't have to. I don't
> > know what the best answer is. It was a struggle to get where we are
> > now and we only had to worry about locking USB devices; locking the
> > interfaces too adds a whole new dimension.
>
> How is this related to (8) above? Do you need some sort of protected,
> short path through the core to add the device, but not bind it or add it
> to the PM core?

At this point I'm not clear on exactly what's needed. I'll have to get
back to you on this; more thought is required. Maybe David will have
some ideas to contribute as well.

Alan Stern

2005-03-29 16:18:25

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Mon, 28 Mar 2005, Patrick Mochel wrote:

> How is this related to (8) above? Do you need some sort of protected,
> short path through the core to add the device, but not bind it or add it
> to the PM core?

Having thought it through, I believe all we need for USB support is this:

Whenever usb_register() in the USB core calls driver_register()
and the call filters down to driver_attach(), that routine
should lock dev->parent->sem before calling driver_probe_device()
(and unlock it afterward, of course).

(For the corresponding remove pathway, where usb_deregister()
calls driver_unregister(), it would be nice if __remove_driver()
locked dev->parent->sem before calling device_release_driver().
This is not really needed, however, since USB drivers aren't
supposed to touch the device in their disconnect() method.)

With that change in place we can guarantee that every time a USB driver's
probe() is called, both the interface and the parent device are locked.

I don't know how cleanly this can be implemented. You probably don't want
to lock dev->parent->sem every time, only when needed. Maybe the simplest
approach would be to add a flag in struct bus_type, which could be set for
the USB bus_type and clear for everything else.

Alan Stern

2005-03-29 16:26:27

by Dmitry Torokhov

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Tue, 29 Mar 2005 11:18:13 -0500 (EST), Alan Stern
<[email protected]> wrote:
>
> With that change in place we can guarantee that every time a USB driver's
> probe() is called, both the interface and the parent device are locked.
>
> I don't know how cleanly this can be implemented. You probably don't want
> to lock dev->parent->sem every time, only when needed. Maybe the simplest
> approach would be to add a flag in struct bus_type, which could be set for
> the USB bus_type and clear for everything else.
>

I think it is fine to lock parent unconditionally. After all
device/driver matching is not the most performance-critical part of
the kernel.

--
Dmitry

2005-03-29 16:38:36

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Mon, 28 Mar 2005, Alan Stern wrote:

> On Mon, 28 Mar 2005, Patrick Mochel wrote:

> > Do you have suggestions about an alternative (with code)?
>
> Here's something a little better than pseudocode but not as good as a
> patch... :-)

> To fill the first field in correctly requires that klist creation use a
> macro; the details are unimportant. What is important is that during
> klist_node_init you add:

In principle, you're right. Kind of. We need to tie the "usage" reference
count of the klist_node to the containing objects' "lifetime" count. But,
there is no need to confuscate the klist code to do it. At least not at
this point.

The subsystems that use the code must be sure to appropriately manage the
lifetime rules of the containing objects. That is true no matter what.
When they add a node, they should increment the reference count of the
containing object and decrement when the node is removed. If practice
shows that there is more that can be rolled into the model, then we can
revisit it later.

[ Sidebar: Perhaps we can add a callback parameter to klist_remove() to
call when the node has been removed, instead of the struct completion. ]

Pat

2005-03-31 02:13:17

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Mon, 28 Mar 2005, Alan Stern wrote:

> > For now, I'm willing to punt on those and consider them subsystem-specific
> > until more is known about those situations' characteristics. As it
> > currently stands, the core will not lock more than 1 device at a time.
>
> That's absolutely not true. Whenever a probe() routine registers a new
> device, the core will acquire nested locks. This happens in a number of
> places. Likewise when a remove() routine unregisters a child device.

Let me re-phrase that: The driver core will never explicitly take more
than 1 lock. It will lock a device during calls to the driver routines,
which the drivers should be aware of when re-entering the driver model via
those functions.

> You need to formalize the locking rule: Never lock a device while holding
> one of its descendants' locks.

That's a good rule, but definitely sub-system specific (like for those
that re-enter the driver model).

> > If so, what do you want to protect against, suspend/resume? In cases
> > like this, do you still want to do driver probing, or do you know a priori
> > what the driver is?
>
> The case I had in mind was adding a new USB device. The USB core wants to
> retain control of the device at least through the point where it chooses
> and sets a new configuration -- otherwise userspace might do so first.
> We ought to be able to work around this by locking the device after
> calling device_add() and before usb_create_sysfs_dev_files(). In this
> case the driver is known a priori.

How is this a driver model problem if it can be fixed locally?

> > Are you talking about two different things here? First, what is wrong with
> > children being adding at any time? We can't prevent that.
>
> That's right; your scheme can't prevent it.

I don't see the need nor desire to prevent it. Any device that goes to
sleep automatically should be awakened when it receives a request. If a
bridge/hub/controller goes to sleep and a child device is
inserted/hotplugged, then it should be awakened.

I agree that there has to be synchronization *while* those calls are being
made, but that can be solved with the device semaphore. It's held across a
parent's suspend. If a child device is added then, the request to add it
should block on that semaphore. It should actually try resume the device,
which will automatically block, then re-awaken the device once the lock is
dropped by the suspend process.

> > Secondly, I don't understand the suspend/resume requirement. Perhaps you
> > could elaborate more.
>
> Look at what happens when a driver wants to suspend a device. If there
> are any unsuspended children it will lead to trouble. (Note this
> concerns runtime PM only; for system PM we already know that all the
> children are suspended.) So the driver loops through all the children
> first, making sure each one is already suspended (if not then the suspend
> request must fail). At the end it knows it can safely suspend the device.
>
> But! What if another child is added in the interim, so the loop misses
> it? And there's a related problem: What if one of the existing children
> gets resumed after it was checked but before the parent can be suspended?
>
> The first problem could be solved at the driver level, by using an
> additional private semaphore to block attempts at adding new children.
> On the other hand, if the core always locked the parent while adding a
> child then a separate private semaphore wouldn't be needed. The driver
> could simply use the pre-existing device->sem.

We can lock the parent while adding a child, like I mentioned above. When
we're adding a child we must make sure that the parent is awakened anyway.

> The second problem can be solved in a couple of ways. The most obvious is
> for the driver to lock all the children while checking that they are
> already suspended, then unlock all of them after suspending the parent.
> Alternatively the resume pathway could be changed, so that to resume a
> device both it and its parent have to be locked. (The alternative might
> not work as well in practice, because drivers are likely to resume devices
> on demand directly, without detouring through the core routines. Even
> if the core was careful always to lock the parent before a resume, drivers
> might not be so careful when bypassing the core.)

I don't like the idea of locking every single child just to check if
they're suspended. Sounds messy.

How about we just add a counter to the parent for all of the running (not
suspended) children. When a child is suspended, this counter is
decremented. When the counter reaches 0, the parent can suspend. This
makes the check simple when a parent is directed to suspend.

We can have the counter modifications block on the semaphore, so if there
is an update to it while the parent is doing something, it will be queued
up appropriately. Incrementing this past 0 should automatically resume the
parent (or at least check that it's resumed). This will guarantee that the
parent is woken up for any children that are added.

In fact, we probably want to add a counter to every device for all "open
connections" so the device doesn't try to automatically sleep while a
device node is open. Once it reaches 0, we can have it enter a
pre-configured state, which should save us a bit of power for very little
pain.

Pat

2005-03-31 02:16:39

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Tue, 29 Mar 2005, Alan Stern wrote:

> On Mon, 28 Mar 2005, Patrick Mochel wrote:
>
> > How is this related to (8) above? Do you need some sort of protected,
> > short path through the core to add the device, but not bind it or add it
> > to the PM core?
>
> Having thought it through, I believe all we need for USB support is this:
>
> Whenever usb_register() in the USB core calls driver_register()
> and the call filters down to driver_attach(), that routine
> should lock dev->parent->sem before calling driver_probe_device()
> (and unlock it afterward, of course).
>
> (For the corresponding remove pathway, where usb_deregister()
> calls driver_unregister(), it would be nice if __remove_driver()
> locked dev->parent->sem before calling device_release_driver().
> This is not really needed, however, since USB drivers aren't
> supposed to touch the device in their disconnect() method.)

Why can't you just lock it in ->probe() and ->remove() yourself?

Pat

2005-03-31 03:01:58

by Dmitry Torokhov

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Wednesday 30 March 2005 21:16, Patrick Mochel wrote:
>
> On Tue, 29 Mar 2005, Alan Stern wrote:
>
> > On Mon, 28 Mar 2005, Patrick Mochel wrote:
> >
> > > How is this related to (8) above? Do you need some sort of protected,
> > > short path through the core to add the device, but not bind it or add it
> > > to the PM core?
> >
> > Having thought it through, I believe all we need for USB support is this:
> >
> > Whenever usb_register() in the USB core calls driver_register()
> > and the call filters down to driver_attach(), that routine
> > should lock dev->parent->sem before calling driver_probe_device()
> > (and unlock it afterward, of course).
> >
> > (For the corresponding remove pathway, where usb_deregister()
> > calls driver_unregister(), it would be nice if __remove_driver()
> > locked dev->parent->sem before calling device_release_driver().
> > This is not really needed, however, since USB drivers aren't
> > supposed to touch the device in their disconnect() method.)
>
>
> Why can't you just lock it in ->probe() and ->remove() yourself?
>

Will the lock be exported (via helper functions)? I always felt dirty using
subsys.rwsem because it I think it was supposed to be implementation detail.

--
Dmitry

2005-03-31 05:48:17

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Wed, 30 Mar 2005, Dmitry Torokhov wrote:

> Will the lock be exported (via helper functions)? I always felt dirty using
> subsys.rwsem because it I think it was supposed to be implementation detail.

Sure, why not? See the attached patch for helpers, exported GPL only of
course.

Thanks,

Pat

===== drivers/base/core.c 1.97 vs edited =====
--- 1.97/drivers/base/core.c 2005-03-24 19:07:33 -08:00
+++ edited/drivers/base/core.c 2005-03-30 21:20:54 -08:00
@@ -196,6 +196,33 @@

/**
+ * device_lock - lock device by taking its semaphore
+ * @dev: Device to lock
+ */
+
+void device_lock(struct device * dev)
+{
+ if (dev)
+ down(&dev->sem);
+}
+
+EXPORT_SYMBOL_GPL(device_lock);
+
+
+/**
+ * device_unlock - unlock device
+ * @dev: Device we're unlocking
+ */
+
+void device_unlock(struct device * dev)
+{
+ if (dev)
+ up(&dev->sem);
+}
+
+EXPORT_SYMBOL_GPL(device_unlock);
+
+/**
* device_initialize - init device structure.
* @dev: device.
*
===== include/linux/device.h 1.147 vs edited =====
--- 1.147/include/linux/device.h 2005-03-24 19:07:33 -08:00
+++ edited/include/linux/device.h 2005-03-30 21:17:28 -08:00
@@ -325,6 +325,9 @@
extern int device_for_each_child(struct device *, void *,
int (*fn)(struct device *, void *));

+extern void device_lock(struct device * dev);
+extern void device_unlock(struct device * dev);
+
/*
* Manual binding of a device to driver. See drivers/base/bus.c
* for information on use.

2005-03-31 16:19:08

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Wed, 30 Mar 2005, Patrick Mochel wrote:

> Let me re-phrase that: The driver core will never explicitly take more
> than 1 lock. It will lock a device during calls to the driver routines,
> which the drivers should be aware of when re-entering the driver model via
> those functions.

Okay.

> > You need to formalize the locking rule: Never lock a device while holding
> > one of its descendants' locks.
>
> That's a good rule, but definitely sub-system specific (like for those
> that re-enter the driver model).

It should be mentioned somewhere in the kerneldoc, maybe near the
declaration of your new device_lock() routine. You might also want to
mention explicitly that a probe() routine shouldn't call through the
driver core to suspend its device because it would deadlock; it should
simply do the suspend directly.

> > The case I had in mind was adding a new USB device. The USB core wants to
> > retain control of the device at least through the point where it chooses
> > and sets a new configuration -- otherwise userspace might do so first.
> > We ought to be able to work around this by locking the device after
> > calling device_add() and before usb_create_sysfs_dev_files(). In this
> > case the driver is known a priori.
>
> How is this a driver model problem if it can be fixed locally?

It isn't a problem. Forget I brought it up -- if anything ends up going
wrong I'll let you know.

> I agree that there has to be synchronization *while* those calls are being
> made, but that can be solved with the device semaphore. It's held across a
> parent's suspend. If a child device is added then, the request to add it
> should block on that semaphore. It should actually try resume the device,
> which will automatically block, then re-awaken the device once the lock is
> dropped by the suspend process.

> We can lock the parent while adding a child, like I mentioned above. When
> we're adding a child we must make sure that the parent is awakened anyway.

Yes. I realized this yesterday; drivers or subsystems can use the
device lock themselves to synchronize addition/removal of children.
There's no need for the core to do anything special.

In fact, the core had better _not_ try to lock the parent while adding a
child. The subsystem may already own the lock! Should this be made a
general requirement of device_add() -- that the caller must have locked
the parent?

> I don't like the idea of locking every single child just to check if
> they're suspended. Sounds messy.
>
> How about we just add a counter to the parent for all of the running (not
> suspended) children. When a child is suspended, this counter is
> decremented. When the counter reaches 0, the parent can suspend. This
> makes the check simple when a parent is directed to suspend.
>
> We can have the counter modifications block on the semaphore, so if there
> is an update to it while the parent is doing something, it will be queued
> up appropriately. Incrementing this past 0 should automatically resume the
> parent (or at least check that it's resumed). This will guarantee that the
> parent is woken up for any children that are added.
>
> In fact, we probably want to add a counter to every device for all "open
> connections" so the device doesn't try to automatically sleep while a
> device node is open. Once it reaches 0, we can have it enter a
> pre-configured state, which should save us a bit of power for very little
> pain.

By "open connections", do you mean something more than unsuspended
children?

Are you proposing to add these counters to struct device? If so, would
they be used and maintained by the core or by the driver/subsystem? I
should think the core wouldn't know enough about the requirements of
different devices to do anything useful. But then if the core doesn't use
the counters they should be stored in a private data structure, not in
struct device.

Alan Stern

2005-03-31 16:24:20

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Wed, 30 Mar 2005, Patrick Mochel wrote:

> > Having thought it through, I believe all we need for USB support is this:
> >
> > Whenever usb_register() in the USB core calls driver_register()
> > and the call filters down to driver_attach(), that routine
> > should lock dev->parent->sem before calling driver_probe_device()
> > (and unlock it afterward, of course).
> >
> > (For the corresponding remove pathway, where usb_deregister()
> > calls driver_unregister(), it would be nice if __remove_driver()
> > locked dev->parent->sem before calling device_release_driver().
> > This is not really needed, however, since USB drivers aren't
> > supposed to touch the device in their disconnect() method.)
>
>
> Why can't you just lock it in ->probe() and ->remove() yourself?

Aha! There you go... This explains why you need explicit locking rules.

When probe() and remove() are called, the driver-model core already owns
the device's lock. If the driver then tried to lock the parent, it would
mean acquiring locks in the wrong order. This could easily lead to
deadlock.

Furthermore, it will often happen during probe() and remove() that the
parent's lock is already owned by the USB core. So the driver _mustn't_
try to lock it.

Alan Stern

2005-03-31 18:00:30

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Thu, 31 Mar 2005, Alan Stern wrote:

> On Wed, 30 Mar 2005, Patrick Mochel wrote:

> > In fact, we probably want to add a counter to every device for all "open
> > connections" so the device doesn't try to automatically sleep while a
> > device node is open. Once it reaches 0, we can have it enter a
> > pre-configured state, which should save us a bit of power for very little
> > pain.
>
> By "open connections", do you mean something more than unsuspended
> children?

Yes, I mean anything that requires the device be awake and functional.
This would include open device nodes for many devices, open network
connections for network devices, active children for bridges and
controllers, etc.

This will require modification of at least the open() routines at the
subsystem level. They can simply access the class device and call down to
the driver, with some help from some core utility functions and some hand
waving. The driver (or bus subsystem) can determine if the parent needs to
be awakened at that same time, and awaken it if necessary.

> Are you proposing to add these counters to struct device? If so, would
> they be used and maintained by the core or by the driver/subsystem? I
> should think the core wouldn't know enough about the requirements of
> different devices to do anything useful. But then if the core doesn't use
> the counters they should be stored in a private data structure, not in
> struct device.

The core would know very little to be useful. However, it would most
likely need to modify them around calls to e.g. probe()/remove() to make
sure the device is functional when accessing it. Maybe.

At the very least, the shortest path to getting every device working with
this is to modify the subsystems' open calls. The only way to bridge their
notion of class-specific objects (and class_devices) with physical devices
is through the core.

So, I think we need the counter in struct device, along with some helper
functions.

Pat

2005-03-31 18:04:32

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Thu, 31 Mar 2005, Alan Stern wrote:

> On Wed, 30 Mar 2005, Patrick Mochel wrote:
>
> > > Having thought it through, I believe all we need for USB support is this:
> > >
> > > Whenever usb_register() in the USB core calls driver_register()
> > > and the call filters down to driver_attach(), that routine
> > > should lock dev->parent->sem before calling driver_probe_device()
> > > (and unlock it afterward, of course).
> > >
> > > (For the corresponding remove pathway, where usb_deregister()
> > > calls driver_unregister(), it would be nice if __remove_driver()
> > > locked dev->parent->sem before calling device_release_driver().
> > > This is not really needed, however, since USB drivers aren't
> > > supposed to touch the device in their disconnect() method.)
> >
> >
> > Why can't you just lock it in ->probe() and ->remove() yourself?
>
> Aha! There you go... This explains why you need explicit locking rules.
>
> When probe() and remove() are called, the driver-model core already owns
> the device's lock. If the driver then tried to lock the parent, it would
> mean acquiring locks in the wrong order. This could easily lead to
> deadlock.

I should have been more clear. From what I understand, when some devices
are probed they also want to probe and add their children. It *seems* like
this is essentially true with USB devices and interfaces, though I could
be wrong.

What I meant was that when the parent device's ->probe() method is called,
the parent's semaphore could be taken before the children are discovered
and added. This would keep the parent locked while all the fiddling is
going on with the children. Right?

Pat

2005-03-31 18:18:20

by David Brownell

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Thursday 31 March 2005 9:59 am, Patrick Mochel wrote:
> On Thu, 31 Mar 2005, Alan Stern wrote:
> > On Wed, 30 Mar 2005, Patrick Mochel wrote:
>
> > > In fact, we probably want to add a counter to every device for all "open
> > > connections" so the device doesn't try to automatically sleep while a
> > > device node is open. Once it reaches 0, we can have it enter a
> > > pre-configured state, which should save us a bit of power for very little
> > > pain.
> >
> > By "open connections", do you mean something more than unsuspended
> > children?
>
> Yes, I mean anything that requires the device be awake and functional.

So for example a device that's suspended and enabled for wakeup could be
"open" ... which fights against your "doesn't try to sleep" policy.

Maybe you mean "don't power-off" rather than "don't sleep"? Or are
you maybe assuming PC-style PCI, where nothing (on current Linux)
seems to process PME# wakeup events outside of system sleep states?
(Even when "lspci" shows PME# as active for a device ...)

> This would include open device nodes for many devices, open network
> connections for network devices, active children for bridges and
> controllers, etc.

Same thing applies. Many devices can be suspended but wake up on demand.
And even pass the wakeup events up the hardware connectivity tree. In
fact, being able to do that is a requirement to support that "USB mouse
on Centrino laptop" example: USB mouse suspended, ditto the USB host
controller, then the CPU can enter C3 state and save a few more Watts.
Move mouse, wakeup the USB controller, CPU leaves C3.

In general, good power management will _want_ to leverage such modes.

Worth noting: it's very similar to what modern CPUs do internally,
powering parts off until they're needed. The same model can apply
to other chips; and to boards that integrate those chips...

- Dave

2005-03-31 18:27:08

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Thu, 31 Mar 2005, David Brownell wrote:

> On Thursday 31 March 2005 9:59 am, Patrick Mochel wrote:
> > On Thu, 31 Mar 2005, Alan Stern wrote:
> > > On Wed, 30 Mar 2005, Patrick Mochel wrote:
> >
> > > > In fact, we probably want to add a counter to every device for all "open
> > > > connections" so the device doesn't try to automatically sleep while a
> > > > device node is open. Once it reaches 0, we can have it enter a
> > > > pre-configured state, which should save us a bit of power for very little
> > > > pain.
> > >
> > > By "open connections", do you mean something more than unsuspended
> > > children?
> >
> > Yes, I mean anything that requires the device be awake and functional.
>
> So for example a device that's suspended and enabled for wakeup could be
> "open" ... which fights against your "doesn't try to sleep" policy.

I don't understand what you mean. Even if a device is suspended, be it
automatically after some amount of inactivity or as directed explicitly by
a user, we want to be able to open the device and have it work.

Conversely, we only want to automatically suspend the device, or allow the
device to be explicitly put to sleep, if the device is not being used.

For the majority of devices, that seems like a basic, simple, intuitive
rule. I'm sure there will be exceptions (there always are), but I expect
those to be rare and subsystem (if not platform) specific.

> > This would include open device nodes for many devices, open network
> > connections for network devices, active children for bridges and
> > controllers, etc.
>
> Same thing applies. Many devices can be suspended but wake up on demand.
> And even pass the wakeup events up the hardware connectivity tree. In
> fact, being able to do that is a requirement to support that "USB mouse
> on Centrino laptop" example: USB mouse suspended, ditto the USB host
> controller, then the CPU can enter C3 state and save a few more Watts.
> Move mouse, wakeup the USB controller, CPU leaves C3.
>
> In general, good power management will _want_ to leverage such modes.
>
> Worth noting: it's very similar to what modern CPUs do internally,
> powering parts off until they're needed. The same model can apply
> to other chips; and to boards that integrate those chips...

I agree.

Pat

2005-03-31 18:47:07

by David Brownell

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Thursday 31 March 2005 10:26 am, Patrick Mochel wrote:
>
> On Thu, 31 Mar 2005, David Brownell wrote:
>
> > On Thursday 31 March 2005 9:59 am, Patrick Mochel wrote:
> > > On Thu, 31 Mar 2005, Alan Stern wrote:
> > > > On Wed, 30 Mar 2005, Patrick Mochel wrote:
> > >
> > > > > In fact, we probably want to add a counter to every device for all "open
> > > > > connections" so the device doesn't try to automatically sleep while a
> > > > > device node is open. Once it reaches 0, we can have it enter a
> > > > > pre-configured state, which should save us a bit of power for very little
> > > > > pain.
> > > >
> > > > By "open connections", do you mean something more than unsuspended
> > > > children?
> > >
> > > Yes, I mean anything that requires the device be awake and functional.
> >
> > So for example a device that's suspended and enabled for wakeup could be
> > "open" ... which fights against your "doesn't try to sleep" policy.
>
> I don't understand what you mean. Even if a device is suspended, be it
> automatically after some amount of inactivity or as directed explicitly by
> a user, we want to be able to open the device and have it work.
>
> Conversely, we only want to automatically suspend the device, or allow the
> device to be explicitly put to sleep, if the device is not being used.

And be able to suspend itself, even if it's open, if it's idle enough and
can wake itself up automatically.

I'm pointing out that device sleep/suspend states are orthogonal to being
"open". So I don't see what this counter would do...

- Dave

2005-03-31 19:08:26

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Thu, 31 Mar 2005, David Brownell wrote:

> On Thursday 31 March 2005 10:26 am, Patrick Mochel wrote:

> > Conversely, we only want to automatically suspend the device, or allow the
> > device to be explicitly put to sleep, if the device is not being used.
>
> And be able to suspend itself, even if it's open, if it's idle enough and
> can wake itself up automatically.

Certainly that's dependent on the type of device. It'll be relatively easy
to begin with devices at an open()/close() level, and consider any device
that's open() to be active and ineligible for suspend. This will allow us
to setup the infrastructure and get some decent savings from definitely
inactive devices.

We then move to a more fine-grained activity-detection and reaction
mechanism, like being idle after N seconds of inactivity. But, that will
be completely dependent on the type of device and the driver.

In fact with a usage counter, a driver that wants to be more intelligent
about how much it's being used can simply use that (decrement it and
automatically suspend) when it's inactive for a while.

Right?

Pat

2005-03-31 19:08:59

by Dmitry Torokhov

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Thu, 31 Mar 2005 10:26:36 -0800 (PST), Patrick Mochel
<[email protected]> wrote:
> I don't understand what you mean. Even if a device is suspended, be it
> automatically after some amount of inactivity or as directed explicitly by
> a user, we want to be able to open the device and have it work.
>
> Conversely, we only want to automatically suspend the device, or allow the
> device to be explicitly put to sleep, if the device is not being used.

Well, the disagreement in definition of "being used". Quite often you
have a device "open" for extended period of time (inactive ext2
partition is mounted on a disk) but device is not really active. You
could power it down and only wake up when there is a read or write
request.

It looks like "open" and "close" are terms better suited for class
devices while power management is applied to the "real" devices.

--
Dmitry

2005-03-31 19:49:47

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Thu, 31 Mar 2005, Patrick Mochel wrote:

> > > > Whenever usb_register() in the USB core calls driver_register()
> > > > and the call filters down to driver_attach(), that routine
> > > > should lock dev->parent->sem before calling driver_probe_device()
> > > > (and unlock it afterward, of course).

> > > Why can't you just lock it in ->probe() and ->remove() yourself?
> >
> > Aha! There you go... This explains why you need explicit locking rules.
> >
> > When probe() and remove() are called, the driver-model core already owns
> > the device's lock. If the driver then tried to lock the parent, it would
> > mean acquiring locks in the wrong order. This could easily lead to
> > deadlock.
>
> I should have been more clear. From what I understand, when some devices
> are probed they also want to probe and add their children. It *seems* like
> this is essentially true with USB devices and interfaces, though I could
> be wrong.

For USB devices and their interfaces that's generally wrong. But it is
right for lots of other devices (including USB host controllers).

> What I meant was that when the parent device's ->probe() method is called,
> the parent's semaphore could be taken before the children are discovered
> and added. This would keep the parent locked while all the fiddling is
> going on with the children. Right?

The parent's semaphore _would_ be taken before its probe() method is
called and the children are discovered, since the semaphore is held during
every driver callback. However that's not what I was getting at above.

The problem with USB comes in at the level of drivers for interfaces.
Occasionally these drivers need to do something during their probe() that
affects the entire USB device (as opposed to just their interface), such
as resetting the device. For this to work safely it requires that the
driver own the lock for the entire device, not just the lock for the
interface.

Often this works out okay because the driver's probe() is called when the
interface is registered by the USB core. At that time the core already
owns the device lock, so driver->probe() inherits the lock.

The problem arises when driver->probe() is called because the driver was
just insmod'ed. At such times the USB core does not own any locks on any
USB devices. The only way for driver->probe() to inherit the device lock
is for the driver-model core to acquire it beforehand -- it's not possible
for the USB core or the driver to lock the device. That's why I asked for
driver_attach() to lock dev->parent->sem around the call to
driver_probe_device().

Alan Stern

2005-04-02 18:06:38

[permalink] [raw]

Subject: Re: klists and struct device semaphores

Pat:

I looked through the new driver model code a bit more. There appears to
be a few problems (unless I'm using out-of-date code).

First, there's a race between adding a new device and registering a new
driver. The bus_add_device() routine contains these lines:

pr_debug("bus %s: add device %s\n", bus->name, dev->bus_id);
device_attach(dev);
klist_add_tail(&bus->klist_devices, &dev->knode_bus);

Suppose device_attach() doesn't find a suitable driver, but a new driver
is registered before the klist_add_tail() executes. Then the new driver
won't see the device either, and the device won't be bound at all. The
last two lines above should be in the opposite order.

Second, there's no check in driver_probe_device() or higher up to prevent
probing a device that's already bound to another driver. Such a check
needs to be synchronized with assignments to dev->driver, so it should be
made while holding dev->sem.

Third, why does device_release_driver() call klist_del() instead of
klist_remove() for dev->knode_driver? Is that just a simple mistake?
The klist_node doesn't seem to get unlinked anywhere.

Fourth, in device_release_driver() why isn't most of the work done under
the protection of dev->sem? If a driver is unregistered at the same time
as a device is removed, two threads could end up executing that routine at
the same time. Then the question would be which thread calls
klist_remove() -- not to mention the danger that both of them might. I
guess the answer is to call klist_remove() after releasing dev->sem.

Alan Stern

2005-04-06 07:41:11

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

Sorry about the delay in responding, there were some bugs to attend to,
some of which you may have inadvertantly caught below.

On Sat, 2 Apr 2005, Alan Stern wrote:

> I looked through the new driver model code a bit more. There appears to
> be a few problems (unless I'm using out-of-date code).
>
> First, there's a race between adding a new device and registering a new
> driver. The bus_add_device() routine contains these lines:
>
> pr_debug("bus %s: add device %s\n", bus->name, dev->bus_id);
> device_attach(dev);
> klist_add_tail(&bus->klist_devices, &dev->knode_bus);
>
> Suppose device_attach() doesn't find a suitable driver, but a new driver
> is registered before the klist_add_tail() executes. Then the new driver
> won't see the device either, and the device won't be bound at all. The
> last two lines above should be in the opposite order.

Yes, you're right.

> Second, there's no check in driver_probe_device() or higher up to
> prevent probing a device that's already bound to another driver. Such a
> check needs to be synchronized with assignments to dev->driver, so it
> should be made while holding dev->sem.

Yes, it should be checked while under the lock. That function is tricky,
as I've been reminded in the last couple of days. It shouldn't be public
for one, but that's cosmetic. The two callers should be holding the lock
when they call it and both check (and act appropriately) if the device is
already bound to a driver.

> Third, why does device_release_driver() call klist_del() instead of
> klist_remove() for dev->knode_driver? Is that just a simple mistake?
> The klist_node doesn't seem to get unlinked anywhere.

It can be called from driver_for_each_device() when the driver has been
unloaded. Since that increments the reference count for the node when it's
unregistering it, klist_remove() will deadlock. Instead klist_del() is
called, and when the next node is grabbed, that one will be let go and
removed from the list.

> Fourth, in device_release_driver() why isn't most of the work done under
> the protection of dev->sem? If a driver is unregistered at the same time
> as a device is removed, two threads could end up executing that routine at
> the same time. Then the question would be which thread calls
> klist_remove() -- not to mention the danger that both of them might. I
> guess the answer is to call klist_remove() after releasing dev->sem.

Yes, you're right. Also fixed. In fact, by the patch below which has just
made it into the -mm tree.

Thanks,

Pat

Short summary:

- Move logic to driver_probe_device() and comments uncommon returns:
1 - If device is bound
0 - If device not bound, and no error
error - If there was an error.

- Move locking to caller of that function, since we want to lock a
device for the entire time we're trying to bind it to a driver (to
prevent against a driver being loaded at the same time).

- Update __device_attach() and __driver_attach() to do that locking.

- Check if device is already bound in __driver_attach()

- Update the converse device_release_driver() so it locks the device
around all of the operations.

- Mark driver_probe_device() as static and remove export. It's an
internal function, it should stay that way, and there are no other
callers. If there is ever a need to export it, we can audit it as
necessary.

--- linux-2.6-mm/drivers/base/dd.c.orig 2005-04-05 15:01:05.000000000 -0700
+++ linux-2.6-mm/drivers/base/dd.c 2005-04-05 15:46:01.000000000 -0700
@@ -35,6 +35,8 @@
* nor take the bus's rwsem. Please verify those are accounted
* for before calling this. (It is ok to call with no other effort
* from a driver's probe() method.)
+ *
+ * This function must be called with @dev->sem held.
*/

void device_bind_driver(struct device * dev)
@@ -61,50 +63,57 @@
*
* If we find a match, we call @drv->probe(@dev) if it exists, and
* call device_bind_driver() above.
+ *
+ * This function returns 1 if a match is found, an error if one
+ * occurs (that is not -ENODEV or -ENXIO), and 0 otherwise.
+ *
+ * This function must be called with @dev->sem held.
*/
-int driver_probe_device(struct device_driver * drv, struct device * dev)
+static int driver_probe_device(struct device_driver * drv, struct device * dev)
{
- int error = 0;
+ int ret = 0;

if (drv->bus->match && !drv->bus->match(dev, drv))
- return -ENODEV;
+ goto Done;

- down(&dev->sem);
+ pr_debug("%s: Matched Device %s with Driver %s\n",
+ drv->bus->name, dev->bus_id, drv->name);
dev->driver = drv;
if (drv->probe) {
- error = drv->probe(dev);
- if (error) {
+ ret = drv->probe(dev);
+ if (ret) {
dev->driver = NULL;
- up(&dev->sem);
- return error;
+ goto ProbeFailed;
}
}
- up(&dev->sem);
device_bind_driver(dev);
- return 0;
-}
-
+ ret = 1;
+ pr_debug("%s: Bound Device %s to Driver %s\n",
+ drv->bus->name, dev->bus_id, drv->name);
+ goto Done;

-static int __device_attach(struct device_driver * drv, void * data)
-{
- struct device * dev = data;
- int error;
-
- error = driver_probe_device(drv, dev);
-
- if (error == -ENODEV && error == -ENXIO) {
+ ProbeFailed:
+ if (ret == -ENODEV || ret == -ENXIO) {
/* Driver matched, but didn't support device
* or device not found.
* Not an error; keep going.
*/
- error = 0;
+ ret = 0;
} else {
/* driver matched but the probe failed */
printk(KERN_WARNING
"%s: probe of %s failed with error %d\n",
- drv->name, dev->bus_id, error);
+ drv->name, dev->bus_id, ret);
}
- return 0;
+ Done:
+ return ret;
+}
+
+
+static int __device_attach(struct device_driver * drv, void * data)
+{
+ struct device * dev = data;
+ return driver_probe_device(drv, dev);
}

/**
@@ -114,35 +123,44 @@
* Walk the list of drivers that the bus has and call
* driver_probe_device() for each pair. If a compatible
* pair is found, break out and return.
+ *
+ * Returns 1 if the device was bound to a driver; 0 otherwise.
*/
int device_attach(struct device * dev)
{
+ int ret = 0;
+
+ down(&dev->sem);
if (dev->driver) {
device_bind_driver(dev);
- return 1;
- }
-
- return bus_for_each_drv(dev->bus, NULL, dev, __device_attach);
+ ret = 1;
+ } else
+ ret = bus_for_each_drv(dev->bus, NULL, dev, __device_attach);
+ up(&dev->sem);
+ return ret;
}

static int __driver_attach(struct device * dev, void * data)
{
struct device_driver * drv = data;
- int error = 0;

- if (!dev->driver) {
- error = driver_probe_device(drv, dev);
- if (error) {
- if (error != -ENODEV) {
- /* driver matched but the probe failed */
- printk(KERN_WARNING
- "%s: probe of %s failed with error %d\n",
- drv->name, dev->bus_id, error);
- } else
- error = 0;
- }
- }
+ /*
+ * Lock device and try to bind to it. We drop the error
+ * here and always return 0, because we need to keep trying
+ * to bind to devices and some drivers will return an error
+ * simply if it didn't support the device.
+ *
+ * driver_probe_device() will spit a warning if there
+ * is an error.
+ */
+
+ down(&dev->sem);
+ if (!dev->driver)
+ driver_probe_device(drv, dev);
+ up(&dev->sem);
+
+
return 0;
}

@@ -154,9 +172,6 @@
* match the driver with each one. If driver_probe_device()
* returns 0 and the @dev->driver is set, we've found a
* compatible pair.
- *
- * Note that we ignore the -ENODEV error from driver_probe_device(),
- * since it's perfectly valid for a driver not to bind to any devices.
*/
void driver_attach(struct device_driver * drv)
{
@@ -176,27 +191,27 @@

void device_release_driver(struct device * dev)
{
- struct device_driver * drv = dev->driver;
-
- if (!drv)
- return;
-
- sysfs_remove_link(&drv->kobj, kobject_name(&dev->kobj));
- sysfs_remove_link(&dev->kobj, "driver");
- klist_del(&dev->knode_driver);
+ struct device_driver * drv;

down(&dev->sem);
- device_detach_shutdown(dev);
- if (drv->remove)
- drv->remove(dev);
- dev->driver = NULL;
+ if (dev->driver) {
+ drv = dev->driver;
+ sysfs_remove_link(&drv->kobj, kobject_name(&dev->kobj));
+ sysfs_remove_link(&dev->kobj, "driver");
+ klist_del(&dev->knode_driver);
+
+ device_detach_shutdown(dev);
+ if (drv->remove)
+ drv->remove(dev);
+ dev->driver = NULL;
+ }
up(&dev->sem);
}

static int __remove_driver(struct device * dev, void * unused)
{
- device_release_driver(dev);
+ device_release_driver(dev);
return 0;
}

@@ -210,7 +225,6 @@
driver_for_each_device(drv, NULL, NULL, __remove_driver);
}

-EXPORT_SYMBOL_GPL(driver_probe_device);
EXPORT_SYMBOL_GPL(device_bind_driver);
EXPORT_SYMBOL_GPL(device_release_driver);
EXPORT_SYMBOL_GPL(device_attach);
--- linux-2.6-mm/include/linux/device.h.orig 2005-04-05 15:24:35.000000000 -0700
+++ linux-2.6-mm/include/linux/device.h 2005-04-05 15:24:43.000000000 -0700
@@ -330,7 +330,6 @@
* Manual binding of a device to driver. See drivers/base/bus.c
* for information on use.
*/
-extern int driver_probe_device(struct device_driver * drv, struct device * dev);
extern void device_bind_driver(struct device * dev);
extern void device_release_driver(struct device * dev);
extern int device_attach(struct device * dev);

2005-04-06 19:46:02

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Wed, 6 Apr 2005, Patrick Mochel wrote:

> > Third, why does device_release_driver() call klist_del() instead of
> > klist_remove() for dev->knode_driver? Is that just a simple mistake?
> > The klist_node doesn't seem to get unlinked anywhere.
>
> It can be called from driver_for_each_device() when the driver has been
> unloaded. Since that increments the reference count for the node when it's
> unregistering it, klist_remove() will deadlock. Instead klist_del() is
> called, and when the next node is grabbed, that one will be let go and
> removed from the list.

The patch looks good. But isn't there still a problem with
device_release_driver()? It doesn't wait for the klist_node to be removed
from the klist before unlocking the device and moving on. As a result, if
another driver was waiting to bind to the device you would corrupt the
list pointers, by calling klist_add_tail() for the new driver before
klist_release() had run for the old driver.

I'll be interested to see how you manage to solve this. The only way I
can think of is to avoid using driver_for_each_device() in
driver_detach().

Alan Stern

2005-04-07 20:08:34

by Patrick Mochel

[permalink] [raw]

Subject: Re: klists and struct device semaphores

On Wed, 6 Apr 2005, Alan Stern wrote:

> The patch looks good. But isn't there still a problem with
> device_release_driver()? It doesn't wait for the klist_node to be removed
> from the klist before unlocking the device and moving on. As a result, if
> another driver was waiting to bind to the device you would corrupt the
> list pointers, by calling klist_add_tail() for the new driver before
> klist_release() had run for the old driver.
>
> I'll be interested to see how you manage to solve this. The only way I
> can think of is to avoid using driver_for_each_device() in
> driver_detach().

I had implemented something along the lines of what you suggested in a
previous email:

- Add flag to klist_node: n_attached.
- Use that in klist_node_attached()
- Check during klist_next(), and skip nodes that have been removed.
- Reset flag during klist_del().

Based on the deadlock that occurs when using klist_remove() during an
iteration over that the list, and the common indirect usage of it (parents
and buses unregistering devices, drivers unbinding devices), I've just
removed klist_remove() and the embedded completion. I'll post updated
patches soon.

Thanks,

Pat