2007-11-03 19:48:36

by Stephen Hemminger

[permalink] [raw]
Subject: device struct bloat

The sizeof(struct device) is way too big, especially in the network device case.
We want to support 1000's of device's and the change from class_device to
net_device has caused needless bloat.

sizeof(struct device) = 272
sizeof(struct class_device) = 92
* not the class_id in class_device could also be removed or changed to
a ptr, since it is redundant for net_devices.


2007-11-03 23:23:21

by Greg KH

[permalink] [raw]
Subject: Re: device struct bloat

On Sat, Nov 03, 2007 at 12:48:23PM -0700, Stephen Hemminger wrote:
> The sizeof(struct device) is way too big, especially in the network device case.
> We want to support 1000's of device's and the change from class_device to
> net_device has caused needless bloat.
>
> sizeof(struct device) = 272
> sizeof(struct class_device) = 92
> * not the class_id in class_device could also be removed or changed to
> a ptr, since it is redundant for net_devices.

I agree that struct device is bigger than perhaps it should be (Kay is
working on getting rid of the bus_id field and we both just trimmed down
the base kobject by about 20 bytes) but is this really a problem that is
noticable by anyone?

I'm all for saving memory, but 1000's of struct devices is not anything
that the kernel should even notice. s390 machines create tens of
thousands of these all the time, and they are severly memory limited,
with no apparent problem.

And I'm guessing that embedded systems would not be the ones that would
be creating 1000's of network devices, right? Are these virtual devices
or backed by real, physical devices?

If it is an issue, we can start to work on slimming the structure down.
At first glance, I'm sure we can save memory by just rearanging the
fields to get rid of some structure padding that I'm sure is there.

After that, I'm sure we can push a lot of other fields out into a
separate structure to handle if the device is "virtual" or not, which
would let us drop a bunch of the dma and other resource-type things.

thanks,

greg k-h

2007-11-04 20:30:08

by Peter Zijlstra

[permalink] [raw]
Subject: Re: device struct bloat


On Sat, 2007-11-03 at 12:48 -0700, Stephen Hemminger wrote:
> The sizeof(struct device) is way too big, especially in the network device case.
> We want to support 1000's of device's and the change from class_device to
> net_device has caused needless bloat.
>
> sizeof(struct device) = 272
> sizeof(struct class_device) = 92
> * not the class_id in class_device could also be removed or changed to
> a ptr, since it is redundant for net_devices.

The thing that surprised me most was that it contains a struct
semaphore, Greg, is that really needed?



2007-11-05 03:56:48

by Greg KH

[permalink] [raw]
Subject: Re: device struct bloat

On Sun, Nov 04, 2007 at 09:29:18PM +0100, Peter Zijlstra wrote:
>
> On Sat, 2007-11-03 at 12:48 -0700, Stephen Hemminger wrote:
> > The sizeof(struct device) is way too big, especially in the network device case.
> > We want to support 1000's of device's and the change from class_device to
> > net_device has caused needless bloat.
> >
> > sizeof(struct device) = 272
> > sizeof(struct class_device) = 92
> > * not the class_id in class_device could also be removed or changed to
> > a ptr, since it is redundant for net_devices.
>
> The thing that surprised me most was that it contains a struct
> semaphore, Greg, is that really needed?

Yes, it serializes bind and unbind stuff for the device. There are
comments about it in drivers/base/dd.c if you want to look into it.

thanks,

greg k-h

2007-11-05 10:46:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: device struct bloat

On Sun, 2007-11-04 at 19:58 -0800, Greg KH wrote:
> On Sun, Nov 04, 2007 at 09:29:18PM +0100, Peter Zijlstra wrote:
> >
> > On Sat, 2007-11-03 at 12:48 -0700, Stephen Hemminger wrote:
> > > The sizeof(struct device) is way too big, especially in the network device case.
> > > We want to support 1000's of device's and the change from class_device to
> > > net_device has caused needless bloat.
> > >
> > > sizeof(struct device) = 272
> > > sizeof(struct class_device) = 92
> > > * not the class_id in class_device could also be removed or changed to
> > > a ptr, since it is redundant for net_devices.
> >
> > The thing that surprised me most was that it contains a struct
> > semaphore, Greg, is that really needed?
>
> Yes, it serializes bind and unbind stuff for the device. There are
> comments about it in drivers/base/dd.c if you want to look into it.

Ah, that is not what I meant. My question was, why a semaphore, not a
mutex? Semaphores should not be used if at all possible - for one you
don't get lockdep checking your code.

/me converts the stuff over, and boots

Ah, that probably is the reason. This device tree is hard to annotate.
I'll try and think of something..

Anyway, Andy, could you make checkpatch whine on something like:
^\+[[::space:]]*struct semaphore




Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2007-11-05 10:57:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: device struct bloat

Hmm, the problem seems to be stuff like:

add usb driver to pci
scan pci devices
add usb host controller device
scan usb devices
add usb hub device
scan usb devices
add usb .....

This seems to be able to go on forever, as long as one can cascade usb
hubs. Doesn't seem like an ideal thing to do from a stack space POV
either.

Would it be possible to break at the second scan, that is the device
probe and stick that into a workqueue or something. Then we'd only ever
have driver->device nesting.



Anyway, for those who care here is the patch:

---
drivers/base/bus.c | 20 ++++++++++----------
drivers/base/class.c | 22 +++++++++++-----------
drivers/base/core.c | 20 +++++++++-----------
drivers/base/dd.c | 38 +++++++++++++++++++-------------------
drivers/base/power/main.c | 8 ++++----
drivers/pci/bus.c | 4 ++--
drivers/pnp/interface.c | 10 +++++-----
drivers/pnp/manager.c | 18 +++++++++---------
drivers/power/power_supply_core.c | 8 ++++----
drivers/rtc/interface.c | 4 ++--
drivers/scsi/hosts.c | 4 ++--
drivers/spi/spi.c | 10 +++++-----
drivers/usb/core/hub.c | 4 ++--
include/linux/device.h | 18 +++++++++++-------
include/linux/usb.h | 6 +++---
15 files changed, 98 insertions(+), 96 deletions(-)

Index: linux-2.6-2/drivers/base/bus.c
===================================================================
--- linux-2.6-2.orig/drivers/base/bus.c
+++ linux-2.6-2/drivers/base/bus.c
@@ -190,10 +190,10 @@ static ssize_t driver_unbind(struct devi
dev = bus_find_device(bus, NULL, (void *)buf, driver_helper);
if (dev && dev->driver == drv) {
if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
+ mutex_lock_nested(&dev->parent->mutex, DRIVER_PARENT);
device_release_driver(dev);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);
err = count;
}
put_device(dev);
@@ -217,12 +217,12 @@ static ssize_t driver_bind(struct device
dev = bus_find_device(bus, NULL, (void *)buf, driver_helper);
if (dev && dev->driver == NULL) {
if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
- down(&dev->sem);
+ mutex_lock_nested(&dev->parent->mutex, DRIVER_PARENT);
+ mutex_lock_nested(&dev->mutex, DRIVER_NORMAL);
err = driver_probe_device(drv, dev);
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);

if (err > 0) /* success */
err = count;
@@ -711,10 +711,10 @@ static int __must_check bus_rescan_devic

if (!dev->driver) {
if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
+ mutex_lock_nested(&dev->parent->mutex, DEVICE_PARENT);
ret = device_attach(dev);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);
}
return ret < 0 ? ret : 0;
}
@@ -745,10 +745,10 @@ int device_reprobe(struct device *dev)
{
if (dev->driver) {
if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
+ mutex_lock_nested(&dev->parent->mutex, DEVICE_PARENT);
device_release_driver(dev);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);
}
return bus_rescan_devices_helper(dev, NULL);
}
Index: linux-2.6-2/drivers/base/class.c
===================================================================
--- linux-2.6-2.orig/drivers/base/class.c
+++ linux-2.6-2/drivers/base/class.c
@@ -144,7 +144,7 @@ int class_register(struct class * cls)
INIT_LIST_HEAD(&cls->devices);
INIT_LIST_HEAD(&cls->interfaces);
kset_init(&cls->class_dirs);
- init_MUTEX(&cls->sem);
+ mutex_init(&cls->mutex);
error = kobject_set_name(&cls->subsys.kobj, "%s", cls->name);
if (error)
return error;
@@ -617,13 +617,13 @@ int class_device_add(struct class_device
kobject_uevent(&class_dev->kobj, KOBJ_ADD);

/* notify any interfaces this device is now here */
- down(&parent_class->sem);
+ mutex_lock_nested(&parent_class->mutex, SINGLE_DEPTH_NESTING);
list_add_tail(&class_dev->node, &parent_class->children);
list_for_each_entry(class_intf, &parent_class->interfaces, node) {
if (class_intf->add)
class_intf->add(class_dev, class_intf);
}
- up(&parent_class->sem);
+ mutex_unlock(&parent_class->mutex);

goto out1;

@@ -725,12 +725,12 @@ void class_device_del(struct class_devic
struct class_interface *class_intf;

if (parent_class) {
- down(&parent_class->sem);
+ mutex_lock_nested(&parent_class->mutex, SINGLE_DEPTH_NESTING);
list_del_init(&class_dev->node);
list_for_each_entry(class_intf, &parent_class->interfaces, node)
if (class_intf->remove)
class_intf->remove(class_dev, class_intf);
- up(&parent_class->sem);
+ mutex_unlock(&parent_class->mutex);
}

if (class_dev->dev) {
@@ -772,14 +772,14 @@ void class_device_destroy(struct class *
struct class_device *class_dev = NULL;
struct class_device *class_dev_tmp;

- down(&cls->sem);
+ mutex_lock(&cls->mutex);
list_for_each_entry(class_dev_tmp, &cls->children, node) {
if (class_dev_tmp->devt == devt) {
class_dev = class_dev_tmp;
break;
}
}
- up(&cls->sem);
+ mutex_unlock(&cls->mutex);

if (class_dev)
class_device_unregister(class_dev);
@@ -812,7 +812,7 @@ int class_interface_register(struct clas
if (!parent)
return -EINVAL;

- down(&parent->sem);
+ mutex_lock_nested(&parent->mutex, SINGLE_DEPTH_NESTING);
list_add_tail(&class_intf->node, &parent->interfaces);
if (class_intf->add) {
list_for_each_entry(class_dev, &parent->children, node)
@@ -822,7 +822,7 @@ int class_interface_register(struct clas
list_for_each_entry(dev, &parent->devices, node)
class_intf->add_dev(dev, class_intf);
}
- up(&parent->sem);
+ mutex_unlock(&parent->mutex);

return 0;
}
@@ -836,7 +836,7 @@ void class_interface_unregister(struct c
if (!parent)
return;

- down(&parent->sem);
+ mutex_lock_nested(&parent->mutex, SINGLE_DEPTH_NESTING);
list_del_init(&class_intf->node);
if (class_intf->remove) {
list_for_each_entry(class_dev, &parent->children, node)
@@ -846,7 +846,7 @@ void class_interface_unregister(struct c
list_for_each_entry(dev, &parent->devices, node)
class_intf->remove_dev(dev, class_intf);
}
- up(&parent->sem);
+ mutex_unlock(&parent->mutex);

class_put(parent);
}
Index: linux-2.6-2/drivers/base/core.c
===================================================================
--- linux-2.6-2.orig/drivers/base/core.c
+++ linux-2.6-2/drivers/base/core.c
@@ -19,8 +19,6 @@
#include <linux/kdev_t.h>
#include <linux/notifier.h>

-#include <asm/semaphore.h>
-
#include "base.h"
#include "power/power.h"

@@ -531,7 +529,7 @@ void device_initialize(struct device *de
klist_children_put);
INIT_LIST_HEAD(&dev->dma_pools);
INIT_LIST_HEAD(&dev->node);
- init_MUTEX(&dev->sem);
+ mutex_init(&dev->mutex);
spin_lock_init(&dev->devres_lock);
INIT_LIST_HEAD(&dev->devres_head);
device_init_wakeup(dev, 0);
@@ -782,7 +780,7 @@ int device_add(struct device *dev)
klist_add_tail(&dev->knode_parent, &parent->klist_children);

if (dev->class) {
- down(&dev->class->sem);
+ mutex_lock(&dev->class->mutex);
/* tie the class to the device */
list_add_tail(&dev->node, &dev->class->devices);

@@ -790,7 +788,7 @@ int device_add(struct device *dev)
list_for_each_entry(class_intf, &dev->class->interfaces, node)
if (class_intf->add_dev)
class_intf->add_dev(dev, class_intf);
- up(&dev->class->sem);
+ mutex_unlock(&dev->class->mutex);
}
Done:
put_device(dev);
@@ -926,14 +924,14 @@ void device_del(struct device * dev)
sysfs_remove_link(&dev->kobj, "device");
}

- down(&dev->class->sem);
+ mutex_lock(&dev->class->mutex);
/* notify any interfaces that the device is now gone */
list_for_each_entry(class_intf, &dev->class->interfaces, node)
if (class_intf->remove_dev)
class_intf->remove_dev(dev, class_intf);
/* remove the device from the class list */
list_del_init(&dev->node);
- up(&dev->class->sem);
+ mutex_unlock(&dev->class->mutex);

/* If we live in a parent class-directory, unreference it */
if (dev->kobj.parent->kset == &dev->class->class_dirs) {
@@ -944,7 +942,7 @@ void device_del(struct device * dev)
* if we are the last child of our class, delete
* our class-directory at this parent
*/
- down(&dev->class->sem);
+ mutex_lock(&dev->class->mutex);
list_for_each_entry(d, &dev->class->devices, node) {
if (d == dev)
continue;
@@ -957,7 +955,7 @@ void device_del(struct device * dev)
kobject_del(dev->kobj.parent);

kobject_put(dev->kobj.parent);
- up(&dev->class->sem);
+ mutex_unlock(&dev->class->mutex);
}
}
device_remove_file(dev, &uevent_attr);
@@ -1166,14 +1164,14 @@ void device_destroy(struct class *class,
struct device *dev = NULL;
struct device *dev_tmp;

- down(&class->sem);
+ mutex_lock(&class->mutex);
list_for_each_entry(dev_tmp, &class->devices, node) {
if (dev_tmp->devt == devt) {
dev = dev_tmp;
break;
}
}
- up(&class->sem);
+ mutex_unlock(&class->mutex);

if (dev)
device_unregister(dev);
Index: linux-2.6-2/drivers/base/dd.c
===================================================================
--- linux-2.6-2.orig/drivers/base/dd.c
+++ linux-2.6-2/drivers/base/dd.c
@@ -82,7 +82,7 @@ static void driver_sysfs_remove(struct d
* for before calling this. (It is ok to call with no other effort
* from a driver's probe() method.)
*
- * This function must be called with @dev->sem held.
+ * This function must be called with @dev->mutex held.
*/
int device_bind_driver(struct device *dev)
{
@@ -180,8 +180,8 @@ int driver_probe_done(void)
* This function returns 1 if a match is found, -ENODEV if the device is
* not registered, and 0 otherwise.
*
- * This function must be called with @dev->sem held. When called for a
- * USB interface, @dev->parent->sem must be held as well.
+ * This function must be called with @dev->mutex held. When called for a
+ * USB interface, @dev->parent->mutex must be held as well.
*/
int driver_probe_device(struct device_driver * drv, struct device * dev)
{
@@ -219,13 +219,13 @@ static int __device_attach(struct device
* 0 if no matching device was found;
* -ENODEV if the device is not registered.
*
- * When called for a USB interface, @dev->parent->sem must be held.
+ * When called for a USB interface, @dev->parent->mutex must be held.
*/
int device_attach(struct device * dev)
{
int ret = 0;

- down(&dev->sem);
+ mutex_lock_nested(&dev->mutex, DEVICE_NORMAL);
if (dev->driver) {
ret = device_bind_driver(dev);
if (ret == 0)
@@ -237,7 +237,7 @@ int device_attach(struct device * dev)
} else {
ret = bus_for_each_drv(dev->bus, NULL, dev, __device_attach);
}
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
return ret;
}

@@ -256,13 +256,13 @@ static int __driver_attach(struct device
*/

if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
- down(&dev->sem);
+ mutex_lock_nested(&dev->parent->mutex, DRIVER_PARENT);
+ mutex_lock_nested(&dev->mutex, DRIVER_NORMAL);
if (!dev->driver)
driver_probe_device(drv, dev);
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);

return 0;
}
@@ -282,8 +282,8 @@ int driver_attach(struct device_driver *
}

/*
- * __device_release_driver() must be called with @dev->sem held.
- * When called for a USB interface, @dev->parent->sem must be held as well.
+ * __device_release_driver() must be called with @dev->mutex held.
+ * When called for a USB interface, @dev->parent->mutex must be held as well.
*/
static void __device_release_driver(struct device * dev)
{
@@ -315,7 +315,7 @@ static void __device_release_driver(stru
* @dev: device.
*
* Manually detach device from driver.
- * When called for a USB interface, @dev->parent->sem must be held.
+ * When called for a USB interface, @dev->parent->mutex must be held.
*/
void device_release_driver(struct device * dev)
{
@@ -324,9 +324,9 @@ void device_release_driver(struct device
* within their ->remove callback for the same device, they
* will deadlock right here.
*/
- down(&dev->sem);
+ mutex_lock_nested(&dev->mutex, DEVICE_NORMAL);
__device_release_driver(dev);
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
}


@@ -350,13 +350,13 @@ void driver_detach(struct device_driver
spin_unlock(&drv->klist_devices.k_lock);

if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
- down(&dev->sem);
+ mutex_lock_nested(&dev->parent->mutex, DRIVER_PARENT);
+ mutex_lock_nested(&dev->mutex, DRIVER_NORMAL);
if (dev->driver == drv)
__device_release_driver(dev);
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);
put_device(dev);
}
}
Index: linux-2.6-2/drivers/base/power/main.c
===================================================================
--- linux-2.6-2.orig/drivers/base/power/main.c
+++ linux-2.6-2/drivers/base/power/main.c
@@ -81,7 +81,7 @@ static int resume_device(struct device *
TRACE_DEVICE(dev);
TRACE_RESUME(0);

- down(&dev->sem);
+ mutex_lock(&dev->mutex);

if (dev->bus && dev->bus->resume) {
dev_dbg(dev,"resuming\n");
@@ -98,7 +98,7 @@ static int resume_device(struct device *
error = dev->class->resume(dev);
}

- up(&dev->sem);
+ mutex_unlock(&dev->mutex);

TRACE_RESUME(error);
return error;
@@ -247,7 +247,7 @@ static int suspend_device(struct device
{
int error = 0;

- down(&dev->sem);
+ mutex_lock(&dev->mutex);
if (dev->power.power_state.event) {
dev_dbg(dev, "PM: suspend %d-->%d\n",
dev->power.power_state.event, state.event);
@@ -270,7 +270,7 @@ static int suspend_device(struct device
error = dev->bus->suspend(dev, state);
suspend_report_result(dev->bus->suspend, error);
}
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
return error;
}

Index: linux-2.6-2/drivers/pci/bus.c
===================================================================
--- linux-2.6-2.orig/drivers/pci/bus.c
+++ linux-2.6-2/drivers/pci/bus.c
@@ -198,9 +198,9 @@ void pci_walk_bus(struct pci_bus *top, v
next = dev->bus_list.next;

/* Run device routines with the device locked */
- down(&dev->dev.sem);
+ mutex_lock(&dev->dev.mutex);
cb(dev, userdata);
- up(&dev->dev.sem);
+ mutex_unlock(&dev->dev.mutex);
}
up_read(&pci_bus_sem);
}
Index: linux-2.6-2/drivers/pnp/interface.c
===================================================================
--- linux-2.6-2.orig/drivers/pnp/interface.c
+++ linux-2.6-2/drivers/pnp/interface.c
@@ -315,7 +315,7 @@ static ssize_t pnp_show_current_resource
return ret;
}

-extern struct semaphore pnp_res_mutex;
+extern struct mutex pnp_res_mutex;

static ssize_t
pnp_set_current_resources(struct device *dmdev, struct device_attribute *attr,
@@ -361,10 +361,10 @@ pnp_set_current_resources(struct device
goto done;
}
if (!strnicmp(buf, "get", 3)) {
- down(&pnp_res_mutex);
+ mutex_lock(&pnp_res_mutex);
if (pnp_can_read(dev))
dev->protocol->get(dev, &dev->res);
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);
goto done;
}
if (!strnicmp(buf, "set", 3)) {
@@ -373,7 +373,7 @@ pnp_set_current_resources(struct device
goto done;
buf += 3;
pnp_init_resource_table(&dev->res);
- down(&pnp_res_mutex);
+ mutex_lock(&pnp_res_mutex);
while (1) {
while (isspace(*buf))
++buf;
@@ -455,7 +455,7 @@ pnp_set_current_resources(struct device
}
break;
}
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);
goto done;
}

Index: linux-2.6-2/drivers/pnp/manager.c
===================================================================
--- linux-2.6-2.orig/drivers/pnp/manager.c
+++ linux-2.6-2/drivers/pnp/manager.c
@@ -14,7 +14,7 @@
#include <linux/bitmap.h>
#include "base.h"

-DECLARE_MUTEX(pnp_res_mutex);
+DEFINE_MUTEX(pnp_res_mutex);

static int pnp_assign_port(struct pnp_dev *dev, struct pnp_port *rule, int idx)
{
@@ -297,7 +297,7 @@ static int pnp_assign_resources(struct p
if (!pnp_can_configure(dev))
return -ENODEV;

- down(&pnp_res_mutex);
+ mutex_lock(&pnp_res_mutex);
pnp_clean_resource_table(&dev->res); /* start with a fresh slate */
if (dev->independent) {
port = dev->independent->port;
@@ -366,12 +366,12 @@ static int pnp_assign_resources(struct p
} else if (dev->dependent)
goto fail;

- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);
return 1;

fail:
pnp_clean_resource_table(&dev->res);
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);
return 0;
}

@@ -396,7 +396,7 @@ int pnp_manual_config_dev(struct pnp_dev
return -ENOMEM;
*bak = dev->res;

- down(&pnp_res_mutex);
+ mutex_lock(&pnp_res_mutex);
dev->res = *res;
if (!(mode & PNP_CONFIG_FORCE)) {
for (i = 0; i < PNP_MAX_PORT; i++) {
@@ -416,14 +416,14 @@ int pnp_manual_config_dev(struct pnp_dev
goto fail;
}
}
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);

kfree(bak);
return 0;

fail:
dev->res = *bak;
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);
kfree(bak);
return -EINVAL;
}
@@ -547,9 +547,9 @@ int pnp_disable_dev(struct pnp_dev *dev)
dev->active = 0;

/* release the resources so that other devices can use them */
- down(&pnp_res_mutex);
+ mutex_lock(&pnp_res_mutex);
pnp_clean_resource_table(&dev->res);
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);

return 1;
}
Index: linux-2.6-2/drivers/power/power_supply_core.c
===================================================================
--- linux-2.6-2.orig/drivers/power/power_supply_core.c
+++ linux-2.6-2/drivers/power/power_supply_core.c
@@ -31,7 +31,7 @@ static void power_supply_changed_work(st
for (i = 0; i < psy->num_supplicants; i++) {
struct device *dev;

- down(&power_supply_class->sem);
+ mutex_lock(&power_supply_class->mutex);
list_for_each_entry(dev, &power_supply_class->devices, node) {
struct power_supply *pst = dev_get_drvdata(dev);

@@ -40,7 +40,7 @@ static void power_supply_changed_work(st
pst->external_power_changed(pst);
}
}
- up(&power_supply_class->sem);
+ mutex_unlock(&power_supply_class->mutex);
}

power_supply_update_leds(psy);
@@ -60,7 +60,7 @@ int power_supply_am_i_supplied(struct po
union power_supply_propval ret = {0,};
struct device *dev;

- down(&power_supply_class->sem);
+ mutex_lock(&power_supply_class->mutex);
list_for_each_entry(dev, &power_supply_class->devices, node) {
struct power_supply *epsy = dev_get_drvdata(dev);
int i;
@@ -76,7 +76,7 @@ int power_supply_am_i_supplied(struct po
}
}
out:
- up(&power_supply_class->sem);
+ mutex_unlock(&power_supply_class->mutex);

dev_dbg(psy->dev, "%s %d\n", __FUNCTION__, ret.intval);

Index: linux-2.6-2/drivers/rtc/interface.c
===================================================================
--- linux-2.6-2.orig/drivers/rtc/interface.c
+++ linux-2.6-2/drivers/rtc/interface.c
@@ -256,7 +256,7 @@ struct rtc_device *rtc_class_open(char *
struct device *dev;
struct rtc_device *rtc = NULL;

- down(&rtc_class->sem);
+ mutex_lock(&rtc_class->mutex);
list_for_each_entry(dev, &rtc_class->devices, node) {
if (strncmp(dev->bus_id, name, BUS_ID_SIZE) == 0) {
dev = get_device(dev);
@@ -272,7 +272,7 @@ struct rtc_device *rtc_class_open(char *
rtc = NULL;
}
}
- up(&rtc_class->sem);
+ mutex_unlock(&rtc_class->mutex);

return rtc;
}
Index: linux-2.6-2/drivers/scsi/hosts.c
===================================================================
--- linux-2.6-2.orig/drivers/scsi/hosts.c
+++ linux-2.6-2/drivers/scsi/hosts.c
@@ -443,7 +443,7 @@ struct Scsi_Host *scsi_host_lookup(unsig
struct class_device *cdev;
struct Scsi_Host *shost = ERR_PTR(-ENXIO), *p;

- down(&class->sem);
+ mutex_lock(&class->mutex);
list_for_each_entry(cdev, &class->children, node) {
p = class_to_shost(cdev);
if (p->host_no == hostnum) {
@@ -451,7 +451,7 @@ struct Scsi_Host *scsi_host_lookup(unsig
break;
}
}
- up(&class->sem);
+ mutex_unlock(&class->mutex);

return shost;
}
Index: linux-2.6-2/drivers/spi/spi.c
===================================================================
--- linux-2.6-2.orig/drivers/spi/spi.c
+++ linux-2.6-2/drivers/spi/spi.c
@@ -499,7 +499,7 @@ struct spi_master *spi_busnum_to_master(
struct spi_master *master = NULL;
struct spi_master *m;

- down(&spi_master_class.sem);
+ mutex_lock(&spi_master_class.mutex);
list_for_each_entry(dev, &spi_master_class.children, node) {
m = container_of(dev, struct spi_master, dev);
if (m->bus_num == bus_num) {
@@ -507,7 +507,7 @@ struct spi_master *spi_busnum_to_master(
break;
}
}
- up(&spi_master_class.sem);
+ mutex_unlock(&spi_master_class.mutex);
return master;
}
EXPORT_SYMBOL_GPL(spi_busnum_to_master);
@@ -587,7 +587,7 @@ int spi_write_then_read(struct spi_devic
const u8 *txbuf, unsigned n_tx,
u8 *rxbuf, unsigned n_rx)
{
- static DECLARE_MUTEX(lock);
+ static DEFINE_MUTEX(lock);

int status;
struct spi_message message;
@@ -613,7 +613,7 @@ int spi_write_then_read(struct spi_devic
}

/* ... unless someone else is using the pre-allocated buffer */
- if (down_trylock(&lock)) {
+ if (mutex_trylock(&lock)) {
local_buf = kmalloc(SPI_BUFSIZ, GFP_KERNEL);
if (!local_buf)
return -ENOMEM;
@@ -632,7 +632,7 @@ int spi_write_then_read(struct spi_devic
}

if (x[0].tx_buf == buf)
- up(&lock);
+ mutex_unlock(&lock);
else
kfree(local_buf);

Index: linux-2.6-2/drivers/usb/core/hub.c
===================================================================
--- linux-2.6-2.orig/drivers/usb/core/hub.c
+++ linux-2.6-2/drivers/usb/core/hub.c
@@ -3143,7 +3143,7 @@ int usb_reset_composite_device(struct us
for (i = 0; i < config->desc.bNumInterfaces; ++i) {
cintf = config->interface[i];
if (cintf != iface)
- down(&cintf->dev.sem);
+ mutex_lock(&cintf->dev.mutex);
if (device_is_registered(&cintf->dev) &&
cintf->dev.driver) {
drv = to_usb_driver(cintf->dev.driver);
@@ -3171,7 +3171,7 @@ int usb_reset_composite_device(struct us
/* FIXME: Unbind if post_reset returns an error or isn't defined */
}
if (cintf != iface)
- up(&cintf->dev.sem);
+ mutex_unlock(&cintf->dev.mutex);
}
}

Index: linux-2.6-2/include/linux/device.h
===================================================================
--- linux-2.6-2.orig/include/linux/device.h
+++ linux-2.6-2/include/linux/device.h
@@ -20,7 +20,7 @@
#include <linux/types.h>
#include <linux/module.h>
#include <linux/pm.h>
-#include <asm/semaphore.h>
+#include <asm/mutex.h>
#include <asm/atomic.h>
#include <asm/device.h>

@@ -110,7 +110,7 @@ extern int bus_unregister_notifier(struc

/* All 4 notifers below get called with the target struct device *
* as an argument. Note that those functions are likely to be called
- * with the device semaphore held in the core, so be careful.
+ * with the device mutex held in the core, so be careful.
*/
#define BUS_NOTIFY_ADD_DEVICE 0x00000001 /* device added */
#define BUS_NOTIFY_DEL_DEVICE 0x00000002 /* device removed */
@@ -137,7 +137,6 @@ struct device_driver {
int (*resume) (struct device * dev);
};

-
extern int __must_check driver_register(struct device_driver * drv);
extern void driver_unregister(struct device_driver * drv);

@@ -180,7 +179,7 @@ struct class {
struct list_head devices;
struct list_head interfaces;
struct kset class_dirs;
- struct semaphore sem; /* locks both the children and interfaces lists */
+ struct mutex mutex; /* locks both the children and interfaces lists */

struct class_attribute * class_attrs;
struct class_device_attribute * class_dev_attrs;
@@ -410,9 +409,7 @@ struct device {
unsigned is_registered:1;
unsigned uevent_suppress:1;

- struct semaphore sem; /* semaphore to synchronize calls to
- * its driver.
- */
+ struct mutex mutex; /* synchronize calls to its driver. */

struct bus_type * bus; /* type of bus device is on */
struct device_driver *driver; /* which driver has allocated this
@@ -451,6 +448,13 @@ struct device {
void (*release)(struct device * dev);
};

+enum {
+ DEVICE_NORMAL,
+ DEVICE_PARENT,
+ DRIVER_NORMAL,
+ DRIVER_PARENT,
+};
+
#ifdef CONFIG_NUMA
static inline int dev_to_node(struct device *dev)
{
Index: linux-2.6-2/include/linux/usb.h
===================================================================
--- linux-2.6-2.orig/include/linux/usb.h
+++ linux-2.6-2/include/linux/usb.h
@@ -439,9 +439,9 @@ extern struct usb_device *usb_get_dev(st
extern void usb_put_dev(struct usb_device *dev);

/* USB device locking */
-#define usb_lock_device(udev) down(&(udev)->dev.sem)
-#define usb_unlock_device(udev) up(&(udev)->dev.sem)
-#define usb_trylock_device(udev) down_trylock(&(udev)->dev.sem)
+#define usb_lock_device(udev) mutex_lock(&(udev)->dev.mutex)
+#define usb_unlock_device(udev) mutex_unlock(&(udev)->dev.mutex)
+#define usb_trylock_device(udev) mutex_trylock(&(udev)->dev.mutex)
extern int usb_lock_device_for_reset(struct usb_device *udev,
const struct usb_interface *iface);



2007-11-05 22:34:37

by Stefan Richter

[permalink] [raw]
Subject: Re: device struct bloat

Peter Zijlstra wrote:
[sem-to-mutex in struct device and struct class]
> drivers/base/bus.c | 20 ++++++++++----------
> drivers/base/class.c | 22 +++++++++++-----------
> drivers/base/core.c | 20 +++++++++-----------
> drivers/base/dd.c | 38 +++++++++++++++++++-------------------
> drivers/base/power/main.c | 8 ++++----
> drivers/pci/bus.c | 4 ++--
> drivers/pnp/interface.c | 10 +++++-----
> drivers/pnp/manager.c | 18 +++++++++---------
> drivers/power/power_supply_core.c | 8 ++++----
> drivers/rtc/interface.c | 4 ++--
> drivers/scsi/hosts.c | 4 ++--
> drivers/spi/spi.c | 10 +++++-----
> drivers/usb/core/hub.c | 4 ++--
> include/linux/device.h | 18 +++++++++++-------
> include/linux/usb.h | 6 +++---
> 15 files changed, 98 insertions(+), 96 deletions(-)

At least two files are missing:
drivers/firewire/fw-device.c accesses device.sem.
drivers/ieee1394/nodemgr.c accesses class.sem and device.sem.
--
Stefan Richter
-=====-=-=== =-== ---=-
http://arcgraph.de/sr/

2007-11-05 22:43:46

by Greg KH

[permalink] [raw]
Subject: Re: device struct bloat

On Mon, Nov 05, 2007 at 11:57:14AM +0100, Peter Zijlstra wrote:
> Hmm, the problem seems to be stuff like:
>
> add usb driver to pci
> scan pci devices
> add usb host controller device
> scan usb devices
> add usb hub device
> scan usb devices
> add usb .....
>
> This seems to be able to go on forever, as long as one can cascade usb
> hubs.

USB hubs only work 7 deep, so there is a limit.

> Doesn't seem like an ideal thing to do from a stack space POV either.
>
> Would it be possible to break at the second scan, that is the device
> probe and stick that into a workqueue or something. Then we'd only ever
> have driver->device nesting.

Alan and Oliver have done some work in this area I think, combined with
the suspend/bind/unbind issues. I'll let them comment on your patch :)

thanks,

greg k-h

---
drivers/base/bus.c | 20 ++++++++++----------
drivers/base/class.c | 22 +++++++++++-----------
drivers/base/core.c | 20 +++++++++-----------
drivers/base/dd.c | 38 +++++++++++++++++++-------------------
drivers/base/power/main.c | 8 ++++----
drivers/pci/bus.c | 4 ++--
drivers/pnp/interface.c | 10 +++++-----
drivers/pnp/manager.c | 18 +++++++++---------
drivers/power/power_supply_core.c | 8 ++++----
drivers/rtc/interface.c | 4 ++--
drivers/scsi/hosts.c | 4 ++--
drivers/spi/spi.c | 10 +++++-----
drivers/usb/core/hub.c | 4 ++--
include/linux/device.h | 18 +++++++++++-------
include/linux/usb.h | 6 +++---
15 files changed, 98 insertions(+), 96 deletions(-)

Index: linux-2.6-2/drivers/base/bus.c
===================================================================
--- linux-2.6-2.orig/drivers/base/bus.c
+++ linux-2.6-2/drivers/base/bus.c
@@ -190,10 +190,10 @@ static ssize_t driver_unbind(struct devi
dev = bus_find_device(bus, NULL, (void *)buf, driver_helper);
if (dev && dev->driver == drv) {
if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
+ mutex_lock_nested(&dev->parent->mutex, DRIVER_PARENT);
device_release_driver(dev);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);
err = count;
}
put_device(dev);
@@ -217,12 +217,12 @@ static ssize_t driver_bind(struct device
dev = bus_find_device(bus, NULL, (void *)buf, driver_helper);
if (dev && dev->driver == NULL) {
if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
- down(&dev->sem);
+ mutex_lock_nested(&dev->parent->mutex, DRIVER_PARENT);
+ mutex_lock_nested(&dev->mutex, DRIVER_NORMAL);
err = driver_probe_device(drv, dev);
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);

if (err > 0) /* success */
err = count;
@@ -711,10 +711,10 @@ static int __must_check bus_rescan_devic

if (!dev->driver) {
if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
+ mutex_lock_nested(&dev->parent->mutex, DEVICE_PARENT);
ret = device_attach(dev);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);
}
return ret < 0 ? ret : 0;
}
@@ -745,10 +745,10 @@ int device_reprobe(struct device *dev)
{
if (dev->driver) {
if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
+ mutex_lock_nested(&dev->parent->mutex, DEVICE_PARENT);
device_release_driver(dev);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);
}
return bus_rescan_devices_helper(dev, NULL);
}
Index: linux-2.6-2/drivers/base/class.c
===================================================================
--- linux-2.6-2.orig/drivers/base/class.c
+++ linux-2.6-2/drivers/base/class.c
@@ -144,7 +144,7 @@ int class_register(struct class * cls)
INIT_LIST_HEAD(&cls->devices);
INIT_LIST_HEAD(&cls->interfaces);
kset_init(&cls->class_dirs);
- init_MUTEX(&cls->sem);
+ mutex_init(&cls->mutex);
error = kobject_set_name(&cls->subsys.kobj, "%s", cls->name);
if (error)
return error;
@@ -617,13 +617,13 @@ int class_device_add(struct class_device
kobject_uevent(&class_dev->kobj, KOBJ_ADD);

/* notify any interfaces this device is now here */
- down(&parent_class->sem);
+ mutex_lock_nested(&parent_class->mutex, SINGLE_DEPTH_NESTING);
list_add_tail(&class_dev->node, &parent_class->children);
list_for_each_entry(class_intf, &parent_class->interfaces, node) {
if (class_intf->add)
class_intf->add(class_dev, class_intf);
}
- up(&parent_class->sem);
+ mutex_unlock(&parent_class->mutex);

goto out1;

@@ -725,12 +725,12 @@ void class_device_del(struct class_devic
struct class_interface *class_intf;

if (parent_class) {
- down(&parent_class->sem);
+ mutex_lock_nested(&parent_class->mutex, SINGLE_DEPTH_NESTING);
list_del_init(&class_dev->node);
list_for_each_entry(class_intf, &parent_class->interfaces, node)
if (class_intf->remove)
class_intf->remove(class_dev, class_intf);
- up(&parent_class->sem);
+ mutex_unlock(&parent_class->mutex);
}

if (class_dev->dev) {
@@ -772,14 +772,14 @@ void class_device_destroy(struct class *
struct class_device *class_dev = NULL;
struct class_device *class_dev_tmp;

- down(&cls->sem);
+ mutex_lock(&cls->mutex);
list_for_each_entry(class_dev_tmp, &cls->children, node) {
if (class_dev_tmp->devt == devt) {
class_dev = class_dev_tmp;
break;
}
}
- up(&cls->sem);
+ mutex_unlock(&cls->mutex);

if (class_dev)
class_device_unregister(class_dev);
@@ -812,7 +812,7 @@ int class_interface_register(struct clas
if (!parent)
return -EINVAL;

- down(&parent->sem);
+ mutex_lock_nested(&parent->mutex, SINGLE_DEPTH_NESTING);
list_add_tail(&class_intf->node, &parent->interfaces);
if (class_intf->add) {
list_for_each_entry(class_dev, &parent->children, node)
@@ -822,7 +822,7 @@ int class_interface_register(struct clas
list_for_each_entry(dev, &parent->devices, node)
class_intf->add_dev(dev, class_intf);
}
- up(&parent->sem);
+ mutex_unlock(&parent->mutex);

return 0;
}
@@ -836,7 +836,7 @@ void class_interface_unregister(struct c
if (!parent)
return;

- down(&parent->sem);
+ mutex_lock_nested(&parent->mutex, SINGLE_DEPTH_NESTING);
list_del_init(&class_intf->node);
if (class_intf->remove) {
list_for_each_entry(class_dev, &parent->children, node)
@@ -846,7 +846,7 @@ void class_interface_unregister(struct c
list_for_each_entry(dev, &parent->devices, node)
class_intf->remove_dev(dev, class_intf);
}
- up(&parent->sem);
+ mutex_unlock(&parent->mutex);

class_put(parent);
}
Index: linux-2.6-2/drivers/base/core.c
===================================================================
--- linux-2.6-2.orig/drivers/base/core.c
+++ linux-2.6-2/drivers/base/core.c
@@ -19,8 +19,6 @@
#include <linux/kdev_t.h>
#include <linux/notifier.h>

-#include <asm/semaphore.h>
-
#include "base.h"
#include "power/power.h"

@@ -531,7 +529,7 @@ void device_initialize(struct device *de
klist_children_put);
INIT_LIST_HEAD(&dev->dma_pools);
INIT_LIST_HEAD(&dev->node);
- init_MUTEX(&dev->sem);
+ mutex_init(&dev->mutex);
spin_lock_init(&dev->devres_lock);
INIT_LIST_HEAD(&dev->devres_head);
device_init_wakeup(dev, 0);
@@ -782,7 +780,7 @@ int device_add(struct device *dev)
klist_add_tail(&dev->knode_parent, &parent->klist_children);

if (dev->class) {
- down(&dev->class->sem);
+ mutex_lock(&dev->class->mutex);
/* tie the class to the device */
list_add_tail(&dev->node, &dev->class->devices);

@@ -790,7 +788,7 @@ int device_add(struct device *dev)
list_for_each_entry(class_intf, &dev->class->interfaces, node)
if (class_intf->add_dev)
class_intf->add_dev(dev, class_intf);
- up(&dev->class->sem);
+ mutex_unlock(&dev->class->mutex);
}
Done:
put_device(dev);
@@ -926,14 +924,14 @@ void device_del(struct device * dev)
sysfs_remove_link(&dev->kobj, "device");
}

- down(&dev->class->sem);
+ mutex_lock(&dev->class->mutex);
/* notify any interfaces that the device is now gone */
list_for_each_entry(class_intf, &dev->class->interfaces, node)
if (class_intf->remove_dev)
class_intf->remove_dev(dev, class_intf);
/* remove the device from the class list */
list_del_init(&dev->node);
- up(&dev->class->sem);
+ mutex_unlock(&dev->class->mutex);

/* If we live in a parent class-directory, unreference it */
if (dev->kobj.parent->kset == &dev->class->class_dirs) {
@@ -944,7 +942,7 @@ void device_del(struct device * dev)
* if we are the last child of our class, delete
* our class-directory at this parent
*/
- down(&dev->class->sem);
+ mutex_lock(&dev->class->mutex);
list_for_each_entry(d, &dev->class->devices, node) {
if (d == dev)
continue;
@@ -957,7 +955,7 @@ void device_del(struct device * dev)
kobject_del(dev->kobj.parent);

kobject_put(dev->kobj.parent);
- up(&dev->class->sem);
+ mutex_unlock(&dev->class->mutex);
}
}
device_remove_file(dev, &uevent_attr);
@@ -1166,14 +1164,14 @@ void device_destroy(struct class *class,
struct device *dev = NULL;
struct device *dev_tmp;

- down(&class->sem);
+ mutex_lock(&class->mutex);
list_for_each_entry(dev_tmp, &class->devices, node) {
if (dev_tmp->devt == devt) {
dev = dev_tmp;
break;
}
}
- up(&class->sem);
+ mutex_unlock(&class->mutex);

if (dev)
device_unregister(dev);
Index: linux-2.6-2/drivers/base/dd.c
===================================================================
--- linux-2.6-2.orig/drivers/base/dd.c
+++ linux-2.6-2/drivers/base/dd.c
@@ -82,7 +82,7 @@ static void driver_sysfs_remove(struct d
* for before calling this. (It is ok to call with no other effort
* from a driver's probe() method.)
*
- * This function must be called with @dev->sem held.
+ * This function must be called with @dev->mutex held.
*/
int device_bind_driver(struct device *dev)
{
@@ -180,8 +180,8 @@ int driver_probe_done(void)
* This function returns 1 if a match is found, -ENODEV if the device is
* not registered, and 0 otherwise.
*
- * This function must be called with @dev->sem held. When called for a
- * USB interface, @dev->parent->sem must be held as well.
+ * This function must be called with @dev->mutex held. When called for a
+ * USB interface, @dev->parent->mutex must be held as well.
*/
int driver_probe_device(struct device_driver * drv, struct device * dev)
{
@@ -219,13 +219,13 @@ static int __device_attach(struct device
* 0 if no matching device was found;
* -ENODEV if the device is not registered.
*
- * When called for a USB interface, @dev->parent->sem must be held.
+ * When called for a USB interface, @dev->parent->mutex must be held.
*/
int device_attach(struct device * dev)
{
int ret = 0;

- down(&dev->sem);
+ mutex_lock_nested(&dev->mutex, DEVICE_NORMAL);
if (dev->driver) {
ret = device_bind_driver(dev);
if (ret == 0)
@@ -237,7 +237,7 @@ int device_attach(struct device * dev)
} else {
ret = bus_for_each_drv(dev->bus, NULL, dev, __device_attach);
}
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
return ret;
}

@@ -256,13 +256,13 @@ static int __driver_attach(struct device
*/

if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
- down(&dev->sem);
+ mutex_lock_nested(&dev->parent->mutex, DRIVER_PARENT);
+ mutex_lock_nested(&dev->mutex, DRIVER_NORMAL);
if (!dev->driver)
driver_probe_device(drv, dev);
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);

return 0;
}
@@ -282,8 +282,8 @@ int driver_attach(struct device_driver *
}

/*
- * __device_release_driver() must be called with @dev->sem held.
- * When called for a USB interface, @dev->parent->sem must be held as well.
+ * __device_release_driver() must be called with @dev->mutex held.
+ * When called for a USB interface, @dev->parent->mutex must be held as well.
*/
static void __device_release_driver(struct device * dev)
{
@@ -315,7 +315,7 @@ static void __device_release_driver(stru
* @dev: device.
*
* Manually detach device from driver.
- * When called for a USB interface, @dev->parent->sem must be held.
+ * When called for a USB interface, @dev->parent->mutex must be held.
*/
void device_release_driver(struct device * dev)
{
@@ -324,9 +324,9 @@ void device_release_driver(struct device
* within their ->remove callback for the same device, they
* will deadlock right here.
*/
- down(&dev->sem);
+ mutex_lock_nested(&dev->mutex, DEVICE_NORMAL);
__device_release_driver(dev);
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
}


@@ -350,13 +350,13 @@ void driver_detach(struct device_driver
spin_unlock(&drv->klist_devices.k_lock);

if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
- down(&dev->sem);
+ mutex_lock_nested(&dev->parent->mutex, DRIVER_PARENT);
+ mutex_lock_nested(&dev->mutex, DRIVER_NORMAL);
if (dev->driver == drv)
__device_release_driver(dev);
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
if (dev->parent)
- up(&dev->parent->sem);
+ mutex_unlock(&dev->parent->mutex);
put_device(dev);
}
}
Index: linux-2.6-2/drivers/base/power/main.c
===================================================================
--- linux-2.6-2.orig/drivers/base/power/main.c
+++ linux-2.6-2/drivers/base/power/main.c
@@ -81,7 +81,7 @@ static int resume_device(struct device *
TRACE_DEVICE(dev);
TRACE_RESUME(0);

- down(&dev->sem);
+ mutex_lock(&dev->mutex);

if (dev->bus && dev->bus->resume) {
dev_dbg(dev,"resuming\n");
@@ -98,7 +98,7 @@ static int resume_device(struct device *
error = dev->class->resume(dev);
}

- up(&dev->sem);
+ mutex_unlock(&dev->mutex);

TRACE_RESUME(error);
return error;
@@ -247,7 +247,7 @@ static int suspend_device(struct device
{
int error = 0;

- down(&dev->sem);
+ mutex_lock(&dev->mutex);
if (dev->power.power_state.event) {
dev_dbg(dev, "PM: suspend %d-->%d\n",
dev->power.power_state.event, state.event);
@@ -270,7 +270,7 @@ static int suspend_device(struct device
error = dev->bus->suspend(dev, state);
suspend_report_result(dev->bus->suspend, error);
}
- up(&dev->sem);
+ mutex_unlock(&dev->mutex);
return error;
}

Index: linux-2.6-2/drivers/pci/bus.c
===================================================================
--- linux-2.6-2.orig/drivers/pci/bus.c
+++ linux-2.6-2/drivers/pci/bus.c
@@ -198,9 +198,9 @@ void pci_walk_bus(struct pci_bus *top, v
next = dev->bus_list.next;

/* Run device routines with the device locked */
- down(&dev->dev.sem);
+ mutex_lock(&dev->dev.mutex);
cb(dev, userdata);
- up(&dev->dev.sem);
+ mutex_unlock(&dev->dev.mutex);
}
up_read(&pci_bus_sem);
}
Index: linux-2.6-2/drivers/pnp/interface.c
===================================================================
--- linux-2.6-2.orig/drivers/pnp/interface.c
+++ linux-2.6-2/drivers/pnp/interface.c
@@ -315,7 +315,7 @@ static ssize_t pnp_show_current_resource
return ret;
}

-extern struct semaphore pnp_res_mutex;
+extern struct mutex pnp_res_mutex;

static ssize_t
pnp_set_current_resources(struct device *dmdev, struct device_attribute *attr,
@@ -361,10 +361,10 @@ pnp_set_current_resources(struct device
goto done;
}
if (!strnicmp(buf, "get", 3)) {
- down(&pnp_res_mutex);
+ mutex_lock(&pnp_res_mutex);
if (pnp_can_read(dev))
dev->protocol->get(dev, &dev->res);
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);
goto done;
}
if (!strnicmp(buf, "set", 3)) {
@@ -373,7 +373,7 @@ pnp_set_current_resources(struct device
goto done;
buf += 3;
pnp_init_resource_table(&dev->res);
- down(&pnp_res_mutex);
+ mutex_lock(&pnp_res_mutex);
while (1) {
while (isspace(*buf))
++buf;
@@ -455,7 +455,7 @@ pnp_set_current_resources(struct device
}
break;
}
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);
goto done;
}

Index: linux-2.6-2/drivers/pnp/manager.c
===================================================================
--- linux-2.6-2.orig/drivers/pnp/manager.c
+++ linux-2.6-2/drivers/pnp/manager.c
@@ -14,7 +14,7 @@
#include <linux/bitmap.h>
#include "base.h"

-DECLARE_MUTEX(pnp_res_mutex);
+DEFINE_MUTEX(pnp_res_mutex);

static int pnp_assign_port(struct pnp_dev *dev, struct pnp_port *rule, int idx)
{
@@ -297,7 +297,7 @@ static int pnp_assign_resources(struct p
if (!pnp_can_configure(dev))
return -ENODEV;

- down(&pnp_res_mutex);
+ mutex_lock(&pnp_res_mutex);
pnp_clean_resource_table(&dev->res); /* start with a fresh slate */
if (dev->independent) {
port = dev->independent->port;
@@ -366,12 +366,12 @@ static int pnp_assign_resources(struct p
} else if (dev->dependent)
goto fail;

- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);
return 1;

fail:
pnp_clean_resource_table(&dev->res);
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);
return 0;
}

@@ -396,7 +396,7 @@ int pnp_manual_config_dev(struct pnp_dev
return -ENOMEM;
*bak = dev->res;

- down(&pnp_res_mutex);
+ mutex_lock(&pnp_res_mutex);
dev->res = *res;
if (!(mode & PNP_CONFIG_FORCE)) {
for (i = 0; i < PNP_MAX_PORT; i++) {
@@ -416,14 +416,14 @@ int pnp_manual_config_dev(struct pnp_dev
goto fail;
}
}
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);

kfree(bak);
return 0;

fail:
dev->res = *bak;
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);
kfree(bak);
return -EINVAL;
}
@@ -547,9 +547,9 @@ int pnp_disable_dev(struct pnp_dev *dev)
dev->active = 0;

/* release the resources so that other devices can use them */
- down(&pnp_res_mutex);
+ mutex_lock(&pnp_res_mutex);
pnp_clean_resource_table(&dev->res);
- up(&pnp_res_mutex);
+ mutex_unlock(&pnp_res_mutex);

return 1;
}
Index: linux-2.6-2/drivers/power/power_supply_core.c
===================================================================
--- linux-2.6-2.orig/drivers/power/power_supply_core.c
+++ linux-2.6-2/drivers/power/power_supply_core.c
@@ -31,7 +31,7 @@ static void power_supply_changed_work(st
for (i = 0; i < psy->num_supplicants; i++) {
struct device *dev;

- down(&power_supply_class->sem);
+ mutex_lock(&power_supply_class->mutex);
list_for_each_entry(dev, &power_supply_class->devices, node) {
struct power_supply *pst = dev_get_drvdata(dev);

@@ -40,7 +40,7 @@ static void power_supply_changed_work(st
pst->external_power_changed(pst);
}
}
- up(&power_supply_class->sem);
+ mutex_unlock(&power_supply_class->mutex);
}

power_supply_update_leds(psy);
@@ -60,7 +60,7 @@ int power_supply_am_i_supplied(struct po
union power_supply_propval ret = {0,};
struct device *dev;

- down(&power_supply_class->sem);
+ mutex_lock(&power_supply_class->mutex);
list_for_each_entry(dev, &power_supply_class->devices, node) {
struct power_supply *epsy = dev_get_drvdata(dev);
int i;
@@ -76,7 +76,7 @@ int power_supply_am_i_supplied(struct po
}
}
out:
- up(&power_supply_class->sem);
+ mutex_unlock(&power_supply_class->mutex);

dev_dbg(psy->dev, "%s %d\n", __FUNCTION__, ret.intval);

Index: linux-2.6-2/drivers/rtc/interface.c
===================================================================
--- linux-2.6-2.orig/drivers/rtc/interface.c
+++ linux-2.6-2/drivers/rtc/interface.c
@@ -256,7 +256,7 @@ struct rtc_device *rtc_class_open(char *
struct device *dev;
struct rtc_device *rtc = NULL;

- down(&rtc_class->sem);
+ mutex_lock(&rtc_class->mutex);
list_for_each_entry(dev, &rtc_class->devices, node) {
if (strncmp(dev->bus_id, name, BUS_ID_SIZE) == 0) {
dev = get_device(dev);
@@ -272,7 +272,7 @@ struct rtc_device *rtc_class_open(char *
rtc = NULL;
}
}
- up(&rtc_class->sem);
+ mutex_unlock(&rtc_class->mutex);

return rtc;
}
Index: linux-2.6-2/drivers/scsi/hosts.c
===================================================================
--- linux-2.6-2.orig/drivers/scsi/hosts.c
+++ linux-2.6-2/drivers/scsi/hosts.c
@@ -443,7 +443,7 @@ struct Scsi_Host *scsi_host_lookup(unsig
struct class_device *cdev;
struct Scsi_Host *shost = ERR_PTR(-ENXIO), *p;

- down(&class->sem);
+ mutex_lock(&class->mutex);
list_for_each_entry(cdev, &class->children, node) {
p = class_to_shost(cdev);
if (p->host_no == hostnum) {
@@ -451,7 +451,7 @@ struct Scsi_Host *scsi_host_lookup(unsig
break;
}
}
- up(&class->sem);
+ mutex_unlock(&class->mutex);

return shost;
}
Index: linux-2.6-2/drivers/spi/spi.c
===================================================================
--- linux-2.6-2.orig/drivers/spi/spi.c
+++ linux-2.6-2/drivers/spi/spi.c
@@ -499,7 +499,7 @@ struct spi_master *spi_busnum_to_master(
struct spi_master *master = NULL;
struct spi_master *m;

- down(&spi_master_class.sem);
+ mutex_lock(&spi_master_class.mutex);
list_for_each_entry(dev, &spi_master_class.children, node) {
m = container_of(dev, struct spi_master, dev);
if (m->bus_num == bus_num) {
@@ -507,7 +507,7 @@ struct spi_master *spi_busnum_to_master(
break;
}
}
- up(&spi_master_class.sem);
+ mutex_unlock(&spi_master_class.mutex);
return master;
}
EXPORT_SYMBOL_GPL(spi_busnum_to_master);
@@ -587,7 +587,7 @@ int spi_write_then_read(struct spi_devic
const u8 *txbuf, unsigned n_tx,
u8 *rxbuf, unsigned n_rx)
{
- static DECLARE_MUTEX(lock);
+ static DEFINE_MUTEX(lock);

int status;
struct spi_message message;
@@ -613,7 +613,7 @@ int spi_write_then_read(struct spi_devic
}

/* ... unless someone else is using the pre-allocated buffer */
- if (down_trylock(&lock)) {
+ if (mutex_trylock(&lock)) {
local_buf = kmalloc(SPI_BUFSIZ, GFP_KERNEL);
if (!local_buf)
return -ENOMEM;
@@ -632,7 +632,7 @@ int spi_write_then_read(struct spi_devic
}

if (x[0].tx_buf == buf)
- up(&lock);
+ mutex_unlock(&lock);
else
kfree(local_buf);

Index: linux-2.6-2/drivers/usb/core/hub.c
===================================================================
--- linux-2.6-2.orig/drivers/usb/core/hub.c
+++ linux-2.6-2/drivers/usb/core/hub.c
@@ -3143,7 +3143,7 @@ int usb_reset_composite_device(struct us
for (i = 0; i < config->desc.bNumInterfaces; ++i) {
cintf = config->interface[i];
if (cintf != iface)
- down(&cintf->dev.sem);
+ mutex_lock(&cintf->dev.mutex);
if (device_is_registered(&cintf->dev) &&
cintf->dev.driver) {
drv = to_usb_driver(cintf->dev.driver);
@@ -3171,7 +3171,7 @@ int usb_reset_composite_device(struct us
/* FIXME: Unbind if post_reset returns an error or isn't defined */
}
if (cintf != iface)
- up(&cintf->dev.sem);
+ mutex_unlock(&cintf->dev.mutex);
}
}

Index: linux-2.6-2/include/linux/device.h
===================================================================
--- linux-2.6-2.orig/include/linux/device.h
+++ linux-2.6-2/include/linux/device.h
@@ -20,7 +20,7 @@
#include <linux/types.h>
#include <linux/module.h>
#include <linux/pm.h>
-#include <asm/semaphore.h>
+#include <asm/mutex.h>
#include <asm/atomic.h>
#include <asm/device.h>

@@ -110,7 +110,7 @@ extern int bus_unregister_notifier(struc

/* All 4 notifers below get called with the target struct device *
* as an argument. Note that those functions are likely to be called
- * with the device semaphore held in the core, so be careful.
+ * with the device mutex held in the core, so be careful.
*/
#define BUS_NOTIFY_ADD_DEVICE 0x00000001 /* device added */
#define BUS_NOTIFY_DEL_DEVICE 0x00000002 /* device removed */
@@ -137,7 +137,6 @@ struct device_driver {
int (*resume) (struct device * dev);
};

-
extern int __must_check driver_register(struct device_driver * drv);
extern void driver_unregister(struct device_driver * drv);

@@ -180,7 +179,7 @@ struct class {
struct list_head devices;
struct list_head interfaces;
struct kset class_dirs;
- struct semaphore sem; /* locks both the children and interfaces lists */
+ struct mutex mutex; /* locks both the children and interfaces lists */

struct class_attribute * class_attrs;
struct class_device_attribute * class_dev_attrs;
@@ -410,9 +409,7 @@ struct device {
unsigned is_registered:1;
unsigned uevent_suppress:1;

- struct semaphore sem; /* semaphore to synchronize calls to
- * its driver.
- */
+ struct mutex mutex; /* synchronize calls to its driver. */

struct bus_type * bus; /* type of bus device is on */
struct device_driver *driver; /* which driver has allocated this
@@ -451,6 +448,13 @@ struct device {
void (*release)(struct device * dev);
};

+enum {
+ DEVICE_NORMAL,
+ DEVICE_PARENT,
+ DRIVER_NORMAL,
+ DRIVER_PARENT,
+};
+
#ifdef CONFIG_NUMA
static inline int dev_to_node(struct device *dev)
{
Index: linux-2.6-2/include/linux/usb.h
===================================================================
--- linux-2.6-2.orig/include/linux/usb.h
+++ linux-2.6-2/include/linux/usb.h
@@ -439,9 +439,9 @@ extern struct usb_device *usb_get_dev(st
extern void usb_put_dev(struct usb_device *dev);

/* USB device locking */
-#define usb_lock_device(udev) down(&(udev)->dev.sem)
-#define usb_unlock_device(udev) up(&(udev)->dev.sem)
-#define usb_trylock_device(udev) down_trylock(&(udev)->dev.sem)
+#define usb_lock_device(udev) mutex_lock(&(udev)->dev.mutex)
+#define usb_unlock_device(udev) mutex_unlock(&(udev)->dev.mutex)
+#define usb_trylock_device(udev) mutex_trylock(&(udev)->dev.mutex)
extern int usb_lock_device_for_reset(struct usb_device *udev,
const struct usb_interface *iface);


2007-11-06 01:38:46

by David Brownell

[permalink] [raw]
Subject: Re: [linux-usb-devel] device struct bloat

On Monday 05 November 2007, Greg KH wrote:
> --- linux-2.6-2.orig/drivers/spi/spi.c
> +++ linux-2.6-2/drivers/spi/spi.c

It'd be quicker to end up in the right hands if you had
split this big and random patch according to subsystem...

There's already a patch in the MM queue that removes
the SPI-private semaphore. Except that it's missing
the bug noted below.

The class semaphore removal would be a different issue.


> @@ -613,7 +613,7 @@ int spi_write_then_read(struct spi_devic
> ????????}
> ?
> ????????/* ... unless someone else is using the pre-allocated buffer */
> -???????if (down_trylock(&lock)) {
> +???????if (mutex_trylock(&lock)) {

According to its kerneldoc, mutex_trylock() follows the
spinlock model not the semaphore model. So the sense of
this test is incorrect ... as will be any similar changes
in other parts of this patch:

* NOTE: this function follows the spin_trylock() convention, so
* it is negated to the down_trylock() return values! Be careful
* about this when converting semaphore users to mutexes.

So the patch in the MM queue says "if (!mutex_trylock(...)) {

> ????????????????local_buf = kmalloc(SPI_BUFSIZ, GFP_KERNEL);
> ????????????????if (!local_buf)
> ????????????????????????return -ENOMEM;

2007-11-06 09:44:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [linux-usb-devel] device struct bloat

On Mon, 2007-11-05 at 17:38 -0800, David Brownell wrote:
> On Monday 05 November 2007, Greg KH wrote:
> > --- linux-2.6-2.orig/drivers/spi/spi.c
> > +++ linux-2.6-2/drivers/spi/spi.c
>
> It'd be quicker to end up in the right hands if you had
> split this big and random patch according to subsystem...

Yeah, sorry, I just started converting mindlessly and stopped once it
compiled (which explains the missing FW bits, I don't have that in my
config). Then booted and fixed up what broke, well aside from the final
lockdep issue. As I couldn't get that fixed, I never looked twice,
didn't make it a proper patch-set, nor validated the rest of the patch.

Wasn't even planning on sending it out initially.

> There's already a patch in the MM queue that removes
> the SPI-private semaphore. Except that it's missing
> the bug noted below.

Right, lack of a recent -mm made me work against mainline. I'm sure I
would've noticed otherwise :-)

2007-11-06 09:49:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: device struct bloat

On Mon, 2007-11-05 at 14:49 -0800, Greg KH wrote:
> On Mon, Nov 05, 2007 at 11:57:14AM +0100, Peter Zijlstra wrote:
> > Hmm, the problem seems to be stuff like:
> >
> > add usb driver to pci
> > scan pci devices
> > add usb host controller device
> > scan usb devices
> > add usb hub device
> > scan usb devices
> > add usb .....
> >
> > This seems to be able to go on forever, as long as one can cascade usb
> > hubs.
>
> USB hubs only work 7 deep, so there is a limit.

Ah, missed that bit of knowledge :-)

> > Doesn't seem like an ideal thing to do from a stack space POV either.
> >
> > Would it be possible to break at the second scan, that is the device
> > probe and stick that into a workqueue or something. Then we'd only ever
> > have driver->device nesting.
>
> Alan and Oliver have done some work in this area I think, combined with
> the suspend/bind/unbind issues. I'll let them comment on your patch :)

Great, so the thing I need to make this work nicely is a limited
device->mutex nesting, if these changes result in that we can work
together to finish this conversion.



2007-11-06 15:37:03

by Alan Stern

[permalink] [raw]
Subject: Re: device struct bloat

On Mon, 5 Nov 2007, Greg KH wrote:

> On Mon, Nov 05, 2007 at 11:57:14AM +0100, Peter Zijlstra wrote:
> > Hmm, the problem seems to be stuff like:
> >
> > add usb driver to pci
> > scan pci devices
> > add usb host controller device
> > scan usb devices
> > add usb hub device
> > scan usb devices
> > add usb .....
> >
> > This seems to be able to go on forever, as long as one can cascade usb
> > hubs.
>
> USB hubs only work 7 deep, so there is a limit.

In fact things don't work this way. The list above stops short after
"add usb host controller devices"; the probe routines for host
controllers do not scan for USB hubs or other USB devices. Instead
they are detected by a completely separate thread (khubd).

> > Doesn't seem like an ideal thing to do from a stack space POV either.
> >
> > Would it be possible to break at the second scan, that is the device
> > probe and stick that into a workqueue or something. Then we'd only ever
> > have driver->device nesting.
>
> Alan and Oliver have done some work in this area I think, combined with
> the suspend/bind/unbind issues. I'll let them comment on your patch :)

I gather the idea is to convert dev->sem to a mutex. This idea had
occurred to me a long time ago but I didn't pursue it because of the
sheer number of places where dev->sem gets used, not to mention the
lockdep problems.

You can't possibly solve the lockdep problems here with a simple-minded
approach like your DRIVER_NORMAL, DRIVER_PARENT, etc. The device tree
is arbitrarily deep & wide, and there is at least one routine that
acquires the semaphores for _all_ the devices in the tree. This fact
alone seems to preclude using lockdep for device locks. (If there was
a form of mutex_lock() that bypassed the lockdep checks, you could use
it and avoid these issues.)

Deadlock is a serious consideration. For the most part, routines
locking devices do so along a single path in the tree. For this simple
case the rule is: Never acquire a parent's lock while holding the
child's lock.

The routine that locks all the devices acquires the locks in order of
device registration. The idea here is that children are always
registered _after_ their parents, so this should be compatible with the
previous rule. But there is a potential problem: device_move() can
move an older child under a younger parent!

Right now we have no way to deal with this. There has been some
discussion of reordering the dpm_active list when a device is moved,
but so far nothing has been done about it.

Alan Stern

2007-11-06 15:58:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: device struct bloat

On Tue, 2007-11-06 at 10:36 -0500, Alan Stern wrote:

> > > Would it be possible to break at the second scan, that is the device
> > > probe and stick that into a workqueue or something. Then we'd only ever
> > > have driver->device nesting.
> >
> > Alan and Oliver have done some work in this area I think, combined with
> > the suspend/bind/unbind issues. I'll let them comment on your patch :)
>
> I gather the idea is to convert dev->sem to a mutex. This idea had
> occurred to me a long time ago but I didn't pursue it because of the
> sheer number of places where dev->sem gets used

That should never stop us from doing the right thing :-), also dev->sem
isn't used nearly as much as for example work_struct which was changed.

> , not to mention the lockdep problems.

Right, that is the only sort-of valid reason this has not been done yet.

> You can't possibly solve the lockdep problems here with a simple-minded
> approach like your DRIVER_NORMAL, DRIVER_PARENT, etc. The device tree
> is arbitrarily deep & wide, and there is at least one routine that
> acquires the semaphores for _all_ the devices in the tree.

*blink* someone needs to take all locks - why?

> This fact
> alone seems to preclude using lockdep for device locks. (If there was
> a form of mutex_lock() that bypassed the lockdep checks, you could use
> it and avoid these issues.)

I'm sceptical of this, since its a simple tree (as opposed to a balanced
tree) a simple lock-coupling approach should be enough to guarantee
consistency.

> Deadlock is a serious consideration. For the most part, routines
> locking devices do so along a single path in the tree. For this simple
> case the rule is: Never acquire a parent's lock while holding the
> child's lock.

Sure, but once you have a parent's lock, you could unlock your
grandparent, no? (it having a locked child, your parent, should be
enough to guarantee its continued existence)

> The routine that locks all the devices acquires the locks in order of
> device registration. The idea here is that children are always
> registered _after_ their parents, so this should be compatible with the
> previous rule. But there is a potential problem: device_move() can
> move an older child under a younger parent!

Seems like a weird rule, a typical tree locking rule would be to lock
them top-down - such a rule can easily cope with moves: lock old parent,
lock child, lock new parent, move child, unlock all in reverse order.

> Right now we have no way to deal with this. There has been some
> discussion of reordering the dpm_active list when a device is moved,
> but so far nothing has been done about it.

Like said, I think the tree locking model should be revisited. A
top-down locking model with lock-coupling should, from my ignorant
perspective, solve your problems.


2007-11-06 16:33:15

by Alan Stern

[permalink] [raw]
Subject: Re: device struct bloat

On Tue, 6 Nov 2007, Peter Zijlstra wrote:

> On Tue, 2007-11-06 at 10:36 -0500, Alan Stern wrote:
>
> > I gather the idea is to convert dev->sem to a mutex. This idea had
> > occurred to me a long time ago but I didn't pursue it because of the
> > sheer number of places where dev->sem gets used
>
> That should never stop us from doing the right thing :-), also dev->sem
> isn't used nearly as much as for example work_struct which was changed.

My hat is off to David Howells. He's willing to carry out far-ranging
changes well beyond the point I can stomach.

> > You can't possibly solve the lockdep problems here with a simple-minded
> > approach like your DRIVER_NORMAL, DRIVER_PARENT, etc. The device tree
> > is arbitrarily deep & wide, and there is at least one routine that
> > acquires the semaphores for _all_ the devices in the tree.
>
> *blink* someone needs to take all locks - why?

It's for system suspend. All the devices are locked to prevent driver
binding and unbinding during the suspend transition.

Even apart from that, there are places where the USB core needs to
acquire multiple layers of locks (more than just two). For example, if
somebody has hubs nested several deep and then unplugs the hub closest
to the computer, this will happen.

Shucks, any time a driver's probe routine tries to register a child
device you will run into problems. probe() is always called with the
device locked, and registration of a child will acquire the child's
lock in order to probe drivers for the child.

> > This fact
> > alone seems to preclude using lockdep for device locks. (If there was
> > a form of mutex_lock() that bypassed the lockdep checks, you could use
> > it and avoid these issues.)
>
> I'm sceptical of this, since its a simple tree (as opposed to a balanced
> tree) a simple lock-coupling approach should be enough to guarantee
> consistency.

I don't follow your reasoning. Let's say a task wants to lock all the
direct children of a particular device. In what order should the locks
be acquired? There's no natural tree-related ordering to follow.

In the simple case where locks are acquired along a path in the tree,
you could solve the lockdep issues by passing mutex_lock_nested() a key
equal to the device's depth in the tree. But that won't work with more
complicated cases.

> > Deadlock is a serious consideration. For the most part, routines
> > locking devices do so along a single path in the tree. For this simple
> > case the rule is: Never acquire a parent's lock while holding the
> > child's lock.
>
> Sure, but once you have a parent's lock, you could unlock your
> grandparent, no? (it having a locked child, your parent, should be
> enough to guarantee its continued existence)

Of course. So what? In many cases the code needs to keep all three
locks. The locks don't merely guarantee the device structs' continued
existence (a kref could do that) -- they serialize a number of related
operations.

> > The routine that locks all the devices acquires the locks in order of
> > device registration. The idea here is that children are always
> > registered _after_ their parents, so this should be compatible with the
> > previous rule. But there is a potential problem: device_move() can
> > move an older child under a younger parent!
>
> Seems like a weird rule, a typical tree locking rule would be to lock
> them top-down - such a rule can easily cope with moves: lock old parent,
> lock child, lock new parent, move child, unlock all in reverse order.

Yep. But this isn't a typical tree-locking. System suspend sometimes
requires that devices be powered down in a specific order, and there
are constraints not captured by the positions in the tree. We get
around this by powering devices down in reverse order of detection,
which should always be safe. Although the locking need not necessarily
follow these same constraints, it is certainly the easiest approach.

(And by the way, your example rule "lock old parent, lock child, lock
new parent" is subject to deadlocks. What if another task tries to
move a different device from under the new parent to the old parent at
the same time?)

> Like said, I think the tree locking model should be revisited. A
> top-down locking model with lock-coupling should, from my ignorant
> perspective, solve your problems.

I don't understand what you mean by "lock-coupling".

Alan Stern

2007-11-06 17:19:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: device struct bloat

On Tue, 2007-11-06 at 11:32 -0500, Alan Stern wrote:
> On Tue, 6 Nov 2007, Peter Zijlstra wrote:
> > On Tue, 2007-11-06 at 10:36 -0500, Alan Stern wrote:

> > > You can't possibly solve the lockdep problems here with a simple-minded
> > > approach like your DRIVER_NORMAL, DRIVER_PARENT, etc. The device tree
> > > is arbitrarily deep & wide, and there is at least one routine that
> > > acquires the semaphores for _all_ the devices in the tree.
> >
> > *blink* someone needs to take all locks - why?
>
> It's for system suspend. All the devices are locked to prevent driver
> binding and unbinding during the suspend transition.

Just locking the tree root is not enough? (thereby avoiding all any new
modifying operation to descend into the tree).

> Even apart from that, there are places where the USB core needs to
> acquire multiple layers of locks (more than just two). For example, if
> somebody has hubs nested several deep and then unplugs the hub closest
> to the computer, this will happen.

Pin the sub-tree root with a reference, iterate the sub-tree and delete
the leafs one by one?

> Shucks, any time a driver's probe routine tries to register a child
> device you will run into problems. probe() is always called with the
> device locked, and registration of a child will acquire the child's
> lock in order to probe drivers for the child.

Right, so you say you have unbounded stack recursion? How about pushing
each probe into a stack (workqueue perhaps). Then each stack entry would
only do a single level probe, if it would recurse push a new entry on
the stack. Basic recursive -> stack transform.

> > > This fact
> > > alone seems to preclude using lockdep for device locks. (If there was
> > > a form of mutex_lock() that bypassed the lockdep checks, you could use
> > > it and avoid these issues.)
> >
> > I'm sceptical of this, since its a simple tree (as opposed to a balanced
> > tree) a simple lock-coupling approach should be enough to guarantee
> > consistency.
>
> I don't follow your reasoning. Let's say a task wants to lock all the
> direct children of a particular device. In what order should the locks
> be acquired?

I'd first start by asking if you want to lock all the children or the
sub-tree. The latter can be done by locking the root of said sub-tree.

> There's no natural tree-related ordering to follow.

Of course there is a natural deadlock free locking order in a tree: lock
the parent, lock all its children, repeat by noting that each child is a
parent again. If you only ever lock top down, there should not be any
lateral dependencies.

> In the simple case where locks are acquired along a path in the tree,
> you could solve the lockdep issues by passing mutex_lock_nested() a key
> equal to the device's depth in the tree. But that won't work with more
> complicated cases.

And only up to 8, as that's the max nesting depth.

> > > Deadlock is a serious consideration. For the most part, routines
> > > locking devices do so along a single path in the tree. For this simple
> > > case the rule is: Never acquire a parent's lock while holding the
> > > child's lock.
> >
> > Sure, but once you have a parent's lock, you could unlock your
> > grandparent, no? (it having a locked child, your parent, should be
> > enough to guarantee its continued existence)
>
> Of course. So what? In many cases the code needs to keep all three
> locks. The locks don't merely guarantee the device structs' continued
> existence (a kref could do that) -- they serialize a number of related
> operations.

Right, but does the grandparent really need serialisation? Normal simple
tree manipulations don't require more than 2 locks to guarantee tree
consistency. So you either don't have a simple tree, or you have more
requirements.

> > > The routine that locks all the devices acquires the locks in order of
> > > device registration. The idea here is that children are always
> > > registered _after_ their parents, so this should be compatible with the
> > > previous rule. But there is a potential problem: device_move() can
> > > move an older child under a younger parent!
> >
> > Seems like a weird rule, a typical tree locking rule would be to lock
> > them top-down - such a rule can easily cope with moves: lock old parent,
> > lock child, lock new parent, move child, unlock all in reverse order.
>
> Yep. But this isn't a typical tree-locking. System suspend sometimes
> requires that devices be powered down in a specific order, and there
> are constraints not captured by the positions in the tree. We get
> around this by powering devices down in reverse order of detection,
> which should always be safe. Although the locking need not necessarily
> follow these same constraints, it is certainly the easiest approach.

Ah, there are more constraints than tree order (which as I get it
resembles bus order)? Which confuses me, as you cannot detect a leaf
before you have a parent, so top-down should still be good, no?

> (And by the way, your example rule "lock old parent, lock child, lock
> new parent" is subject to deadlocks. What if another task tries to
> move a different device from under the new parent to the old parent at
> the same time?)

D'0h, right you are. How about a delete, insert sequence?

> > Like said, I think the tree locking model should be revisited. A
> > top-down locking model with lock-coupling should, from my ignorant
> > perspective, solve your problems.
>
> I don't understand what you mean by "lock-coupling".

A
B C
D E F G

In order to lock F we do:
lock A, lock C, unlock A, lock F, unlock C

Look at it this way, either you're more complex than the VFS or it can
be annotated :-)

2007-11-06 18:05:42

by Alan Stern

[permalink] [raw]
Subject: Re: device struct bloat

On Tue, 6 Nov 2007, Peter Zijlstra wrote:

> > > *blink* someone needs to take all locks - why?
> >
> > It's for system suspend. All the devices are locked to prevent driver
> > binding and unbinding during the suspend transition.
>
> Just locking the tree root is not enough? (thereby avoiding all any new
> modifying operation to descend into the tree).

But not all modifying operations must descend into the tree from the
root. Not even operations that modify the tree itself, let alone
operations that modify the individual devices without changing their
position in the tree.

> > Even apart from that, there are places where the USB core needs to
> > acquire multiple layers of locks (more than just two). For example, if
> > somebody has hubs nested several deep and then unplugs the hub closest
> > to the computer, this will happen.
>
> Pin the sub-tree root with a reference, iterate the sub-tree and delete
> the leafs one by one?

Pinning isn't the issue; serialization is. You can't serialize
operations on device Y by locking device X, even if X is at the root of
a sub-tree containing Y.

> > Shucks, any time a driver's probe routine tries to register a child
> > device you will run into problems. probe() is always called with the
> > device locked, and registration of a child will acquire the child's
> > lock in order to probe drivers for the child.
>
> Right, so you say you have unbounded stack recursion?

I didn't say anything about unbounded recursion. I said there would be
one level of recursion, and that alone would mess up the lockdep scheme
in your proposed patch.

> How about pushing
> each probe into a stack (workqueue perhaps). Then each stack entry would
> only do a single level probe, if it would recurse push a new entry on
> the stack. Basic recursive -> stack transform.

No. You're advocating spending a lot of effort to take something that
works well and for no good reason make it much more complicated and
prone to failure.

> > I don't follow your reasoning. Let's say a task wants to lock all the
> > direct children of a particular device. In what order should the locks
> > be acquired?
>
> I'd first start by asking if you want to lock all the children or the
> sub-tree. The latter can be done by locking the root of said sub-tree.

I want to lock all the children. It can't be done just by locking the
root of the sub-tree.

> > There's no natural tree-related ordering to follow.
>
> Of course there is a natural deadlock free locking order in a tree: lock
> the parent, lock all its children, repeat by noting that each child is a
> parent again. If you only ever lock top down, there should not be any
> lateral dependencies.

Nonsense. Consider a simple tree: A is the root, B and C are two
children of A. Task 1 wants to lock all the entries in the tree, so it
acquires the locks for A (the parent), then B, and then C (the
children). Meanwhile task 2 also wants to lock all the entries in the
tree, so it acquires the locks for A (the parent), then C, and then B
(the children). Now there _is_ a lateral dependency (BC/CB), despite
the fact that both tasks lock from the top down.

> > In the simple case where locks are acquired along a path in the tree,
> > you could solve the lockdep issues by passing mutex_lock_nested() a key
> > equal to the device's depth in the tree. But that won't work with more
> > complicated cases.
>
> And only up to 8, as that's the max nesting depth.

I wasn't aware of the limit. This definitely suggests that I will need
to use lockdep-avoiding routines at some stage, even if not for the
purpose being discussed here.

> > > > Deadlock is a serious consideration. For the most part, routines
> > > > locking devices do so along a single path in the tree. For this simple
> > > > case the rule is: Never acquire a parent's lock while holding the
> > > > child's lock.
> > >
> > > Sure, but once you have a parent's lock, you could unlock your
> > > grandparent, no? (it having a locked child, your parent, should be
> > > enough to guarantee its continued existence)
> >
> > Of course. So what? In many cases the code needs to keep all three
> > locks. The locks don't merely guarantee the device structs' continued
> > existence (a kref could do that) -- they serialize a number of related
> > operations.
>
> Right, but does the grandparent really need serialisation? Normal simple
> tree manipulations don't require more than 2 locks to guarantee tree
> consistency. So you either don't have a simple tree, or you have more
> requirements.

Yes, the grandparent really needs serialization. We guarantee, for
example, that suspend and resume methods won't get called while probe
is running. That guarantee is provided by the device sem. If you
change it into a mutex and release it inside a subroutine called from
probe then the guarantee won't be met.

> > > > The routine that locks all the devices acquires the locks in order of
> > > > device registration. The idea here is that children are always
> > > > registered _after_ their parents, so this should be compatible with the
> > > > previous rule. But there is a potential problem: device_move() can
> > > > move an older child under a younger parent!
> > >
> > > Seems like a weird rule, a typical tree locking rule would be to lock
> > > them top-down - such a rule can easily cope with moves: lock old parent,
> > > lock child, lock new parent, move child, unlock all in reverse order.
> >
> > Yep. But this isn't a typical tree-locking. System suspend sometimes
> > requires that devices be powered down in a specific order, and there
> > are constraints not captured by the positions in the tree. We get
> > around this by powering devices down in reverse order of detection,
> > which should always be safe. Although the locking need not necessarily
> > follow these same constraints, it is certainly the easiest approach.
>
> Ah, there are more constraints than tree order (which as I get it
> resembles bus order)? Which confuses me, as you cannot detect a leaf
> before you have a parent, so top-down should still be good, no?

No. You seem to think that top-down is a clear-cut, well-defined total
order. It isn't; it's only a partial order. In the example I gave
earlier, top-down doesn't establish any definite order relation between
B and C because they are siblings. But there might be a power
dependency between them.

> > (And by the way, your example rule "lock old parent, lock child, lock
> > new parent" is subject to deadlocks. What if another task tries to
> > move a different device from under the new parent to the old parent at
> > the same time?)
>
> D'0h, right you are. How about a delete, insert sequence?

That would work. But it shows that the situation is more complicated
than it seems at first.

> > > Like said, I think the tree locking model should be revisited. A
> > > top-down locking model with lock-coupling should, from my ignorant
> > > perspective, solve your problems.
> >
> > I don't understand what you mean by "lock-coupling".
>
> A
> B C
> D E F G
>
> In order to lock F we do:
> lock A, lock C, unlock A, lock F, unlock C
>
> Look at it this way, either you're more complex than the VFS or it can
> be annotated :-)

It would work, but what an awful waste of time! Not to mention the
overhead due to cache-line bouncing on SMP systems. And contention for
hotspots near the root of the device tree.

All just to make lockdep happy! It doesn't seem worth it, to me.

Alan Stern

2007-11-06 18:57:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: device struct bloat

On Tue, 2007-11-06 at 13:05 -0500, Alan Stern wrote:
> On Tue, 6 Nov 2007, Peter Zijlstra wrote:
>
> > > > *blink* someone needs to take all locks - why?
> > >
> > > It's for system suspend. All the devices are locked to prevent driver
> > > binding and unbinding during the suspend transition.
> >
> > Just locking the tree root is not enough? (thereby avoiding all any new
> > modifying operation to descend into the tree).
>
> But not all modifying operations must descend into the tree from the
> root. Not even operations that modify the tree itself,

Right, that is only required if you want to exclude sub-tree
modifications. If you drop this requirement you can just jump in the
middle, lock an element and check if its still linked into the tree.

> let alone
> operations that modify the individual devices without changing their
> position in the tree.

Right, in that case the dev->sem serialises the variables inside the
struct device, that can be seen as a separate orthogonal lock, the fact
that they happen to be the same need not be relevant.

> > > Even apart from that, there are places where the USB core needs to
> > > acquire multiple layers of locks (more than just two). For example, if
> > > somebody has hubs nested several deep and then unplugs the hub closest
> > > to the computer, this will happen.
> >
> > Pin the sub-tree root with a reference, iterate the sub-tree and delete
> > the leafs one by one?
>
> Pinning isn't the issue; serialization is. You can't serialize
> operations on device Y by locking device X, even if X is at the root of
> a sub-tree containing Y.

Sometimes its enough if you know that locking Y requires locking X
first. But yeah it depends on the exact requirements.

> > > Shucks, any time a driver's probe routine tries to register a child
> > > device you will run into problems. probe() is always called with the
> > > device locked, and registration of a child will acquire the child's
> > > lock in order to probe drivers for the child.
> >
> > Right, so you say you have unbounded stack recursion?
>
> I didn't say anything about unbounded recursion. I said there would be
> one level of recursion, and that alone would mess up the lockdep scheme
> in your proposed patch.

I'm quite aware that the scheme in my patch is insufficient. A single
level of recursion should be no problem, we do that in other places, it
just requires passing down enough knowledge to know we've recursed.

> > How about pushing
> > each probe into a stack (workqueue perhaps). Then each stack entry would
> > only do a single level probe, if it would recurse push a new entry on
> > the stack. Basic recursive -> stack transform.
>
> No. You're advocating spending a lot of effort to take something that
> works well and for no good reason make it much more complicated and
> prone to failure.

Agreed, if there is only a single level of recursion this is overkill.

> > > I don't follow your reasoning. Let's say a task wants to lock all the
> > > direct children of a particular device. In what order should the locks
> > > be acquired?
> >
> > I'd first start by asking if you want to lock all the children or the
> > sub-tree. The latter can be done by locking the root of said sub-tree.
>
> I want to lock all the children. It can't be done just by locking the
> root of the sub-tree.

ok

> > > There's no natural tree-related ordering to follow.
> >
> > Of course there is a natural deadlock free locking order in a tree: lock
> > the parent, lock all its children, repeat by noting that each child is a
> > parent again. If you only ever lock top down, there should not be any
> > lateral dependencies.
>
> Nonsense. Consider a simple tree: A is the root, B and C are two
> children of A. Task 1 wants to lock all the entries in the tree, so it
> acquires the locks for A (the parent), then B, and then C (the
> children). Meanwhile task 2 also wants to lock all the entries in the
> tree, so it acquires the locks for A (the parent), then C, and then B
> (the children). Now there _is_ a lateral dependency (BC/CB), despite
> the fact that both tasks lock from the top down.

Damn, I was thinking of a B+tree, in that case the children are ordered,
much like what you need I guess. (Although the balancing of the B+tree
makes it more complex)

> > > In the simple case where locks are acquired along a path in the tree,
> > > you could solve the lockdep issues by passing mutex_lock_nested() a key
> > > equal to the device's depth in the tree. But that won't work with more
> > > complicated cases.
> >
> > And only up to 8, as that's the max nesting depth.
>
> I wasn't aware of the limit. This definitely suggests that I will need
> to use lockdep-avoiding routines at some stage, even if not for the
> purpose being discussed here.

Please, don't do that. Lockdep really helps. It might be a little
getting used to, but it really pays itself back in the validation it
does. Not knowing what you were doing (might it be worth to start a new
thread on that?) lock classes might help annotate.

> > Right, but does the grandparent really need serialisation? Normal simple
> > tree manipulations don't require more than 2 locks to guarantee tree
> > consistency. So you either don't have a simple tree, or you have more
> > requirements.
>
> Yes, the grandparent really needs serialization. We guarantee, for
> example, that suspend and resume methods won't get called while probe
> is running. That guarantee is provided by the device sem. If you
> change it into a mutex and release it inside a subroutine called from
> probe then the guarantee won't be met.

OK, that makes sense.

> > > > Like said, I think the tree locking model should be revisited. A
> > > > top-down locking model with lock-coupling should, from my ignorant
> > > > perspective, solve your problems.
> > >
> > > I don't understand what you mean by "lock-coupling".
> >
> > A
> > B C
> > D E F G
> >
> > In order to lock F we do:
> > lock A, lock C, unlock A, lock F, unlock C
> >
> > Look at it this way, either you're more complex than the VFS or it can
> > be annotated :-)
>
> It would work, but what an awful waste of time! Not to mention the
> overhead due to cache-line bouncing on SMP systems. And contention for
> hotspots near the root of the device tree.

You can jump in between if you don't need the sub-tree condition
(locking X requires having locked all its parents Y).

In that case you lock, and validate that you're still linked in the
tree, at which point you can continue your descent.

> All just to make lockdep happy! It doesn't seem worth it, to me.

I've learnt that making lockdep happy makes me happy. Really, the
validation it does really helps out.


Anyway, can we make a list of requirements so I can try and work
something out?

So its a simple tree with ordered children, where:
- probe can recurse once
- probe must exclude suspend/resume
- suspend/resume must exclude everything



2007-11-07 16:42:45

by Alan Stern

[permalink] [raw]
Subject: Re: device struct bloat

On Tue, 6 Nov 2007, Peter Zijlstra wrote:

> > Nonsense. Consider a simple tree: A is the root, B and C are two
> > children of A. Task 1 wants to lock all the entries in the tree, so it
> > acquires the locks for A (the parent), then B, and then C (the
> > children). Meanwhile task 2 also wants to lock all the entries in the
> > tree, so it acquires the locks for A (the parent), then C, and then B
> > (the children). Now there _is_ a lateral dependency (BC/CB), despite
> > the fact that both tasks lock from the top down.
>
> Damn, I was thinking of a B+tree, in that case the children are ordered,
> much like what you need I guess. (Although the balancing of the B+tree
> makes it more complex)

As a matter of fact there is a hidden ordering of the children, given
by klist_children in struct device. But I think relying on it would be
a mistake.

There probably aren't many places in the kernel where a task needs to
lock two sibling devices. The only one I know of is the locking
routine for system suspend -- but then I've never looked for any
others. So maybe we don't have to worry about it much. (On the other
hand, if we don't worry about it now then you can bet someone will run
into trouble in the future!)

> > > And only up to 8, as that's the max nesting depth.
> >
> > I wasn't aware of the limit. This definitely suggests that I will need
> > to use lockdep-avoiding routines at some stage, even if not for the
> > purpose being discussed here.
>
> Please, don't do that. Lockdep really helps. It might be a little
> getting used to, but it really pays itself back in the validation it
> does. Not knowing what you were doing (might it be worth to start a new
> thread on that?) lock classes might help annotate.

I'm already using mutex_lock_nested. But if the nesting limit is 8
then there's a possibility it might be exceeded someday. What's the
alternative?

The application is dynamic power management (autosuspend and
autoresume). There's a pm_mutex associated with the device, and it is
locked during the suspend and resume processing. The problem is that
when a device gets suspended, it tells its parent -- and then the
parent may see that all the children are now suspended so the parent
can autosuspend itself, and so on up the device tree. Likewise, if an
entire subtree is autosuspended and a device near the bottom needs to
autoresume for I/O, it first has to ask its parent to autoresume, which
then has to ask _its_ parent, etc.

Right now the implementation is confined to the USB subsystem so the
nesting can't get too deep, but in time it's likely to expand.


> Anyway, can we make a list of requirements so I can try and work
> something out?
>
> So its a simple tree with ordered children, where:
> - probe can recurse once
> - probe must exclude suspend/resume
> - suspend/resume must exclude everything

Um. How about instead if I try to list the places where dev->sem is
currently used?

As discussed earlier, during system-sleep transitions
every device must be locked.

Any place the driver core calls a driver method (probe,
remove, suspend, resume), it does so with the lock held.
The shutdown method may be different; I haven't paid any
attention to it.

There are some unfortunate encroachments from the USB
subsystem in the driver core. USB requires that during
probe and remove method calls, dev->parent->sem be locked
as well as dev->sem. Some of the time this happens
automatically, but at other times (like when a new driver
is registered) the driver core has to take the lock
explicitly. You can see comments about USB in
drivers/base/{bus,dd}.c.

Subsystems may have their own special uses for dev->sem.
Whenever they need something to be mutually exclusive with
probe, remove, suspend, or resume, they can use that lock.
For example, USB requires dev->sem to be held while doing a
device reset or a device-configuration change.

Individual drivers can hold dev->sem when they want to
synchronize some region of code with their remove, suspend,
or resume methods. The USB hub driver does this.

Incidentally, probe() can recurse more than once. In fact it does in
some spots.

Alan Stern