2002-10-09 00:49:06

by Patrick Mochel

[permalink] [raw]
Subject: [bk/patch] driver model update: device_unregister()


Without further ado, here is device_unregister(), and a fixup of the IDE
and USB code that use it. Description below.

Please apply. (bkbits is taking a really long time, so it might take a few
to actually show up).

-pat

Please pull from

bk://ldm.bkbits.net/linux-2.5-core

This will update the following files:

drivers/base/core.c | 38 +++++++++++++++++++++++++++++++-------
drivers/base/fs/device.c | 3 +--
drivers/ide/ide-disk.c | 2 +-
drivers/usb/core/usb.c | 4 ++--
include/linux/device.h | 3 ++-
5 files changed, 37 insertions(+), 13 deletions(-)

through these ChangeSets:

<[email protected]> (02/10/08 1.601)
USB: call device_unregister() instead of put_device() when removing devices.

<[email protected]> (02/10/08 1.600)
IDE: call device_unregister() instead of put_device() in ide-disk->cleanup().

<[email protected]> (02/10/08 1.599)
driver model: check return of get_device() when creating a driverfs file.

<[email protected]> (02/10/08 1.598)
driver model: and present field to struct device and implement device_unregister().

device_unregister() is intended to be the complement of device_register(), and assumes
most of the functionality of the current put_device(). It should be called by the bus
driver when a device is physically removed from the system.

It should _not_ be called from a driver's remove() method, as that remove() method is called
via device_detach() in device_unregister().

dev->present is used to flag the physical presence of the device. It is set when the device
is registered, and cleared when unregistered. get_device() checks this flag, and returns
a NULL pointer if cleared. This prevents anyone from obtaining a reference to a device
that has been unregistered (removed), but not yet been freed (e.g. if someone else is
holding a reference to it).

put_device() BUG()s if dev->present is set. A device should be unregistered before it is
freed. This will catch people doing it in the wrong order.





2002-10-09 00:49:50

by Patrick Mochel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


[email protected], 2002-10-08 17:23:05-07:00, [email protected]
driver model: and present field to struct device and implement device_unregister().

device_unregister() is intended to be the complement of device_register(), and assumes
most of the functionality of the current put_device(). It should be called by the bus
driver when a device is physically removed from the system.

It should _not_ be called from a driver's remove() method, as that remove() method is called
via device_detach() in device_unregister().

dev->present is used to flag the physical presence of the device. It is set when the device
is registered, and cleared when unregistered. get_device() checks this flag, and returns
a NULL pointer if cleared. This prevents anyone from obtaining a reference to a device
that has been unregistered (removed), but not yet been freed (e.g. if someone else is
holding a reference to it).

put_device() BUG()s if dev->present is set. A device should be unregistered before it is
freed. This will catch people doing it in the wrong order.

diff -Nru a/drivers/base/core.c b/drivers/base/core.c
--- a/drivers/base/core.c Tue Oct 8 17:55:22 2002
+++ b/drivers/base/core.c Tue Oct 8 17:55:22 2002
@@ -175,7 +175,7 @@
INIT_LIST_HEAD(&dev->intf_list);
spin_lock_init(&dev->lock);
atomic_set(&dev->refcount,2);
-
+ dev->present = 1;
spin_lock(&device_lock);
if (dev->parent) {
get_device_locked(dev->parent);
@@ -219,7 +219,7 @@
struct device * get_device_locked(struct device * dev)
{
struct device * ret = dev;
- if (dev && atomic_read(&dev->refcount) > 0)
+ if (dev && dev->present && atomic_read(&dev->refcount) > 0)
atomic_inc(&dev->refcount);
else
ret = NULL;
@@ -241,8 +241,35 @@
*/
void put_device(struct device * dev)
{
+ struct device * parent;
if (!atomic_dec_and_lock(&dev->refcount,&device_lock))
return;
+ parent = dev->parent;
+ dev->parent = NULL;
+ spin_unlock(&device_lock);
+
+ BUG_ON(dev->present);
+
+ if (dev->release)
+ dev->release(dev);
+
+ if (parent)
+ put_device(parent);
+}
+
+/**
+ * device_unregister - unlink device
+ * @dev: device going away
+ *
+ * The device has been removed from the system, so we disavow knowledge
+ * of it. It might not be the final reference to the device, so we mark
+ * it as !present, so no more references to it can be acquired.
+ * In the end, we decrement the final reference count for it.
+ */
+void device_unregister(struct device * dev)
+{
+ spin_lock(&device_lock);
+ dev->present = 0;
list_del_init(&dev->node);
list_del_init(&dev->g_list);
list_del_init(&dev->bus_list);
@@ -267,11 +294,7 @@
/* remove the driverfs directory */
device_remove_dir(dev);

- if (dev->release)
- dev->release(dev);
-
- if (dev->parent)
- put_device(dev->parent);
+ put_device(dev);
}

static int __init device_init(void)
@@ -287,5 +310,6 @@
core_initcall(device_init);

EXPORT_SYMBOL(device_register);
+EXPORT_SYMBOL(device_unregister);
EXPORT_SYMBOL(get_device);
EXPORT_SYMBOL(put_device);
diff -Nru a/include/linux/device.h b/include/linux/device.h
--- a/include/linux/device.h Tue Oct 8 17:55:22 2002
+++ b/include/linux/device.h Tue Oct 8 17:55:22 2002
@@ -289,6 +289,7 @@
void *platform_data; /* Platform specific data (e.g. ACPI,
BIOS data relevant to device) */

+ u32 present;
u32 current_state; /* Current operating state. In
ACPI-speak, this is D0-D3, D0
being fully functional, and D3
@@ -327,7 +328,7 @@
* High level routines for use by the bus drivers
*/
extern int device_register(struct device * dev);
-
+extern void device_unregister(struct device * dev);

/* driverfs interface for exporting device attributes */


2002-10-09 00:50:50

by Patrick Mochel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


[email protected], 2002-10-08 17:32:56-07:00, [email protected]
USB: call device_unregister() instead of put_device() when removing devices.

diff -Nru a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
--- a/drivers/usb/core/usb.c Tue Oct 8 17:55:15 2002
+++ b/drivers/usb/core/usb.c Tue Oct 8 17:55:15 2002
@@ -797,7 +797,7 @@
struct usb_interface *interface = &dev->actconfig->interface[i];

/* remove this interface */
- put_device(&interface->dev);
+ device_unregister(&interface->dev);
}
}

@@ -805,7 +805,7 @@
if (dev->devnum > 0) {
clear_bit(dev->devnum, dev->bus->devmap.devicemap);
usbfs_remove_device(dev);
- put_device(&dev->dev);
+ device_unregister(&dev->dev);
}

/* Decrement the reference count, it'll auto free everything when */

2002-10-09 00:50:51

by Patrick Mochel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


[email protected], 2002-10-08 17:23:56-07:00, [email protected]
driver model: check return of get_device() when creating a driverfs file.

diff -Nru a/drivers/base/fs/device.c b/drivers/base/fs/device.c
--- a/drivers/base/fs/device.c Tue Oct 8 17:55:20 2002
+++ b/drivers/base/fs/device.c Tue Oct 8 17:55:20 2002
@@ -90,8 +90,7 @@
{
int error = -EINVAL;

- if (dev) {
- get_device(dev);
+ if (get_device(dev)) {
error = driverfs_create_file(&entry->attr,&dev->dir);
put_device(dev);
}

2002-10-09 00:50:50

by Patrick Mochel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


[email protected], 2002-10-08 17:32:17-07:00, [email protected]
IDE: call device_unregister() instead of put_device() in ide-disk->cleanup().

diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
--- a/drivers/ide/ide-disk.c Tue Oct 8 17:55:17 2002
+++ b/drivers/ide/ide-disk.c Tue Oct 8 17:55:17 2002
@@ -1692,7 +1692,7 @@
{
struct gendisk *g = drive->disk;

- put_device(&drive->disk->disk_dev);
+ device_unregister(&drive->disk->disk_dev);
if ((drive->id->cfs_enable_2 & 0x3000) && drive->wcache)
if (do_idedisk_flushcache(drive))
printk (KERN_INFO "%s: Write Cache FAILED Flushing!\n",

2002-10-09 01:01:21

by Alexander Viro

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()



On Tue, 8 Oct 2002, Patrick Mochel wrote:

>
> [email protected], 2002-10-08 17:32:17-07:00, [email protected]
> IDE: call device_unregister() instead of put_device() in ide-disk->cleanup().
>
> diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
> --- a/drivers/ide/ide-disk.c Tue Oct 8 17:55:17 2002
> +++ b/drivers/ide/ide-disk.c Tue Oct 8 17:55:17 2002
> @@ -1692,7 +1692,7 @@
> {
> struct gendisk *g = drive->disk;
>
> - put_device(&drive->disk->disk_dev);
> + device_unregister(&drive->disk->disk_dev);


While you are at it, _please_, take it into ide_drive. ->disk_dev is
handled by parititions/check.c; please don't overload it.

2002-10-09 16:30:33

by Patrick Mochel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


> > diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
> > --- a/drivers/ide/ide-disk.c Tue Oct 8 17:55:17 2002
> > +++ b/drivers/ide/ide-disk.c Tue Oct 8 17:55:17 2002
> > @@ -1692,7 +1692,7 @@
> > {
> > struct gendisk *g = drive->disk;
> >
> > - put_device(&drive->disk->disk_dev);
> > + device_unregister(&drive->disk->disk_dev);
>
>
> While you are at it, _please_, take it into ide_drive. ->disk_dev is
> handled by parititions/check.c; please don't overload it.

No problem; I'll do that today. But, I also think some of the stuff in
fs/partitions/check.c is bogus and should die. Partitions are not devices,
and shouldn't be treated as such.

The patch below kills the struct device in struct hd_struct, and all the
code in fs/partitions/check.c to create and remove them. This should also
fix at least one periodic Oops that people have been seeing on shutdown.

Linus, as long as Al doesn't have a problem, please pull from the address
below.

Thanks,

-pat

Please pull from

bk://ldm.bkbits.net/linux-2.5-core

This will update the following files:

fs/partitions/check.c | 115 --------------------------------------------------
include/linux/genhd.h | 2
2 files changed, 117 deletions(-)

through these ChangeSets:

<[email protected]> (02/10/09 1.716.1.1)
driver model: don't treat partitions as devices.

fs/partitions/check.c has been treating partitions as devices, and using the struct device in
struct hd_struct to describe them. This was introcduced when SCSI was initially converted to use
the new driver model.

However, partitions are devices; they're logical divisions. They shouldn't be treated as devices,
and shouldn't show up in the device hierarchy. This changeset kills the creation and teardown of
the partition 'devices', and their attributes. It also removes the struct device from struct
hd_struct, and the struct device * from struct gendisk.


diff -Nru a/fs/partitions/check.c b/fs/partitions/check.c
--- a/fs/partitions/check.c Wed Oct 9 09:28:11 2002
+++ b/fs/partitions/check.c Wed Oct 9 09:28:11 2002
@@ -111,105 +111,6 @@
return buf;
}

-/* Driverfs file support */
-static ssize_t partition_device_kdev_read(struct device *driverfs_dev,
- char *page, size_t count, loff_t off)
-{
- kdev_t kdev;
- kdev.value=(int)(long)driverfs_dev->driver_data;
- return off ? 0 : sprintf (page, "%x\n",kdev.value);
-}
-static DEVICE_ATTR(kdev,S_IRUGO,partition_device_kdev_read,NULL);
-
-static ssize_t partition_device_type_read(struct device *driverfs_dev,
- char *page, size_t count, loff_t off)
-{
- return off ? 0 : sprintf (page, "BLK\n");
-}
-static DEVICE_ATTR(type,S_IRUGO,partition_device_type_read,NULL);
-
-static void driverfs_create_partitions(struct gendisk *hd)
-{
- int max_p = 1<<hd->minor_shift;
- struct hd_struct *p = hd->part;
- char name[DEVICE_NAME_SIZE];
- char bus_id[BUS_ID_SIZE];
- struct device *dev, *parent;
- int part;
-
- /* if driverfs not supported by subsystem, skip partitions */
- if (!(hd->flags & GENHD_FL_DRIVERFS))
- return;
-
- parent = hd->driverfs_dev;
-
- if (parent) {
- sprintf(name, "%s", parent->name);
- sprintf(bus_id, "%s:", parent->bus_id);
- } else {
- *name = *bus_id = '\0';
- }
-
- dev = &hd->disk_dev;
- dev->driver_data = (void *)(long)__mkdev(hd->major, hd->first_minor);
- sprintf(dev->name, "%sdisc", name);
- sprintf(dev->bus_id, "%sdisc", bus_id);
- for (part=1; part < max_p; part++) {
- dev = &p[part-1].hd_driverfs_dev;
- sprintf(dev->name, "%spart%d", name, part);
- sprintf(dev->bus_id, "%s:p%d", bus_id, part);
- if (!p[part-1].nr_sects)
- continue;
- dev->driver_data =
- (void *)(long)__mkdev(hd->major, hd->first_minor+part);
- }
-
- dev = &hd->disk_dev;
- dev->parent = parent;
- if (parent)
- dev->bus = parent->bus;
- device_register(dev);
- device_create_file(dev, &dev_attr_type);
- device_create_file(dev, &dev_attr_kdev);
-
- for (part=0; part < max_p-1; part++) {
- dev = &p[part].hd_driverfs_dev;
- dev->parent = parent;
- if (parent)
- dev->bus = parent->bus;
- if (!dev->driver_data)
- continue;
- device_register(dev);
- device_create_file(dev, &dev_attr_type);
- device_create_file(dev, &dev_attr_kdev);
- }
-}
-
-static void driverfs_remove_partitions(struct gendisk *hd)
-{
- int max_p = 1<<hd->minor_shift;
- struct device *dev;
- struct hd_struct *p;
- int part;
-
- for (part=1, p = hd->part; part < max_p; part++, p++) {
- dev = &p->hd_driverfs_dev;
- if (dev->driver_data) {
- device_remove_file(dev, &dev_attr_type);
- device_remove_file(dev, &dev_attr_kdev);
- put_device(dev);
- dev->driver_data = NULL;
- }
- }
- dev = &hd->disk_dev;
- if (dev->driver_data) {
- device_remove_file(dev, &dev_attr_type);
- device_remove_file(dev, &dev_attr_kdev);
- put_device(dev);
- dev->driver_data = NULL;
- }
-}
-
static void check_partition(struct gendisk *hd, struct block_device *bdev)
{
devfs_handle_t de = NULL;
@@ -412,7 +313,6 @@
if (blkdev_get(bdev, FMODE_READ, 0, BDEV_RAW) < 0)
return;
check_partition(disk, bdev);
- driverfs_create_partitions(disk);
devfs_create_partitions(disk);
blkdev_put(bdev, BDEV_RAW);
}
@@ -420,30 +320,16 @@
void update_partition(struct gendisk *disk, int part)
{
struct hd_struct *p = disk->part + part - 1;
- struct device *dev = &p->hd_driverfs_dev;

if (!p->nr_sects) {
if (p->de) {
devfs_unregister(p->de);
p->de = NULL;
}
- if (dev->driver_data) {
- device_remove_file(dev, &dev_attr_type);
- device_remove_file(dev, &dev_attr_kdev);
- put_device(dev);
- dev->driver_data = NULL;
- }
return;
}
if (!p->de)
devfs_register_partition(disk, part);
- if (dev->driver_data || !(disk->flags & GENHD_FL_DRIVERFS))
- return;
- dev->driver_data =
- (void *)(long)__mkdev(disk->major, disk->first_minor+part);
- device_register(dev);
- device_create_file(dev, &dev_attr_type);
- device_create_file(dev, &dev_attr_kdev);
}

int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
@@ -528,7 +414,6 @@

void del_gendisk(struct gendisk *disk)
{
- driverfs_remove_partitions(disk);
wipe_partitions(disk);
unlink_gendisk(disk);
devfs_remove_partitions(disk);
diff -Nru a/include/linux/genhd.h b/include/linux/genhd.h
--- a/include/linux/genhd.h Wed Oct 9 09:28:11 2002
+++ b/include/linux/genhd.h Wed Oct 9 09:28:11 2002
@@ -62,7 +62,6 @@
unsigned long start_sect;
unsigned long nr_sects;
devfs_handle_t de; /* primary (master) devfs entry */
- struct device hd_driverfs_dev; /* support driverfs hiearchy */
};

#define GENHD_FL_REMOVABLE 1
@@ -86,7 +85,6 @@
int number; /* devfs crap */
devfs_handle_t de; /* more of the same */
devfs_handle_t disk_de; /* piled higher and deeper */
- struct device *driverfs_dev;
struct device disk_dev;
};


2002-10-09 17:06:55

by Linus Torvalds

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


On Wed, 9 Oct 2002, Patrick Mochel wrote:
>
> No problem; I'll do that today. But, I also think some of the stuff in
> fs/partitions/check.c is bogus and should die. Partitions are not devices,
> and shouldn't be treated as such.

I think that is a valid argument as long as it's called "driverfs" or
something, but since the thing is clearly evolving into a "kernelfs" and
has drivers and devices as only a part of its structure knowledge, and is
used to expose various kernel hierarchies and relationships, I actually
think that it makes sense to expose the relationship of partitions to
devices.

(Not that it has to use "struct device" to do so, of course, although I
don't see any major reason why it couldn't..)

What's the oops due to?

Linus

2002-10-09 17:18:03

by Patrick Mochel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


On Wed, 9 Oct 2002, Linus Torvalds wrote:

>
> On Wed, 9 Oct 2002, Patrick Mochel wrote:
> >
> > No problem; I'll do that today. But, I also think some of the stuff in
> > fs/partitions/check.c is bogus and should die. Partitions are not devices,
> > and shouldn't be treated as such.
>
> I think that is a valid argument as long as it's called "driverfs" or
> something, but since the thing is clearly evolving into a "kernelfs" and
> has drivers and devices as only a part of its structure knowledge, and is
> used to expose various kernel hierarchies and relationships, I actually
> think that it makes sense to expose the relationship of partitions to
> devices.
>
> (Not that it has to use "struct device" to do so, of course, although I
> don't see any major reason why it couldn't..)

I agree that it is useful to expose the partitions of devices via the
filesystem, but struct device seems way too heavy-handed to describe them.
I think they would be better off as a single attribute file that dumped
the partition data about the disk.

You would have something like:

/sys/class/disk/
|-- devices
| | `-- 0 -> ../../../root/wherever

and in 'wherever':

/sys/root/wherever/
|-- partitions

I dunno about the format; or if we would want one file or one per
partition.

> What's the oops due to?

Sorry, I take it back. It wasn't an oops; it was a backtrace due to the
partitions being removed during an illegal context.

-pat

2002-10-09 17:20:00

by Patrick Mochel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


> I think that is a valid argument as long as it's called "driverfs" or
> something, but since the thing is clearly evolving into a "kernelfs"

BTW, is that the name you prefer, and will you take the patch?


-pat

2002-10-09 17:27:25

by Alexander Viro

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()



On Wed, 9 Oct 2002, Linus Torvalds wrote:

> I think that is a valid argument as long as it's called "driverfs" or
> something, but since the thing is clearly evolving into a "kernelfs" and
> has drivers and devices as only a part of its structure knowledge, and is
> used to expose various kernel hierarchies and relationships, I actually
> think that it makes sense to expose the relationship of partitions to
> devices.

It makes sense, but that should be done for gendisk. I.e. we should have
(name, base, range) - not a node for each partition.

At least one obvious reason for oops is that thing sets ->disk_dev, which is
under complete control of partitions/check.c. If anything, setting
->driverfs_dev would be legitimate - use of ->disk_dev is a bug.

Al, waiting for already merged stuff to show up in snapshots...

2002-10-09 17:34:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


On Wed, 9 Oct 2002, Patrick Mochel wrote:
>
> > I think that is a valid argument as long as it's called "driverfs" or
> > something, but since the thing is clearly evolving into a "kernelfs"
>
> BTW, is that the name you prefer, and will you take the patch?

I do know that "kfs" is too much of a random collection of consonants.
Looks like something out of an IBM architecture manual. "kernelfs" is more
acceptable, I think, but it's not perfect either - it's a bit too generic.
Isn't /proc a kernelfs too? But I can't come up with anything better..

Linus

2002-10-09 17:42:15

by Alexander Viro

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()



On Wed, 9 Oct 2002, Linus Torvalds wrote:

>
> On Wed, 9 Oct 2002, Alexander Viro wrote:
> >
> > It makes sense, but that should be done for gendisk. I.e. we should have
> > (name, base, range) - not a node for each partition.
>
> Actually, we _do_ want to have a node for each partition, if we want to
> show the things that are associated with one particular partition. And we
> do have those things - mounts and (onc eit's working again) LVM
> relationships etc.
>
> It's a perfectly valid question to ask "what partitions are part of this
> extended disk?" or "which partition is the backing store for this
> filesystem?". Which implies that a partition is a real first-class
> entity, not just "one of a range".

Sorry, no. Which partition is the backing store for this filesystem is
question to some filesystem drivers. Not even every fs driver that
happens to use block devices - some of them use more than one (e.g
for journal).

IOW, it's not a partition property.

2002-10-09 17:48:09

by Linus Torvalds

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


On Wed, 9 Oct 2002, Alexander Viro wrote:
>
> Sorry, no. Which partition is the backing store for this filesystem is
> question to some filesystem drivers. Not even every fs driver that
> happens to use block devices - some of them use more than one (e.g
> for journal).
>
> IOW, it's not a partition property.

I didn't say it was a partition. I said it was a _filesystem_ property.
And yes, it can be a list of multiple partitions - the same way LVM is a
list of _multiple_ partitions.

The point being that a partition is a real entity, and should have a node
of its own - so that you can point to it (and "node" may of course be
"subdirectory" if you want to have multiple things associated with it).

Linus

2002-10-09 17:38:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


On Wed, 9 Oct 2002, Alexander Viro wrote:
>
> It makes sense, but that should be done for gendisk. I.e. we should have
> (name, base, range) - not a node for each partition.

Actually, we _do_ want to have a node for each partition, if we want to
show the things that are associated with one particular partition. And we
do have those things - mounts and (onc eit's working again) LVM
relationships etc.

It's a perfectly valid question to ask "what partitions are part of this
extended disk?" or "which partition is the backing store for this
filesystem?". Which implies that a partition is a real first-class
entity, not just "one of a range".

Linus

2002-10-09 17:42:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


On Wed, 9 Oct 2002, Patrick Mochel wrote:
>
> Peter Anvin suggest 'kernfs', which ain't bad. Then again, we could just
> call it 'patfs'..

Oh, yeah, call it "patfs", you egomaniac. Or you could just name the whole
OS after yourself, and then the world will _really_ see what an inflated
ego you have.

Uhh.. I mean.. Never mind.

Linus

2002-10-09 17:38:00

by Patrick Mochel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


On Wed, 9 Oct 2002, Linus Torvalds wrote:

>
> On Wed, 9 Oct 2002, Patrick Mochel wrote:
> >
> > > I think that is a valid argument as long as it's called "driverfs" or
> > > something, but since the thing is clearly evolving into a "kernelfs"
> >
> > BTW, is that the name you prefer, and will you take the patch?
>
> I do know that "kfs" is too much of a random collection of consonants.
> Looks like something out of an IBM architecture manual. "kernelfs" is more
> acceptable, I think, but it's not perfect either - it's a bit too generic.
> Isn't /proc a kernelfs too? But I can't come up with anything better..

The fantasy that we have is to start reverting procfs back to what it was
originally designed for (and what the name hints at): displaying process
information. Of course, that will never be entirely possible, but we'll
always have our dreams.

Peter Anvin suggest 'kernfs', which ain't bad. Then again, we could just
call it 'patfs'..

-pat

2002-10-09 17:54:43

by Alexander Viro

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()



On Wed, 9 Oct 2002, Linus Torvalds wrote:

>
> On Wed, 9 Oct 2002, Alexander Viro wrote:
> >
> > Sorry, no. Which partition is the backing store for this filesystem is
> > question to some filesystem drivers. Not even every fs driver that
> > happens to use block devices - some of them use more than one (e.g
> > for journal).
> >
> > IOW, it's not a partition property.
>
> I didn't say it was a partition. I said it was a _filesystem_ property.
> And yes, it can be a list of multiple partitions - the same way LVM is a
> list of _multiple_ partitions.
>
> The point being that a partition is a real entity, and should have a node
> of its own - so that you can point to it (and "node" may of course be
> "subdirectory" if you want to have multiple things associated with it).

OK, call me dense, but what things are associated with partition aside of the
fact that it exists?

2002-10-09 17:55:46

by Patrick Mochel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


On Wed, 9 Oct 2002, Linus Torvalds wrote:

>
> On Wed, 9 Oct 2002, Alexander Viro wrote:
> >
> > Sorry, no. Which partition is the backing store for this filesystem is
> > question to some filesystem drivers. Not even every fs driver that
> > happens to use block devices - some of them use more than one (e.g
> > for journal).
> >
> > IOW, it's not a partition property.
>
> I didn't say it was a partition. I said it was a _filesystem_ property.
> And yes, it can be a list of multiple partitions - the same way LVM is a
> list of _multiple_ partitions.
>
> The point being that a partition is a real entity, and should have a node
> of its own - so that you can point to it (and "node" may of course be
> "subdirectory" if you want to have multiple things associated with it).

It doesn't have to be a struct device, either.

What describes partitions, struct hd_struct? By adding a struct
driver_dir_entry (yes, crappy name; will change) and a bit of glue logic,
we can create driverfs directories for them, and start adding attributes
to the partitions themselves.

Volume managers can have their own top-level directories, and one
directory for each volume, with symlinks to the partition directories
under the disk node directories that make up the volume.

The code shouldn't be that bad, and I can whip something this afternoon,
if interested..

-pat

2002-10-09 18:00:09

by Alexander Viro

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()



On Wed, 9 Oct 2002, Patrick Mochel wrote:

> What describes partitions, struct hd_struct? By adding a struct
> driver_dir_entry (yes, crappy name; will change) and a bit of glue logic,
> we can create driverfs directories for them, and start adding attributes
> to the partitions themselves.

_What_ attributes? List of PIDs that hold it open? List of <something>
for all SCM_RIGHTS cookies that happen to pass opened file for that
partition around?

Guys, you are overengineering for no good reason, AFAICS. There is a
good reason why fuser(1) sits in userland and is not done as a property
list for open file. It's exactly the same situation.

2002-10-09 17:56:41

by Jeff Muizelaar

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

Linus Torvalds wrote:

>
>I do know that "kfs" is too much of a random collection of consonants.
>Looks like something out of an IBM architecture manual. "kernelfs" is more
>acceptable, I think, but it's not perfect either - it's a bit too generic.
>Isn't /proc a kernelfs too? But I can't come up with anything better..
>
>
Maybe sysfs?
Though that might be too close to /proc/sys

-Jeff

2002-10-09 18:56:45

by Alexander Viro

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()



On Wed, 9 Oct 2002, Linus Torvalds wrote:

> i
> On Wed, 9 Oct 2002, Alexander Viro wrote:
> >
> > OK, call me dense, but what things are associated with partition aside of the
> > fact that it exists?
>
> Filesystems can be associated with one or more partitions. MD devices are
> associated with one or more partitions.
>

> Not disks. Partitions.

<groan>

Can you explain why "filesystem foo uses device bar" gives an object associated
with bar and "process baz has bar opened" doesn't?

"Associated with" is asymmetric in this case - just as for files and processes
holding them opened. We do have objects for process->file (/proc/<pid>/fd/*)
and we certainly don't have anything like that for other direction.

That's what I'm asking about - do we want to have objects on the _partition_
side of things that would require per-partition directory? Not on the
filesystem/swap/whatnot side...

2002-10-09 18:42:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

i
On Wed, 9 Oct 2002, Alexander Viro wrote:
>
> OK, call me dense, but what things are associated with partition aside of the
> fact that it exists?

Filesystems can be associated with one or more partitions. MD devices are
associated with one or more partitions.

Not disks. Partitions.

Linus


2002-10-09 19:12:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


On Wed, 9 Oct 2002, Alexander Viro wrote:
>
> That's what I'm asking about - do we want to have objects on the _partition_
> side of things that would require per-partition directory? Not on the
> filesystem/swap/whatnot side...

You're going about this the wrong way.

It doesn't _matter_ if you have associations on the partition side. The
_only_ thing that matters is that somebody else has associations _to_ a
partition. That is already enough to require a "node" to associate to.

How would you otherwise describe that association in filesystem terms?

Look, let me give you an example of an existing association.

/devices/bus/scsi/devices:
lrwxrwxrwx 1 root root 40 Oct 7 17:17 1:0:0:0 -> ../../../root/pci0/00:0b.0/scsi1/1:0:0:0

Here the association is from the "list of scsi devices" to the "hardware
device node that is associated with that device".

The thing I want to point out is that you need a "target node" in order to
be able to _have_ this association.

In other words, if you think that it is reasonable to have an assocation
from the MD device to the list of partitions that are part of that MD
device, then you _need_ to have a "partition node", because otherwise you
cannot have the pointer to it. See?

So your "partition side" vs "filesystem side" thing doesn't matter. It
doesn't matter which side the associations are, you need to have a node to
associate _with_.

Noth sides need a node. Even if the relationship is only going in one
direction.

Linus

2002-10-09 19:20:37

by Alexander Viro

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()



On Wed, 9 Oct 2002, Linus Torvalds wrote:

>
> In other words, if you think that it is reasonable to have an assocation
> from the MD device to the list of partitions that are part of that MD
> device, then you _need_ to have a "partition node", because otherwise you
> cannot have the pointer to it. See?
>
> So your "partition side" vs "filesystem side" thing doesn't matter. It
> doesn't matter which side the associations are, you need to have a node to
> associate _with_.
>
> Noth sides need a node. Even if the relationship is only going in one
> direction.

*eeek*

Even aside of the problems with putting filesystems (and filesystem types)
into driverfs (can_unload() for each fs module?), partitions _ARE_ reused.
So logics with ->release() will be a killer.

Do you really want to do that?

2002-10-09 19:43:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


On Wed, 9 Oct 2002, Alexander Viro wrote:
>
> Even aside of the problems with putting filesystems (and filesystem types)
> into driverfs (can_unload() for each fs module?), partitions _ARE_ reused.
> So logics with ->release() will be a killer.

Note that we don't actually do this yet, and when/if we do it it obviously
will require us to have the association data structures in place. And
clearly the rule would have to be that a partition can't be re-used while
a filesystem is busily bound to it - and that has nothing to do with
driverfs/kernel/sysfs/xxxfs.

(That rule pretty much exists already, although we obviously don't
_enforce_ the rule, since we don't even have the data structure linkages
in place).

As to the "can_unload()" thing, I really suspect that the reason it shows
up is because module unloading is fundamentally broken - again regardless
of any driverfs issues. Talk to Rusty some day about it ;)

(Side note: it may be that we could _fix_ module unloading by adding a
driverfs node to modules too, and getting rid of the "single module count"
thing, and replacing it with a more generic "list of nodes using it".
The reason module unloading is so painful largely is exactly the fact that
we only have a count, and the generic module layer has no idea what is
actually _using_ the module - except for other modules. In other words, we
get the inter-module dependencies right, but we don't see any other
dependencies).

Linus


2002-10-09 20:06:02

by Alexander Viro

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()



On Wed, 9 Oct 2002, Linus Torvalds wrote:

> As to the "can_unload()" thing, I really suspect that the reason it shows
> up is because module unloading is fundamentally broken - again regardless
> of any driverfs issues. Talk to Rusty some day about it ;)

I did and I'm less than impressed by his arguments. Filesystem module
unloading works fine, thank you very much. There we have explicit points
where thing becomes busy and ceases to be such. And IMNSHO it's the right
way to deal with that stuff, rather than delving into quiescrap. Rusty
deals with modules that are one step removed from "make down_write() modular"
so it's hardly a surprise that unloading gets hairy...

Notice that for partitions (and even more so for superblocks) we are talking
about _way_ more than "if it's busy, it can't be removed". Unlike PCI and
friends, here we have non-trivial locking and driverfs accesses changing
refcount would need to play nice with that.

Sigh... Screw it. Tell me what directory tree you want for block
device and I'll do that, ugly or not ;-/ I really don't like the
direction it's going, but it's your tree and since it hadn't reached
my fork threshold yet...


2002-10-09 20:37:12

by Andries Brouwer

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

On Wed, Oct 09, 2002 at 02:00:05PM -0400, Alexander Viro wrote:

> OK, call me dense, but what things are associated with partition aside of the
> fact that it exists?

A partition has an underlying device, a start and a length.
It may have a label or volume id.
It has a partition number (the 7 of hda7) that can be assigned.
It may have substructure on the partition level, like a
DOS-type partition (BSD slice) with BSD subpartitions.

Andries

2002-10-09 21:40:22

by Roman Zippel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

Hi,

On Wed, 9 Oct 2002, Alexander Viro wrote:

> > As to the "can_unload()" thing, I really suspect that the reason it shows
> > up is because module unloading is fundamentally broken - again regardless
> > of any driverfs issues. Talk to Rusty some day about it ;)
>
> I did and I'm less than impressed by his arguments. Filesystem module
> unloading works fine, thank you very much.

I agree, that it works, but it's more complex than necessary.
We need module pointers everywhere, which are needed for reference
counting. To make this worse this only works for modules, so you possibly
need another reference count (e.g. see struct proc_dir_entry).
On the other hand can_unload() isn't really a solution either. From the
point can_unload() returns true, the module must not be called anymore. So
without some big lock this would only work if can_unload also calls
module_exit() and the whole thing gets even more complex.
Anyway, the module pointers can be easily replaced with a normal ref
count, but this requires that module_exit() can fail and so a new module
API, but IMO this can wait for 2.7, try_inc_mod_count() will do until
then.

bye, Roman

2002-10-09 22:25:37

by Kai Germaschewski

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

On Wed, 9 Oct 2002, Roman Zippel wrote:

> > I did and I'm less than impressed by his arguments. Filesystem module
> > unloading works fine, thank you very much.
>
> I agree, that it works, but it's more complex than necessary.
> We need module pointers everywhere, which are needed for reference
> counting. To make this worse this only works for modules, so you possibly
> need another reference count (e.g. see struct proc_dir_entry).
> On the other hand can_unload() isn't really a solution either. From the
> point can_unload() returns true, the module must not be called anymore. So
> without some big lock this would only work if can_unload also calls
> module_exit() and the whole thing gets even more complex.
> Anyway, the module pointers can be easily replaced with a normal ref
> count, but this requires that module_exit() can fail and so a new module
> API, but IMO this can wait for 2.7, try_inc_mod_count() will do until
> then.

This is a never ending story, but anyway. I think the mistake is seeing
mod->use_count as a reference count for all objects exported from that
module (apart from symbols). But it is not, and I don't think it should
be. It's more of a "don't unload me now, I have given away references
which won't come back anytime soon, so unloading may block a long time"
hint. The APIs used by the module can still internally do separate exact
ref counting without that causing races, that's because a
pci_unregister_driver() call should block until all references are gone
before returning. The module doesn't need to care. It just calls
pci_unregister_driver() from module_exit() and it'll be safe without any
additional coding or thinking needed.

The only thing you need to be aware of is that pci_unregister_driver()
will call ::remove for devices registered with your driver. If you know
that your ::remove routine can get rid of the reference without further
delay, that's fine. However, say, your pci device was a network card
which registered "eth0" and someone said "ifconfig eth0 up", you have to
unregister "eth0" in your ::remove routine. And you know that this will
block until "ifconfig eth0 down" has run, which is potentially a long
time. So at this point, you decide to rather not have the user wait
forever until his "rmmod net_pci" succeeds, but return -EBUSY right away.
You do it by running MOD_INC_USE_COUNT in ::open(), and DEC again in
::close(). Actually, you don't even need to do it yourself, the network
layer will do it for you if you tell it which module is associated with
"eth0" by setting ::owner.

Note that things are actually very similar to your suggestion.

You want a two stage

module_begin_unload()

module_finish_unload()

where the first call unregisters the references it registered before, and
possibly sets a flag or something to avoid other code to acquire further
references.

Then you have to wait until the ref count hits zero, then you can call
module_finish_unload()

What we currently use (at the places where it's done right), is just these
two steps embedded into say a single call to pci_unregister_driver(),
which unregisters the driver from the list of drivers, calls the
registered devices remove method, to get rid of those references. Then, it
waits until all references are gone and returns - really much the same
thing.

The current implementation gives you a way of knowing if unloading will
succeed shortly (if not, it'll fail with -EBUSY). Yours doesn't AFAICS,
and that's actually bad IMO, I think you actually have to try to see if
the unload would succeed, and that's not pretty.

And, done right, the API for the current implementation is so simple that
I doubt you'll be able to come up with something with more ease-of-use.

--Kai





2002-10-09 23:41:48

by Patrick Mochel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()


> Notice that for partitions (and even more so for superblocks) we are talking
> about _way_ more than "if it's busy, it can't be removed". Unlike PCI and
> friends, here we have non-trivial locking and driverfs accesses changing
> refcount would need to play nice with that.

Noted. The refcounting concerning driverfs file I/O doesn't work exactly
the way I want it to, and as you've pointed out, can be racy. I think I
know a way to fix it, and streamline the object access, so it should
become easier to add support for new object types in driverfs.

> Sigh... Screw it. Tell me what directory tree you want for block
> device and I'll do that, ugly or not ;-/ I really don't like the
> direction it's going, but it's your tree and since it hadn't reached
> my fork threshold yet...

The patch below is a start to implementing a little of what both Linus and
I want. It replaces the struct device for partitions with just a driverfs
directory. For now, it kills the attributes, though. There must be
wrappers for each object type to create and remove attributes, as well as
ops to read and write from them. I plan on adding those, but not until I
clean up some of the per-object stuff.

This has been tested on IDE only, as the driver for my SCSI controller
(qla1280.o) has stopped working...

-pat

Please pull from

bk://ldm.bkbits.net/linux-2.5-disk

This will update the following files:

drivers/ide/ide-probe.c | 1
drivers/scsi/sd.c | 2 -
fs/partitions/check.c | 90 +++++++++++-------------------------------------
include/linux/genhd.h | 3 -
4 files changed, 25 insertions(+), 71 deletions(-)

through these ChangeSets:

<[email protected]> (02/10/09 1.714.4.7)
driver model: replace devices allocated for partitions with just driverfs directories

Partitions are not devices and shouldn't be treated as such. But consensus
says that they should be represented somehow in driverfs.

This is a first step in doing so. It stops treating them like devices, and
adds just a driverfs directory for them. The attribute files don't show
up for them anymore, but that should be only temporary (until wrappers
are created to do so for partitions).

This also removes struct hd_struct::hd_driverfs_dev and struct
gendisk::driverfs_dev.

<[email protected]> (02/10/09 1.714.4.6)
SCSI: don't use struct gendisk::dirverfs_dev pointer.

They appear to be pointing to the disk that they've just found. But, we
can just use struct gendisk::parent to do the same thing, and have a
cleaner result.

<[email protected]> (02/10/09 1.714.4.5)
IDE: Make sure we set the part for struct gendisk::disk_dev when we initialize it.

It doesn't seem like we should even have this struct device, as its only
purpose is to add a layer of separation between the disk device and the
partitions. But, it can't be removed yet...

===== drivers/ide/ide-probe.c 1.20 vs edited =====
--- 1.20/drivers/ide/ide-probe.c Wed Oct 9 10:29:29 2002
+++ edited/drivers/ide/ide-probe.c Wed Oct 9 15:36:12 2002
@@ -998,6 +998,7 @@
sprintf(disk->disk_name,"hd%c",'a'+hwif->index*MAX_DRIVES+unit);
disk->minor_shift = PARTN_BITS;
disk->fops = ide_fops;
+ disk->disk_dev.parent = &hwif->drives[unit].gendev;
hwif->drives[unit].disk = disk;
}

===== drivers/scsi/sd.c 1.65 vs edited =====
--- 1.65/drivers/scsi/sd.c Wed Oct 9 02:09:33 2002
+++ edited/drivers/scsi/sd.c Wed Oct 9 16:05:25 2002
@@ -1430,7 +1430,7 @@
else
sprintf(gd->disk_name, "sd%c",'a'+dsk_nr%26);
gd->flags = sdp->removable ? GENHD_FL_REMOVABLE : 0;
- gd->driverfs_dev = &sdp->sdev_driverfs_dev;
+ gd->disk_dev.parent = &sdp->sdev_driverfs_dev;
gd->flags |= GENHD_FL_DRIVERFS | GENHD_FL_DEVFS;
sd_disks[dsk_nr] = gd;
printk(KERN_NOTICE "Attached scsi %sdisk %s at scsi%d, channel %d, "
===== fs/partitions/check.c 1.65 vs edited =====
--- 1.65/fs/partitions/check.c Tue Oct 8 11:36:39 2002
+++ edited/fs/partitions/check.c Wed Oct 9 15:47:12 2002
@@ -132,80 +132,44 @@
{
int max_p = 1<<hd->minor_shift;
struct hd_struct *p = hd->part;
- char name[DEVICE_NAME_SIZE];
- char bus_id[BUS_ID_SIZE];
- struct device *dev, *parent;
+ struct device *dev;
int part;

- /* if driverfs not supported by subsystem, skip partitions */
- if (!(hd->flags & GENHD_FL_DRIVERFS))
- return;
-
- parent = hd->driverfs_dev;
-
- if (parent) {
- sprintf(name, "%s", parent->name);
- sprintf(bus_id, "%s:", parent->bus_id);
- } else {
- *name = *bus_id = '\0';
- }
-
dev = &hd->disk_dev;
dev->driver_data = (void *)(long)__mkdev(hd->major, hd->first_minor);
- sprintf(dev->name, "%sdisc", name);
- sprintf(dev->bus_id, "%sdisc", bus_id);
- for (part=1; part < max_p; part++) {
- dev = &p[part-1].hd_driverfs_dev;
- sprintf(dev->name, "%spart%d", name, part);
- sprintf(dev->bus_id, "%s:p%d", bus_id, part);
- if (!p[part-1].nr_sects)
- continue;
- dev->driver_data =
- (void *)(long)__mkdev(hd->major, hd->first_minor+part);
- }
-
- dev = &hd->disk_dev;
- dev->parent = parent;
- if (parent)
- dev->bus = parent->bus;
+ sprintf(dev->name, "disk");
+ sprintf(dev->bus_id, "disk");
device_register(dev);
device_create_file(dev, &dev_attr_type);
device_create_file(dev, &dev_attr_kdev);

- for (part=0; part < max_p-1; part++) {
- dev = &p[part].hd_driverfs_dev;
- dev->parent = parent;
- if (parent)
- dev->bus = parent->bus;
- if (!dev->driver_data)
+ for (part=1; part < max_p; part++) {
+ struct driver_dir_entry * dir = &p[part-1].dir;
+ dir->name = kmalloc(BUS_ID_SIZE,GFP_KERNEL);
+ if (!dir->name)
continue;
- device_register(dev);
- device_create_file(dev, &dev_attr_type);
- device_create_file(dev, &dev_attr_kdev);
+ snprintf(dir->name,BUS_ID_SIZE,"partition%d",part);
+ dir->mode = (S_IFDIR| S_IRWXU | S_IRUGO | S_IXUGO);
+ if (p[part-1].nr_sects)
+ driverfs_create_dir(dir,&dev->dir);
}
}

static void driverfs_remove_partitions(struct gendisk *hd)
{
int max_p = 1<<hd->minor_shift;
- struct device *dev;
+ struct device *dev = &hd->disk_dev;
struct hd_struct *p;
int part;

for (part=1, p = hd->part; part < max_p; part++, p++) {
- dev = &p->hd_driverfs_dev;
- if (dev->driver_data) {
- device_remove_file(dev, &dev_attr_type);
- device_remove_file(dev, &dev_attr_kdev);
- put_device(dev);
- dev->driver_data = NULL;
- }
+ driverfs_remove_dir(&p->dir);
}
- dev = &hd->disk_dev;
+
if (dev->driver_data) {
device_remove_file(dev, &dev_attr_type);
device_remove_file(dev, &dev_attr_kdev);
- put_device(dev);
+ device_unregister(dev);
dev->driver_data = NULL;
}
}
@@ -400,6 +364,8 @@
if (disk->flags & GENHD_FL_CD)
devfs_create_cdrom(disk);

+ printk("registering disk %s\n",disk->disk_name);
+
/* No minors to use for partitions */
if (!disk->minor_shift)
return;
@@ -420,30 +386,18 @@
void update_partition(struct gendisk *disk, int part)
{
struct hd_struct *p = disk->part + part - 1;
- struct device *dev = &p->hd_driverfs_dev;

if (!p->nr_sects) {
if (p->de) {
devfs_unregister(p->de);
p->de = NULL;
}
- if (dev->driver_data) {
- device_remove_file(dev, &dev_attr_type);
- device_remove_file(dev, &dev_attr_kdev);
- put_device(dev);
- dev->driver_data = NULL;
- }
- return;
+ driverfs_remove_dir(&p->dir);
+ } else {
+ if (!p->de)
+ devfs_register_partition(disk, part);
+ driverfs_create_dir(&p->dir,&disk->disk_dev.dir);
}
- if (!p->de)
- devfs_register_partition(disk, part);
- if (dev->driver_data || !(disk->flags & GENHD_FL_DRIVERFS))
- return;
- dev->driver_data =
- (void *)(long)__mkdev(disk->major, disk->first_minor+part);
- device_register(dev);
- device_create_file(dev, &dev_attr_type);
- device_create_file(dev, &dev_attr_kdev);
}

int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
===== include/linux/genhd.h 1.31 vs edited =====
--- 1.31/include/linux/genhd.h Tue Oct 8 11:36:39 2002
+++ edited/include/linux/genhd.h Wed Oct 9 15:49:50 2002
@@ -61,8 +61,8 @@
struct hd_struct {
sector_t start_sect;
sector_t nr_sects;
+ struct driver_dir_entry dir; /* driverfs support */
devfs_handle_t de; /* primary (master) devfs entry */
- struct device hd_driverfs_dev; /* support driverfs hiearchy */
};

#define GENHD_FL_REMOVABLE 1
@@ -86,7 +86,6 @@
int number; /* devfs crap */
devfs_handle_t de; /* more of the same */
devfs_handle_t disk_de; /* piled higher and deeper */
- struct device *driverfs_dev;
struct device disk_dev;
};


2002-10-10 07:04:17

by Mike Anderson

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

Patrick Mochel [[email protected]] wrote:
> This has been tested on IDE only, as the driver for my SCSI controller
> (qla1280.o) has stopped working...

I pulled and ran with an aic, ips, and scsi_debug scsi hosts. I looks
better, outside of the cleanups I am currently working on for SCSI
related nodes.

/sys/root/pci1
|-- 01:05.0
| `-- scsi2
| |-- 2:0:0:0
| | |-- 2:0:0:0:gen
| | `-- disk
| | `-- partition1
| |-- 2:0:1:0
| | |-- 2:0:1:0:gen
| | `-- disk
| `-- 2:0:5:0
| `-- 2:0:5:0:gen


-andmike
--
Michael Anderson
[email protected]

2002-10-10 17:59:40

by Roman Zippel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

Hi,

On Wed, 9 Oct 2002, Kai Germaschewski wrote:

> So at this point, you decide to rather not have the user wait
> forever until his "rmmod net_pci" succeeds, but return -EBUSY right away.
> You do it by running MOD_INC_USE_COUNT in ::open(), and DEC again in
> ::close().

You don't really want to do this. Imagine what happens if you get
preempted before you had a chance to run MOD_INC_USE_COUNT.

> The current implementation gives you a way of knowing if unloading will
> succeed shortly (if not, it'll fail with -EBUSY). Yours doesn't AFAICS,
> and that's actually bad IMO, I think you actually have to try to see if
> the unload would succeed, and that's not pretty.

The unload path could look something like this:

if (mod->usecount())
return -EBUSY;
mod->state = cleanup;
if (mod->exit())
return -EBUSY;
(free module)

So this does what you want. The only problem is here that the exit call
can fail, because there are still users, so the module will stay in the
cleanup state. start/stop functions would give you better control over
this.

> And, done right, the API for the current implementation is so simple that
> I doubt you'll be able to come up with something with more ease-of-use.

The current API just hides the complexity, but it requires extra checks
all over the kernel, to test if an object belongs to a module and if the
module is running. My proposal would get rid of this.
Deciding whether when you can release a device object or a driver object
is pretty much the same problem and a common solution is IMO prefered. The
module code should work like the remaining kernel and not require extra
care everywhere.
The diff below shows how filesystem.c would change, which becomes simpler
as it doesn't has to care about the module state anymore.

bye, Roman

Index: fs/filesystems.c
===================================================================
RCS file: /home/other/cvs/linux/linux-2.5/fs/filesystems.c,v
retrieving revision 1.1.1.5
diff -u -p -r1.1.1.5 filesystems.c
--- fs/filesystems.c 16 Apr 2002 08:45:18 -0000 1.1.1.5
+++ fs/filesystems.c 10 Oct 2002 15:23:46 -0000
@@ -28,17 +28,17 @@
static struct file_system_type *file_systems;
static rwlock_t file_systems_lock = RW_LOCK_UNLOCKED;

-/* WARNING: This can be used only if we _already_ own a reference */
+/* WARNING: This can be used only if we _already_ own a reference
+ * or hold the file_systems_lock
+ */
void get_filesystem(struct file_system_type *fs)
{
- if (fs->owner)
- __MOD_INC_USE_COUNT(fs->owner);
+ atomic_inc(&fs->refcnt);
}

void put_filesystem(struct file_system_type *fs)
{
- if (fs->owner)
- __MOD_DEC_USE_COUNT(fs->owner);
+ atomic_dec(&fs->refcnt);
}

static struct file_system_type **find_filesystem(const char *name)
@@ -77,8 +77,10 @@ int register_filesystem(struct file_syst
p = find_filesystem(fs->name);
if (*p)
res = -EBUSY;
- else
+ else {
*p = fs;
+ get_filesystem(fs);
+ }
write_unlock(&file_systems_lock);
return res;
}
@@ -103,15 +105,15 @@ int unregister_filesystem(struct file_sy
tmp = &file_systems;
while (*tmp) {
if (fs == *tmp) {
+ put_filesystem(fs);
*tmp = fs->next;
fs->next = NULL;
- write_unlock(&file_systems_lock);
- return 0;
+ break;
}
tmp = &(*tmp)->next;
}
write_unlock(&file_systems_lock);
- return -EINVAL;
+ return atomic_read(&fs->refcnt) > 0 ? -EBUSY : 0;
}

static int fs_index(const char * __name)
@@ -145,8 +147,10 @@ static int fs_name(unsigned int index, c

read_lock(&file_systems_lock);
for (tmp = file_systems; tmp; tmp = tmp->next, index--)
- if (index <= 0 && try_inc_mod_count(tmp->owner))
- break;
+ if (index <= 0) {
+ get_filesystem(tmp)
+ break;
+ }
read_unlock(&file_systems_lock);
if (!tmp)
return -EINVAL;
@@ -216,14 +220,14 @@ struct file_system_type *get_fs_type(con

read_lock(&file_systems_lock);
fs = *(find_filesystem(name));
- if (fs && !try_inc_mod_count(fs->owner))
- fs = NULL;
+ if (fs)
+ get_filesystem(fs);
read_unlock(&file_systems_lock);
if (!fs && (request_module(name) == 0)) {
read_lock(&file_systems_lock);
fs = *(find_filesystem(name));
- if (fs && !try_inc_mod_count(fs->owner))
- fs = NULL;
+ if (fs)
+ put_filesystem(fs);
read_unlock(&file_systems_lock);
}
return fs;


2002-10-10 18:43:45

by Kai Germaschewski

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

On Thu, 10 Oct 2002, Roman Zippel wrote:

> > So at this point, you decide to rather not have the user wait
> > forever until his "rmmod net_pci" succeeds, but return -EBUSY right away.
> > You do it by running MOD_INC_USE_COUNT in ::open(), and DEC again in
> > ::close().
>
> You don't really want to do this. Imagine what happens if you get
> preempted before you had a chance to run MOD_INC_USE_COUNT.

Well, a rmmod could potentially leave the module in "deleted" but not
"freed" state for a long time. Basically the same thing which happens for
your solution, where you call the state "cleanup". But I agree, this is
not desirable at all (nor do I think your cleanup state is). That's why I
said you should be using the right APIs. And the network layer does things
correctly, you don't need any MOD_INC_USE_COUNT in your network driver at
all, just set .owner and the network layer does it all and race-free for
you.

> The unload path could look something like this:
>
> if (mod->usecount())
> return -EBUSY;
> mod->state = cleanup;
> if (mod->exit())
> return -EBUSY;
> (free module)
>
> So this does what you want. The only problem is here that the exit call
> can fail, because there are still users, so the module will stay in the
> cleanup state. start/stop functions would give you better control over
> this.

Well, I don't think I have a general problem with allowing module_exit()
to fail, though again I don't think it's necessary for drivers which use
safe standard APIs. I'm not sure about your cleanup state, why is that
necessary? Say you do pci_unregister_driver() in your module_exit(). That
means that no users will find your driver on the list of all drivers, so
they have nothing to do try_inc_mod_count() on, which you're trying to
protect against? This is, AFAICS, currently only needed to guarantee
race-free access to the module's use count. Since you're only using that
as a hint right now and leave the final decision to ->exit(), it shouldn't
be necessary.

> > And, done right, the API for the current implementation is so simple that
> > I doubt you'll be able to come up with something with more ease-of-use.
>
> The current API just hides the complexity, but it requires extra checks
> all over the kernel, to test if an object belongs to a module and if the
> module is running. My proposal would get rid of this.
> Deciding whether when you can release a device object or a driver object
> is pretty much the same problem and a common solution is IMO prefered. The
> module code should work like the remaining kernel and not require extra
> care everywhere.
> The diff below shows how filesystem.c would change, which becomes simpler
> as it doesn't has to care about the module state anymore.

(This example seems to emphasize what I just said, no checking for state
cleanup neeeded, at least I don't see it).

You're right, code providing standard APIs to drivers (be it filesystem
drivers or network card drivers) need to be aware that these drivers can
be modular, and need to get that case right.

However, the try_inc_mod_count() is only ever used in the slow
registration paths (in the example e.g. for mounting, but not for accesses
to a mounted file system), so performance is not an issue. And getting
APIs exported to drivers right needs thinking, yes, but the module
locking isn't even the hard part there, I would claim.

What you are doing is basically removing the infrastructure to get the use
count right in a race-free way, and cope with the race by leaving modules
in 'cleanup' but not 'freed' state until the users you raced with have
gone away.

BTW, the patch below also needs changes to each filesystem driver to
return &fs->refcnt as ->use_count(), I think. And since there's no global
lock to protect it, as opposed to the unload_lock for the old module use
count, you have no way to protect the use_count between checking it
to be zero and calling ->exit(). It can be incremented in between, and you
cover up for that race by failing ->exit(), leaving the file_system in
used but not registered state. I just don't like this any better.

--Kai

2002-10-11 02:14:17

by Rob Landley

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

On Wednesday 09 October 2002 01:50 pm, Linus Torvalds wrote:
> On Wed, 9 Oct 2002, Patrick Mochel wrote:
> > Peter Anvin suggest 'kernfs', which ain't bad. Then again, we could just
> > call it 'patfs'..
>
> Oh, yeah, call it "patfs", you egomaniac. Or you could just name the whole
> OS after yourself, and then the world will _really_ see what an inflated
> ego you have.
>
> Uhh.. I mean.. Never mind.
>
> Linus

We've already got a reiserfs... :)

Rob

2002-10-11 13:43:18

by Roman Zippel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

Hi,

On Thu, 10 Oct 2002, Kai Germaschewski wrote:

> Well, a rmmod could potentially leave the module in "deleted" but not
> "freed" state for a long time. Basically the same thing which happens for
> your solution, where you call the state "cleanup". But I agree, this is
> not desirable at all (nor do I think your cleanup state is).

If exit() can fail, it avoids at least an unkillable rmmod, what makes a
module in the cleanup state IMO acceptable. It might look strange, but
it's still possible to remove the remaining users. The race is small
enough to be not really relevant and even if it does happen, it's not
exploitable, the cleanup will simply continue at a later point.
start()/stop() would give you better control over this. If a module still
has users after stopping it, you can still decide to start it again.

> However, the try_inc_mod_count() is only ever used in the slow
> registration paths (in the example e.g. for mounting, but not for accesses
> to a mounted file system), so performance is not an issue. And getting
> APIs exported to drivers right needs thinking, yes, but the module
> locking isn't even the hard part there, I would claim.

The problem is that we have conflicting methods for object management.
Driver objects have to use the module count, device object cannot use it
and have to use something else. Some objects (like proc entries) even have
to provide both.
Concentrating on a single mechanism will give us more flexibility and
simplicity.

> What you are doing is basically removing the infrastructure to get the use
> count right in a race-free way, and cope with the race by leaving modules
> in 'cleanup' but not 'freed' state until the users you raced with have
> gone away.

That "race" is nothing negativ here. You can also remove a file, while
it's still open and this will delay the release of the resources, until
then the file stays usable. A filesystem would behave the same, you
couldn't mount it anymore, but you can still see it used in /proc/mounts.
I don't think this is necessarly bad, it's just new and unfamiliar.

> BTW, the patch below also needs changes to each filesystem driver to
> return &fs->refcnt as ->use_count(), I think.

That's the other part of my proposal. A new style module would be defined
like this:

DEFINE_MODULE
.init = ...
...
DEFINE_MODULE_END

The new module layout allows to stack initialiations:

DEFINE_FS_MODULE
.name = ...
.init = ...
...
DEFINE_FS_MODULE_END

This would already take care of most of the generic management of an fs
module.

bye, Roman

2002-10-12 00:42:07

by Roman Zippel

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

Hi,

On Fri, 11 Oct 2002, Kai Germaschewski wrote:

> > Your mistake here is that e.g. start() would call the register function,
> > so it's basically the same. A proc entry would be a better example, so
> > stop() would call proc_unlink() and exit() proc_delete(). This means
> > start()/start() manage access to the driver, init()/exit() manage the
> > driver resources.
>
> Oh, I don't doubt that there are cases where your scheme works, I'm just
> saying restarting in general is hard - sure, for one simple proc_entry, no
> problem.

Restarting from exit() is only an option, leaving the module in the
cleanup state (once it started with the cleanup) is no error.

> > Network drivers are an interesting example, because one can't stop them,
> > because they are always addressed by name (so you can't unregister them).
>
> What? If you have the struct net_device, you can surely get the name from
> it. ::remove gives you the struct pci_dev, pci_dev::drvdata gives you
> your private per netdevice data, where is the problem to get to the
> struct_netdevice or the name?

I was talking about network drivers in general. pci drivers can be stopped
simply with pci_unregister_driver().
To get struct net_device in first place you need the name of the device,
so you can deconfigure it (otherwise the driver stays busy).
For example ISA network drivers are currently not registered as drivers
themselves, only the devices they manage are registered. start()/stop()
does only start/stop (or register/unregister) the driver _not_ the
devices.

> > On the other hand we can also look at the costs of the current API.
> > try_inc_mod_count() is a very expansive call, because you need to protect
> > the access to the module structure and to the device structure. This means
> > it's only usable in slow paths. MOD_INC_USE_COUNT became obsolete with
> > preempt (before it was just a small smp race). The only other possibility
> > is to block in module_exit() and hope that the users will go away soon.
> > Because of the requirement, that all references have to go before you even
> > can attempt to remove the module, a user can give you a hard time to get
> > rid of a module.
>
> You must be kidding me, I pointed out a couple of times now that there is
> *no* race when the module use count is handled right.

Could you please explain how this is related to what I wrote above?

> > So at worst you could say we exchange the race of being able to unload a
> > module at all with a race that a module might stay in the cleanup state
> > for a while. In the worst case scenario it will be easier to get the
> > module out of the cleanup state than to get a module removed with
> > module_exit().
>
> That's just not true. You can say that you save locking a spinlock in
> try_inc_mod_count() (which is used in non performance critical paths) in
> exchange for accepting that unload may race and your module keeps hanging
> in cleanup state for an unbound time. rmmod hanging for an unbound time
> cannot happen today, except for bugs (which still exist).

Have you understood what I wrote? I was talking about worst case behaviour
and you talk about locking???

> Show me how it's supposed to work in the net driver
> case above, than you may convince me.

That case is actually quite trivial. If one device fails to unload, it can
always set itself back to the stopped/running state, the active devices
will continue to work fine. If you want the other devices back you can
start a new bus scan.
Note a the driver cleanup itself only starts when all devices are gone,
until then it can always go back at least to the stopped state. A driver
in the stopped state can get no new users, so there is nothing to race
with.
The cleanup stage is only needed for usage which you don't return with
usecount() (usually proc entries). So if you only use
pci_register_driver() you only need to call pci_unregister_driver() in
stop() and remove the devices in exit() until the refcnt is zero. Until
then the driver stays in the stopped state, you could even call start
again and everything is working again (Practically it needs a few more
changes in driverfs, but it should show the basic idea.)

bye, Roman

PS: Could you please at least try to understand what I wrote before you
start flaming, your answer came a bit too quick for that. :(


2002-10-15 22:12:54

by Daniel Phillips

[permalink] [raw]
Subject: Re: [bk/patch] driver model update: device_unregister()

On Wednesday 09 October 2002 22:42, Andries Brouwer wrote:
> On Wed, Oct 09, 2002 at 02:00:05PM -0400, Alexander Viro wrote:
>
> > OK, call me dense, but what things are associated with partition aside of the
> > fact that it exists?
>
> A partition has an underlying device, a start and a length.

Subtle distinction: a partition has underlying media, and the media is
associated with a device.

> It may have a label or volume id.
> It has a partition number (the 7 of hda7) that can be assigned.
> It may have substructure on the partition level, like a
> DOS-type partition (BSD slice) with BSD subpartitions.

--
Daniel