Hey there.
This changeset and set of patches improves IDE support for the new driver
model. Instead of having each IDE driver declare and register a struct
device_driver, one is added to ide_driver_t. ide_register_driver() fills
in the necessary fields to the structure and registers it with the driver
model core.
A generic remove method is implemented, which forwards the call to the ide
drivers. Doing this allows the drives to received the remove() call during
device shutdown, which obviates the need for ide reboot notifier.
The ide core also checks if the device is present before registering, so
there should be no more "ghost drives" in the device tree.
Diffstat and changelogs are appended. The patches will follow.
-pat
Please pull from
bk://ldm.bkbits.net/linux-2.5-ide
This will update the following files:
drivers/ide/ide-disk.c | 16 ------------
drivers/ide/ide-probe.c | 11 ++++++++
drivers/ide/ide.c | 64 +++++++++++-------------------------------------
include/linux/ide.h | 1
4 files changed, 27 insertions(+), 65 deletions(-)
through these ChangeSets:
<[email protected]> (02/10/07 1.696.19.3)
IDE: Add generic remove() method for drives; remove reboot notifier.
The remove() method is generic for all drives, and set in ide_driver_t::gen_driver.
The call simply forwards the call to ide_driver_t::standby().
ide_drive_release() is also added, and set when the device is registered with
the driver model core. This cleans up the drive once the last reference has gone
away by calling ide_driver_t::cleanup().
These two additions obviate the need for IDE reboot notifier, since they exploit
the constructs of the driver model core. It has been removed.
<[email protected]> (02/10/07 1.696.19.2)
IDE: register ide driver for all ide drives; not just for disk drives.
This adds
struct device_driver gen_driver;
to ide_driver_t, which is filled in with necessary fields when an ide
driver calls ide_register_driver(). That then registers the driver with
the driver model core.
As a result, this gives us the following output in driverfs:
# tree -d /sys/bus/ide/drivers/
/sys/bus/ide/drivers/
|-- ide-cdrom
`-- ide-disk
The suspend/resume callbacks in ide-disk.c have been temporarily
disabled until the ide core implements generic methods which forward
the calls to the drive drivers.
<[email protected]> (02/10/07 1.696.19.1)
IDE: only register drives that are present with the driver core.
[email protected], 2002-10-07 10:27:16-07:00, [email protected]
IDE: register ide driver for all ide drives; not just for disk drives.
This adds
struct device_driver gen_driver;
to ide_driver_t, which is filled in with necessary fields when an ide
driver calls ide_register_driver(). That then registers the driver with
the driver model core.
As a result, this gives us the following output in driverfs:
# tree -d /sys/bus/ide/drivers/
/sys/bus/ide/drivers/
|-- ide-cdrom
`-- ide-disk
The suspend/resume callbacks in ide-disk.c have been temporarily
disabled until the ide core implements generic methods which forward
the calls to the drive drivers.
diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
--- a/drivers/ide/ide-disk.c Mon Oct 7 12:19:14 2002
+++ b/drivers/ide/ide-disk.c Mon Oct 7 12:19:14 2002
@@ -1550,14 +1550,6 @@
/* This is just a hook for the overall driver tree.
*/
-static struct device_driver idedisk_devdrv = {
- .bus = &ide_bus_type,
- .name = "IDE disk driver",
-
- .suspend = idedisk_suspend,
- .resume = idedisk_resume,
-};
-
static int idedisk_ioctl (ide_drive_t *drive, struct inode *inode,
struct file *file, unsigned int cmd, unsigned long arg)
{
@@ -1603,12 +1595,6 @@
drive->doorlocking = 1;
}
}
- {
- sprintf(drive->disk->disk_dev.name, "ide-disk");
- drive->disk->disk_dev.driver = &idedisk_devdrv;
- drive->disk->disk_dev.driver_data = drive;
- }
-
#if 1
(void) probe_lba_addressing(drive, 1);
#else
@@ -1791,7 +1777,6 @@
static int idedisk_init (void)
{
ide_register_driver(&idedisk_driver);
- driver_register(&idedisk_devdrv);
return 0;
}
diff -Nru a/drivers/ide/ide.c b/drivers/ide/ide.c
--- a/drivers/ide/ide.c Mon Oct 7 12:19:14 2002
+++ b/drivers/ide/ide.c Mon Oct 7 12:19:14 2002
@@ -3414,7 +3414,9 @@
list_del_init(&drive->list);
ata_attach(drive);
}
- return 0;
+ driver->gen_driver.name = driver->name;
+ driver->gen_driver.bus = &ide_bus_type;
+ return driver_register(&driver->gen_driver);
}
EXPORT_SYMBOL(ide_register_driver);
diff -Nru a/include/linux/ide.h b/include/linux/ide.h
--- a/include/linux/ide.h Mon Oct 7 12:19:14 2002
+++ b/include/linux/ide.h Mon Oct 7 12:19:14 2002
@@ -1187,6 +1187,7 @@
int (*attach)(ide_drive_t *);
void (*ata_prebuilder)(ide_drive_t *);
void (*atapi_prebuilder)(ide_drive_t *);
+ struct device_driver gen_driver;
struct list_head drives;
struct list_head drivers;
} ide_driver_t;
[email protected], 2002-10-07 11:57:29-07:00, [email protected]
IDE: Add generic remove() method for drives; remove reboot notifier.
The remove() method is generic for all drives, and set in ide_driver_t::gen_driver.
The call simply forwards the call to ide_driver_t::standby().
ide_drive_release() is also added, and set when the device is registered with
the driver model core. This cleans up the drive once the last reference has gone
away by calling ide_driver_t::cleanup().
These two additions obviate the need for IDE reboot notifier, since they exploit
the constructs of the driver model core. It has been removed.
diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
--- a/drivers/ide/ide-disk.c Mon Oct 7 12:19:11 2002
+++ b/drivers/ide/ide-disk.c Mon Oct 7 12:19:11 2002
@@ -1678,7 +1678,6 @@
{
struct gendisk *g = drive->disk;
- put_device(&drive->disk->disk_dev);
if ((drive->id->cfs_enable_2 & 0x3000) && drive->wcache)
if (do_idedisk_flushcache(drive))
printk (KERN_INFO "%s: Write Cache FAILED Flushing!\n",
diff -Nru a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
--- a/drivers/ide/ide-probe.c Mon Oct 7 12:19:11 2002
+++ b/drivers/ide/ide-probe.c Mon Oct 7 12:19:11 2002
@@ -952,6 +952,15 @@
EXPORT_SYMBOL(init_irq);
+static void ide_device_release(struct device * dev)
+{
+ ide_drive_t * drive = dev->driver_data;
+ ide_driver_t * driver = drive->driver;
+
+ if (driver && driver->cleanup)
+ driver->cleanup(drive);
+}
+
/*
* init_gendisk() (as opposed to ide_geninit) is called for each major device,
* after probing for drives, to allocate partition tables and other data
@@ -986,6 +995,8 @@
"%s","IDE Drive");
disk->disk_dev.parent = &hwif->gendev;
disk->disk_dev.bus = &ide_bus_type;
+ disk->disk_dev.driver_data = &hwif->drives[unit];
+ disk->disk_dev.release = ide_device_release;
if (hwif->drives[unit].present)
device_register(&disk->disk_dev);
hwif->drives[unit].disk = disk;
diff -Nru a/drivers/ide/ide.c b/drivers/ide/ide.c
--- a/drivers/ide/ide.c Mon Oct 7 12:19:11 2002
+++ b/drivers/ide/ide.c Mon Oct 7 12:19:11 2002
@@ -2437,6 +2437,7 @@
if (driver->attach(drive) == 0) {
if (driver->owner)
__MOD_DEC_USE_COUNT(driver->owner);
+ drive->disk->disk_dev.driver = &driver->gen_driver;
return 0;
}
spin_lock(&drivers_lock);
@@ -3396,6 +3397,16 @@
EXPORT_SYMBOL(ide_unregister_subdriver);
+static int ide_drive_remove(struct device * dev)
+{
+ ide_drive_t * drive = dev->driver_data;
+ ide_driver_t * driver = drive->driver;
+
+ if (driver && driver->standby)
+ driver->standby(drive);
+ return 0;
+}
+
int ide_register_driver(ide_driver_t *driver)
{
struct list_head list;
@@ -3416,6 +3427,7 @@
}
driver->gen_driver.name = driver->name;
driver->gen_driver.bus = &ide_bus_type;
+ driver->gen_driver.remove = ide_drive_remove;
return driver_register(&driver->gen_driver);
}
@@ -3467,52 +3479,6 @@
EXPORT_SYMBOL(ide_probe);
EXPORT_SYMBOL(ide_devfs_handle);
-static int ide_notify_reboot (struct notifier_block *this, unsigned long event, void *x)
-{
- ide_hwif_t *hwif;
- ide_drive_t *drive;
- int i, unit;
-
- switch (event) {
- case SYS_HALT:
- case SYS_POWER_OFF:
- case SYS_RESTART:
- break;
- default:
- return NOTIFY_DONE;
- }
-
- printk(KERN_INFO "flushing ide devices: ");
-
- for (i = 0; i < MAX_HWIFS; i++) {
- hwif = &ide_hwifs[i];
- if (!hwif->present)
- continue;
- for (unit = 0; unit < MAX_DRIVES; ++unit) {
- drive = &hwif->drives[unit];
- if (!drive->present)
- continue;
-
- /* set the drive to standby */
- printk("%s ", drive->name);
- if (event != SYS_RESTART)
- if (drive->driver != NULL && DRIVER(drive)->standby(drive))
- continue;
-
- if (drive->driver != NULL && DRIVER(drive)->cleanup(drive))
- continue;
- }
- }
- printk("\n");
- return NOTIFY_DONE;
-}
-
-static struct notifier_block ide_notifier = {
- ide_notify_reboot,
- NULL,
- 5
-};
-
struct bus_type ide_bus_type = {
.name = "ide",
};
@@ -3538,7 +3504,6 @@
ide_init_builtin_drivers();
initializing = 0;
- register_reboot_notifier(&ide_notifier);
return 0;
}
@@ -3573,7 +3538,6 @@
{
int index;
- unregister_reboot_notifier(&ide_notifier);
for (index = 0; index < MAX_HWIFS; ++index) {
ide_unregister(index);
#if defined(CONFIG_BLK_DEV_IDEDMA) && !defined(CONFIG_DMA_NONPCI)
[email protected], 2002-10-07 09:52:31-07:00, [email protected]
IDE: only register drives that are present with the driver core.
diff -Nru a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
--- a/drivers/ide/ide-probe.c Mon Oct 7 12:19:16 2002
+++ b/drivers/ide/ide-probe.c Mon Oct 7 12:19:16 2002
@@ -986,8 +986,8 @@
"%s","IDE Drive");
disk->disk_dev.parent = &hwif->gendev;
disk->disk_dev.bus = &ide_bus_type;
- device_register(&disk->disk_dev);
-
+ if (hwif->drives[unit].present)
+ device_register(&disk->disk_dev);
hwif->drives[unit].disk = disk;
}
On Mon, 7 Oct 2002, Alexander Viro wrote:
>
>
> On Mon, 7 Oct 2002, Patrick Mochel wrote:
>
> > --- a/drivers/ide/ide-probe.c Mon Oct 7 12:19:11 2002
> > +++ b/drivers/ide/ide-probe.c Mon Oct 7 12:19:11 2002
> > @@ -952,6 +952,15 @@
> >
> > EXPORT_SYMBOL(init_irq);
> >
> > +static void ide_device_release(struct device * dev)
> > +{
> > + ide_drive_t * drive = dev->driver_data;
> > + ide_driver_t * driver = drive->driver;
> > +
> > + if (driver && driver->cleanup)
> > + driver->cleanup(drive);
> > +}
> > +
> > /*
> > * init_gendisk() (as opposed to ide_geninit) is called for each major device,
> > * after probing for drives, to allocate partition tables and other data
> > @@ -986,6 +995,8 @@
> > "%s","IDE Drive");
> > disk->disk_dev.parent = &hwif->gendev;
> > disk->disk_dev.bus = &ide_bus_type;
> > + disk->disk_dev.driver_data = &hwif->drives[unit];
> > + disk->disk_dev.release = ide_device_release;
>
> That is Wrong(tm). Logics around ->cleanup() doesn't belong to driverfs.
> As far as IDE code is concerned, any outside calls of ->cleanup() are
> illegal. What are you trying to achieve with that?
It's the desctrutor for the device object. ->release() is the last thing
called when the last reference to the device goes away. That's the only
time it's called.
-pat
On Mon, 7 Oct 2002, Patrick Mochel wrote:
> --- a/drivers/ide/ide-probe.c Mon Oct 7 12:19:11 2002
> +++ b/drivers/ide/ide-probe.c Mon Oct 7 12:19:11 2002
> @@ -952,6 +952,15 @@
>
> EXPORT_SYMBOL(init_irq);
>
> +static void ide_device_release(struct device * dev)
> +{
> + ide_drive_t * drive = dev->driver_data;
> + ide_driver_t * driver = drive->driver;
> +
> + if (driver && driver->cleanup)
> + driver->cleanup(drive);
> +}
> +
> /*
> * init_gendisk() (as opposed to ide_geninit) is called for each major device,
> * after probing for drives, to allocate partition tables and other data
> @@ -986,6 +995,8 @@
> "%s","IDE Drive");
> disk->disk_dev.parent = &hwif->gendev;
> disk->disk_dev.bus = &ide_bus_type;
> + disk->disk_dev.driver_data = &hwif->drives[unit];
> + disk->disk_dev.release = ide_device_release;
That is Wrong(tm). Logics around ->cleanup() doesn't belong to driverfs.
As far as IDE code is concerned, any outside calls of ->cleanup() are
illegal. What are you trying to achieve with that?
> > It's the desctrutor for the device object. ->release() is the last thing
> > called when the last reference to the device goes away. That's the only
> > time it's called.
>
> ???
>
> Details, please. When can it happen and what normally holds that object
> pinned?
get_device()/put_device(). When struct device::refcount hits 0, it's
cleaned up. c.f. drivers/base/core.c::put_device().
The bus type that the device belongs to always owns it, and could easily
be put in struct bus_type. Basically, it tells the bus that it's finally
ok to free the structure. That's not done by the driver core, since the
struct device is (so far) always embedded in a bus-specific structure.
Thoughts? Suggestions?
-pat
On Mon, 7 Oct 2002, Patrick Mochel wrote:
> > On Mon, 7 Oct 2002, Patrick Mochel wrote:
> >
> > > --- a/drivers/ide/ide-probe.c Mon Oct 7 12:19:11 2002
> > > +++ b/drivers/ide/ide-probe.c Mon Oct 7 12:19:11 2002
> > > @@ -952,6 +952,15 @@
> > >
> > > EXPORT_SYMBOL(init_irq);
> > >
> > > +static void ide_device_release(struct device * dev)
> > > +{
> > > + ide_drive_t * drive = dev->driver_data;
> > > + ide_driver_t * driver = drive->driver;
> > > +
> > > + if (driver && driver->cleanup)
> > > + driver->cleanup(drive);
> > > +}
> > > +
> > > /*
> > > * init_gendisk() (as opposed to ide_geninit) is called for each major device,
> > > * after probing for drives, to allocate partition tables and other data
> > > @@ -986,6 +995,8 @@
> > > "%s","IDE Drive");
> > > disk->disk_dev.parent = &hwif->gendev;
> > > disk->disk_dev.bus = &ide_bus_type;
> > > + disk->disk_dev.driver_data = &hwif->drives[unit];
> > > + disk->disk_dev.release = ide_device_release;
> >
> > That is Wrong(tm). Logics around ->cleanup() doesn't belong to driverfs.
> > As far as IDE code is concerned, any outside calls of ->cleanup() are
> > illegal. What are you trying to achieve with that?
>
> It's the desctrutor for the device object. ->release() is the last thing
> called when the last reference to the device goes away. That's the only
> time it's called.
???
Details, please. When can it happen and what normally holds that object
pinned?
On Mon, 7 Oct 2002, Patrick Mochel wrote:
>
> > > It's the desctrutor for the device object. ->release() is the last thing
> > > called when the last reference to the device goes away. That's the only
> > > time it's called.
> >
> > ???
> >
> > Details, please. When can it happen and what normally holds that object
> > pinned?
>
> get_device()/put_device(). When struct device::refcount hits 0, it's
> cleaned up. c.f. drivers/base/core.c::put_device().
>
> The bus type that the device belongs to always owns it, and could easily
> be put in struct bus_type. Basically, it tells the bus that it's finally
> ok to free the structure. That's not done by the driver core, since the
> struct device is (so far) always embedded in a bus-specific structure.
>
> Thoughts? Suggestions?
Lots and almost none of them printable.
_ALL_ buses that have driverfs support (IDE, SCSI, USB, PCI) have their
own rules for lifetimes of their structures. And that's not likely to
change - these objects belong to drivers and in some cases (IDE) are
not even allocated dynamically - they are reused if nothing is holding
them.
Note that all cases I've mentioned have oopsable races in the situations
when somebody keeps a reference to driverfs object longer than driver
keeps the object it's embedded into (in case of IDE you have no put_device()
calls at all, so there you happily re-register same node again and again).
Situation is very different from that for filesystems - there you have no
own lifetime rules for private part of inode; everything is controlled by
VFS.
I don't believe that you can massage the code in drivers into form where
freeing objects would be controlled by driverfs (and it's not just a matter
of postponing kfree(); if driverfs methods want to access something else,
they'll need that something also preserved). Feel free to prove me wrong,
but I'll believe it when I see it. Notice that ->release() has nothing
to any possible solution - the problem happens when we _don't_ call it
for too long (== retain reference after driver had dropped its own).
We'll need either "wait until all remaining references are gone" or "make sure
that no new references are about to appear and check reference counts of all
my nodes" - we need _some_ way for driver to decide when it's safe to get rid
of data structures.
Alternatively, we could go for separate allocation of struct device and
try to reduce the area where driverfs needs something beyond the contents
of struct device. Then we could simply take rwsem around it - exclusive
around "remove that node" and shared around every other area. Then drivers
would need to call something like
device_gone(dev);
put_device(dev);
with device_gone() doing
down_write(&dev->sem);
dev->dead = 1;
up_write(&dev->sem);
and all other accesses to the guts would do
down_read(&dev->sem);
if (!dev->dead)
...
up_read(&dev->sem);
- all the while holding a reference to dev, obviously.
I think that it's easier than messing with lifetime rules for driver objects;
whether we can tolerate overhead or not is a different question. It certainly
appears to have lower fuckup potential.
As it is, driverfs code has oopsable races for every bus that has driverfs
support, so something has to be done. Comments?
Al, liking driverfs less and less...
On Tue, 2002-10-08 at 03:17, Alexander Viro wrote:
> _ALL_ buses that have driverfs support (IDE, SCSI, USB, PCI) have their
> own rules for lifetimes of their structures. And that's not likely to
> change - these objects belong to drivers and in some cases (IDE) are
> not even allocated dynamically - they are reused if nothing is holding
> them.
IDE objects can also outlast the hardware - consider an active mount on
an ejected pcmcia card. Right now we don't do the right stuff to
reconnect that on re-insert but one day we may need to. As it is we keep
the instance around to avoid crashes
On 8 Oct 2002, Alan Cox wrote:
> On Tue, 2002-10-08 at 03:17, Alexander Viro wrote:
> > _ALL_ buses that have driverfs support (IDE, SCSI, USB, PCI) have their
> > own rules for lifetimes of their structures. And that's not likely to
> > change - these objects belong to drivers and in some cases (IDE) are
> > not even allocated dynamically - they are reused if nothing is holding
> > them.
>
> IDE objects can also outlast the hardware - consider an active mount on
> an ejected pcmcia card. Right now we don't do the right stuff to
> reconnect that on re-insert but one day we may need to. As it is we keep
> the instance around to avoid crashes
Ouch. That (reconnects) may require interesting things from queue-related
code. What behaviour do you want while card is disconnected? All requests
getting errors / all requests getting blocked / reads failing, writes blocking?
On Tue, 2002-10-08 at 13:24, Alexander Viro wrote:
> > IDE objects can also outlast the hardware - consider an active mount on
> > an ejected pcmcia card. Right now we don't do the right stuff to
> > reconnect that on re-insert but one day we may need to. As it is we keep
> > the instance around to avoid crashes
>
> Ouch. That (reconnects) may require interesting things from queue-related
> code. What behaviour do you want while card is disconnected? All requests
> getting errors / all requests getting blocked / reads failing, writes blocking?
Right now it errors further requests. USB scsi does the whole reconnect
thing and seems to get it right, I need to look into this
>Ouch. That (reconnects) may require interesting things from queue-related
>code. What behaviour do you want while card is disconnected? All requests
>getting errors / all requests getting blocked / reads failing, writes
>blocking?
Same problem I beleive with the hotswap ATAPI bay I have on some
machines, and probably very similar at the queue level to the
problem of USB & FireWire drives.
Ben.
> > > _ALL_ buses that have driverfs support (IDE, SCSI, USB, PCI) have their
> > > own rules for lifetimes of their structures. And that's not likely to
> > > change - these objects belong to drivers and in some cases (IDE) are
> > > not even allocated dynamically - they are reused if nothing is holding
> > > them.
> >
> > IDE objects can also outlast the hardware - consider an active mount on
> > an ejected pcmcia card. Right now we don't do the right stuff to
> > reconnect that on re-insert but one day we may need to. As it is we keep
> > the instance around to avoid crashes
>
> Ouch. That (reconnects) may require interesting things from queue-related
> code. What behaviour do you want while card is disconnected? All requests
> getting errors / all requests getting blocked / reads failing, writes blocking?
This raises the interesting possibility of being able to refer to
things like removable media directly, instead of the device the media
is inserted in.
The Amiga was doing this years ago. You could access floppy drives
as, E.G. df0:, df1:, etc, but if you formatted a volume and called it
foobar, you could access foobar: no matter which floppy drive you put
it in to.
Also, Plan 9 does similar interesting things - you can do the equivilent of:
ls /internet/websites/kernel.org/
and treat the website as a filesystem.
John.
On Tue, 8 Oct 2002, [email protected] wrote:
> This raises the interesting possibility of being able to refer to
> things like removable media directly, instead of the device the media
> is inserted in.
>
> The Amiga was doing this years ago. You could access floppy drives
> as, E.G. df0:, df1:, etc, but if you formatted a volume and called it
> foobar, you could access foobar: no matter which floppy drive you put
> it in to.
Isn't this possible in /etc/fstab already? Standard redhat-installs seem
to put in the labels of the volume instead of referring to the device.
> Also, Plan 9 does similar interesting things - you can do the
> equivilent of:
>
> ls /internet/websites/kernel.org/
>
> and treat the website as a filesystem.
Wouldn't you just need a filesystem driver for this? I don't know that
it's a good idea, though ;)
Ketil
Oh fun.
> _ALL_ buses that have driverfs support (IDE, SCSI, USB, PCI) have their
> own rules for lifetimes of their structures. And that's not likely to
> change - these objects belong to drivers and in some cases (IDE) are
> not even allocated dynamically - they are reused if nothing is holding
> them.
The problem only exists when a device is hotpluggable. And, if a device is
hotpluggable, the controlling bus should dynamically allocate the
structures for that device. Anything else seems buggy. Regardless of
driver model structures, it seems there is an implicit race between
removing a device, cleaning up that structure, and inserting a new device.
> Note that all cases I've mentioned have oopsable races in the situations
> when somebody keeps a reference to driverfs object longer than driver
> keeps the object it's embedded into (in case of IDE you have no put_device()
> calls at all, so there you happily re-register same node again and again).
The absence of put_device() in IDE is a bug.
> Situation is very different from that for filesystems - there you have no
> own lifetime rules for private part of inode; everything is controlled by
> VFS.
>
> I don't believe that you can massage the code in drivers into form where
> freeing objects would be controlled by driverfs (and it's not just a matter
> of postponing kfree(); if driverfs methods want to access something else,
> they'll need that something also preserved). Feel free to prove me wrong,
The driver core is not controlling the lifetime of the objects. It's
managing the reference counts on the shared objects. It's still the
allocator that frees the device. It just can't do it whenever it wants,
because it's shared and may be accessed by other entities. The release()
method is the notification that it's kosher to do so.
[ Note that I'm not trying to be a smart ass. I know you have much more
experience with this stuff; I'm just trying to grasp all the angles. ]
> but I'll believe it when I see it. Notice that ->release() has nothing
> to any possible solution - the problem happens when we _don't_ call it
> for too long (== retain reference after driver had dropped its own).
The only timing issue is when the device structures are reused. And, it
seems that that is inherently racy anyway with hotpluggable devices.
> We'll need either "wait until all remaining references are gone" or "make sure
> that no new references are about to appear and check reference counts of all
> my nodes" - we need _some_ way for driver to decide when it's safe to get rid
> of data structures.
>
> Alternatively, we could go for separate allocation of struct device and
> try to reduce the area where driverfs needs something beyond the contents
> of struct device. Then we could simply take rwsem around it - exclusive
> around "remove that node" and shared around every other area. Then drivers
> would need to call something like
> device_gone(dev);
> put_device(dev);
> with device_gone() doing
> down_write(&dev->sem);
> dev->dead = 1;
> up_write(&dev->sem);
> and all other accesses to the guts would do
> down_read(&dev->sem);
> if (!dev->dead)
> ...
> up_read(&dev->sem);
> - all the while holding a reference to dev, obviously.
I agree that we need some sort of device_unregister() (or device_gone()),
which could also detach the device from the bus and driver it belongs to,
and remove the driverfs entries. This would prevent it from gaining any
new references before the remaining extant references went away. I'll also
modify driverfs to do get()/put() in read(2) and write(2), instead of
open(2) and close(2).
I don't see how dynamically allocating the device structures is
necessarily going to help. It seems that it would be a step backwards from
what we currently have. We'll still need reference counting on the
bus-specific objects, and separate handlers for them, to decide when its
ok to free them (since someone could independently be holding a reference
to a bus object when the device goes away).
struct device was an attempt to consolidate the constructs and methods to
deal with them. The easiest way to do this is to embed the generic object
in the bus-specific object. Do you see anyway to make this work?
Thanks,
-pat
On Tue, Oct 08, 2002 at 02:05:37PM +0100, Alan Cox wrote:
> USB scsi does the whole reconnect thing and seems to get it right
You are an optimist.
On Tue, 8 Oct 2002, Patrick Mochel wrote:
> The driver core is not controlling the lifetime of the objects. It's
> managing the reference counts on the shared objects. It's still the
> allocator that frees the device. It just can't do it whenever it wants,
> because it's shared and may be accessed by other entities. The release()
> method is the notification that it's kosher to do so.
Its reference counts mean squat if they are not seen by the code that
allocates/frees the object struct device is embedded into.
> [ Note that I'm not trying to be a smart ass. I know you have much more
> experience with this stuff; I'm just trying to grasp all the angles. ]
>
> > but I'll believe it when I see it. Notice that ->release() has nothing
> > to any possible solution - the problem happens when we _don't_ call it
> > for too long (== retain reference after driver had dropped its own).
>
> The only timing issue is when the device structures are reused. And, it
> seems that that is inherently racy anyway with hotpluggable devices.
BS. Neither SCSI, nor USB nor PCI are reusing the structures in question.
They are, however, freeing them.
Again, USB disconnect when you are holding a reference to struct device
will leave you with pointer to kfree'd area.
> I don't see how dynamically allocating the device structures is
> necessarily going to help. It seems that it would be a step backwards from
> what we currently have. We'll still need reference counting on the
> bus-specific objects, and separate handlers for them, to decide when its
> ok to free them (since someone could independently be holding a reference
> to a bus object when the device goes away).
>
> struct device was an attempt to consolidate the constructs and methods to
> deal with them. The easiest way to do this is to embed the generic object
> in the bus-specific object. Do you see anyway to make this work?
I don't. Notice that in case of inodes (which is what you seem to be
imitating) logics is very different - there final decision to destroy
composite object is made by holder of pointer to shared object (i.e.
VFS code); filesystem provides the method for freeing these guys, but
that's it. That's the only reason why we can get away with refcounting
on struct inode.
With drivers decision to destroy the object is made by driver, not by
holder of shared objects. IOW, driverfs refcount doesn't protect
anything. Moreover, ->release() is guaranteed to be called not earlier
than driver makes such decision - until then driverfs object is pinned
down.
Look - we can't afford the following situation: driver destroys some of
its structures while driverfs accesses these structures. Agreed?
So we need to make sure that driver will not proceed to destroy these
data structures until driverfs code is out of critical areas.
So we need either leave _all_ work on destruction (which can be damn
nontrivial) to ->release() and leave driver to do only one thing - drop
its reference to driverfs object and let driverfs call the rest, _or_
we need to protect critical areas with something other than driverfs
refcount and let driver block until driverfs is out of said areas,
mark node as dead (to prevent later access to them) and proceed with
destroying stuff.
Now, the former is a very massive rewrite of drivers. The latter suffers
from obvious problems if critical areas are too large. If struct device
is embedded, it's all code that dereferences it and we also get a lovely
problem with blocking safely. If we make struct device separately allocated,
we can shrink these areas to stuff that really accesses driver objects.
Which can be made _way_ more narrow.
I don't see any simple way to do it - any path will require massive changes.
Notice that leaving destruction of driver structures to ->release() is
going to be _very_ messy - look at USB or PCI code, not to mention SCSI one.
I have a very strong gut feeling that embedding struct device was a mistake.
Linus, your opinion?
On Tue, 8 Oct 2002, Alexander Viro wrote:
>
> Its reference counts mean squat if they are not seen by the code that
> allocates/frees the object struct device is embedded into.
But Al, that's the whole _point_ of having the callback.
Allow the refcounts to be in an embedded entity, and then anybody who gets
the entity (_or_ the embedded thing) will increment the same count.
When the count goes to zero, the embedded thing needs to call the
_embedders_ release function - because the embedded thing doesn't even
know how it got allocated.
Al, this time you're wrong, and the code you're unhappy about going about
it the right way. The reference count _has_ to be held by the lowest-level
thing (because that's the only generic part), yet the actual allocation
and de-allocation is done by the higher levels. Which is why the lower
levels need to know which freeing function to call.
Linus
On Tue, Oct 08, 2002 at 04:47:49PM -0400, Alexander Viro wrote:
> >
> > The only timing issue is when the device structures are reused. And, it
> > seems that that is inherently racy anyway with hotpluggable devices.
>
> BS. Neither SCSI, nor USB nor PCI are reusing the structures in question.
> They are, however, freeing them.
>
> Again, USB disconnect when you are holding a reference to struct device
> will leave you with pointer to kfree'd area.
This is a USB (and PCI) bug. I'll fix them, they should be using the
release() callback that Pat has provided. With that callback, which
gets called when the device really wants to be cleaned up, I don't see
any races in the USB code (well theoretical races, there's still some
bugs in the current implementation that I'm trying to track down...)
thanks,
greg k-h
Le mar 08/10/2002 ? 16:31, Ketil Froyn a ?crit :
> On Tue, 8 Oct 2002, [email protected] wrote:
>
> > This raises the interesting possibility of being able to refer to
> > things like removable media directly, instead of the device the media
> > is inserted in.
> >
> > The Amiga was doing this years ago. You could access floppy drives
> > as, E.G. df0:, df1:, etc, but if you formatted a volume and called it
> > foobar, you could access foobar: no matter which floppy drive you put
> > it in to.
>
> Isn't this possible in /etc/fstab already? Standard redhat-installs seem
> to put in the labels of the volume instead of referring to the device.
No comparison possible. The amiga dos would stall all read/write
requests when a device came offline (e.g. a floppy disk ejected) and
would cancel them all if the user or something else decided the medium
wouldn't be available anymore, and resumed everything when the medium
was online again (even if in a different device/drive).
That was a pretty sophisticated volume management if you ask me, one I
can only dream of happening one day in linux.
On Tue, 8 Oct 2002, Linus Torvalds wrote:
>
> On Tue, 8 Oct 2002, Alexander Viro wrote:
> >
> > Its reference counts mean squat if they are not seen by the code that
> > allocates/frees the object struct device is embedded into.
>
> But Al, that's the whole _point_ of having the callback.
>
> Allow the refcounts to be in an embedded entity, and then anybody who gets
> the entity (_or_ the embedded thing) will increment the same count.
>
> When the count goes to zero, the embedded thing needs to call the
> _embedders_ release function - because the embedded thing doesn't even
> know how it got allocated.
>
> Al, this time you're wrong, and the code you're unhappy about going about
> it the right way. The reference count _has_ to be held by the lowest-level
> thing (because that's the only generic part), yet the actual allocation
> and de-allocation is done by the higher levels. Which is why the lower
> levels need to know which freeing function to call.
That would be nice, if it worked that way. As it is we have
driver allocates foo
driver grabs a reference to foo->dev
....
somebody else grabs/drops temporary references to foo->dev
....
driver call put_device(&foo->dev)
driver frees structures refered from foo.
driver frees foo.
_IF_ the last two steps were done by ->release(), your arguments would
work. Actually they are done by driver right after the put_device() call.
If you are willing to change that (== move all destruction into ->release()) -
yeah, then embedded struct device will work. It's a hell of a work though.
Comments?
> That would be nice, if it worked that way. As it is we have
>
> driver allocates foo
> driver grabs a reference to foo->dev
> ....
> somebody else grabs/drops temporary references to foo->dev
> ....
> driver call put_device(&foo->dev)
> driver frees structures refered from foo.
> driver frees foo.
>
> _IF_ the last two steps were done by ->release(), your arguments would
> work. Actually they are done by driver right after the put_device() call.
>
> If you are willing to change that (== move all destruction into ->release()) -
> yeah, then embedded struct device will work. It's a hell of a work though.
Yes, and we're willing to do a lot of it.
That's been the intention the whole time, and would have been done sooner,
but it's taken a freakin' long time to figure out what assumptions to make
in the core. And of course, they're not always right. ;) Which means
there's more work to be done there. The feedback is greatly appreciated,
and I'm always open to more..
Thanks,
-pat
On Tue, 8 Oct 2002, Kai Germaschewski wrote:
> On Tue, 8 Oct 2002, Alexander Viro wrote:
>
> > That would be nice, if it worked that way. As it is we have
> >
> > driver allocates foo
> > driver grabs a reference to foo->dev
> > ....
> > somebody else grabs/drops temporary references to foo->dev
> > ....
> > driver call put_device(&foo->dev)
> > driver frees structures refered from foo.
> > driver frees foo.
> >
> > _IF_ the last two steps were done by ->release(), your arguments would
> > work. Actually they are done by driver right after the put_device() call.
> >
> > If you are willing to change that (== move all destruction into ->release()) -
> > yeah, then embedded struct device will work. It's a hell of a work though.
> >
> > Comments?
>
> Just a short note, since I have gotta run: The latter won't work very well
> with modules, since obviously ->release() has to MOD_DEC_USE_COUNT, to
> avoid having ->release() unloaded before it's executed. So for one, that's
> a DoS making delaying module unload indefinitely by keep /driversfs/...
> open, but even worse, rmmod will refuse to unload the module, since the
> use count is > zero.
>
> That's because normally pci_unregister_driver() or whatever is called in
> cleanup_module(), but obviously to be able to call it the refcount has to
> be zero already...
That's true for drivers, but not for devices. Nonetheless, it's a big
problem that I've hoped would magically go away. Of course it won't, but I
don't have a solution for it off hand...
-pat
On Tue, 8 Oct 2002, Alexander Viro wrote:
> That would be nice, if it worked that way. As it is we have
>
> driver allocates foo
> driver grabs a reference to foo->dev
> ....
> somebody else grabs/drops temporary references to foo->dev
> ....
> driver call put_device(&foo->dev)
> driver frees structures refered from foo.
> driver frees foo.
>
> _IF_ the last two steps were done by ->release(), your arguments would
> work. Actually they are done by driver right after the put_device() call.
>
> If you are willing to change that (== move all destruction into ->release()) -
> yeah, then embedded struct device will work. It's a hell of a work though.
>
> Comments?
Just a short note, since I have gotta run: The latter won't work very well
with modules, since obviously ->release() has to MOD_DEC_USE_COUNT, to
avoid having ->release() unloaded before it's executed. So for one, that's
a DoS making delaying module unload indefinitely by keep /driversfs/...
open, but even worse, rmmod will refuse to unload the module, since the
use count is > zero.
That's because normally pci_unregister_driver() or whatever is called in
cleanup_module(), but obviously to be able to call it the refcount has to
be zero already...
--Kai
On Tue, 8 Oct 2002, Alexander Viro wrote:
> That would be nice, if it worked that way. As it is we have
>
> driver allocates foo
> driver grabs a reference to foo->dev
> ....
> somebody else grabs/drops temporary references to foo->dev
> ....
> driver call put_device(&foo->dev)
> driver frees structures refered from foo.
> driver frees foo.
>
> _IF_ the last two steps were done by ->release(), your arguments would
> work. Actually they are done by driver right after the put_device() call.
Right. But that's a driver bug, and it's because this whole thing is
fairly new.
There aren't that many things that actually play with these things (mainly
the PCI and the USB layer, and individual drivers shouldn't care, it's
just the bus layer that does all of this), so we should be able to fix the
cases cleanly.
Linus
On Tue, Oct 08, 2002 at 05:42:25PM -0400, Alexander Viro wrote:
>
> _IF_ the last two steps were done by ->release(), your arguments would
> work. Actually they are done by driver right after the put_device() call.
Yes, and that's wrong :(
> If you are willing to change that (== move all destruction into ->release()) -
> yeah, then embedded struct device will work. It's a hell of a work though.
It's not that much work, and I'll do it for USB, and for PCI, unless Pat
beats me to it.
thanks,
greg k-h
> > That's because normally pci_unregister_driver() or whatever is called in
> > cleanup_module(), but obviously to be able to call it the refcount has to
> > be zero already...
>
> That's true for drivers, but not for devices. Nonetheless, it's a big
> problem that I've hoped would magically go away. Of course it won't, but I
> don't have a solution for it off hand...
Is it actually cleanly solvable without sticking the refcounts in the
layer they belong not in driverfs ?
On Tue, 8 Oct 2002, Linus Torvalds wrote:
> Right. But that's a driver bug, and it's because this whole thing is
> fairly new.
>
> There aren't that many things that actually play with these things (mainly
> the PCI and the USB layer, and individual drivers shouldn't care, it's
> just the bus layer that does all of this), so we should be able to fix the
> cases cleanly.
It'd be nice if things were so easy, but I don't think so. Of course,
"struct device" objects are created by the bus drivers, and as such there
are not that many (currently only PCI, ISAPnP, USB, as you said, but I
think eventually, we may want to have our representation of IDE, SCSI,
ISDN, sound, ethernet, whatever there as well).
USB may be modular today, and so are many of the other potential users, so
we need to deal with that. But it's not only bus drivers, anyway.
New-style PCI drivers use pci_register_driver, where struct pci_driver
embeds struct device_driver. And the problems are exactly the same.
Today, we expect that we can kfree(&my_driver_struct) after
pci_unregister_driver(&my_driver_struct). Actually, the common case is
rather the my_driver_struct is statically allocated in a module, which
will be unloaded after pci_unregister_driver() returns, but that's
basically exactly the same thing.
I'm pretty sure we do not want to change that API to have every driver out
there specify a destructor for my_driver_struct. It'd be simple, it'd just
do MOD_DEC_USE_COUNT, though. Except for that this whole thing does not
work at all then, since we call pci_unregister_driver () from
module_exit(), which can only be called when the use count is already
zero.
In addition, even if we went the long way and found a solution for the
above problem, it still meant that we could only unload the module after
all references to struct device_driver are gone. Which can take forever
when someone holds open /driversfs/my_driver/something. That's a DoS we do
not want to have, either, I think.
I agree with Al Viro, the only sensible solution seems to make struct
device and struct device_driver separately allocated (one could possibly
do the separation and another level, like between struct driver and struct
driverfs_dir_entry, but above seems the cleanest). So they can stay around
long after a module which registered them initially is gone. Of course, we
need to have some mechanism to make sure callbacks into the module are
not made after a certain point, since .text may be gone, or driver
specific part (struct usb_driver), which is why we need some kind of
mark_dead(), which I'd rather call device_unregister(struct device *) /
driver_unregister(struct device_driver *). The actual driverfs object may
still stay around after that point, but its ->priv pointer, which pointed
to the usb_device / usb_driver is invalidated and will not be dereferenced
anymore, neither the callbacks into driver-specific code.
This way, the API for things like PCI device drivers can remain the same,
the API for bus drivers changes slightly, the driver core needs major
changes, but at least it should work and be safe this way.
--Kai
On Tue, 8 Oct 2002, Kai Germaschewski wrote:
>
> USB may be modular today, and so are many of the other potential users, so
> we need to deal with that. But it's not only bus drivers, anyway.
> New-style PCI drivers use pci_register_driver, where struct pci_driver
> embeds struct device_driver. And the problems are exactly the same.
What problems? As far as I can see, this is fairly trivial to solve.
But before I solve world hunger and this trivial problem, let's re-iterate
why it has to be done this way:
- a ref coutn should always be a count on a "data structure".
- a reference count is always associated with the _lowest_ form of data
structure that is countable. If you reference count high-level
entities, then you always have to pass _those_ around, you cannot pass
any pointers to sub-structures around because they aren't counted and
thus aren't safe.
- You can get from a high-level object to the lower level (trivially),
but you _cannot_ get from the lower level to the high-level simply
because the low levels don't even know what _kind_ of object the high
level is (and if they did, they wouldn't be low-level any more).
The whole point of the devicefs layer is to be able to share a common
structure across any kind of device/driver, so the count has to be at that
level.
Agreed on all of this?
So we have two solutions:
- don't bother with a common object at all.
This fundamentally breaks the notion of having a unified device tree.
In other words, this isn't an acceptable approach.
- get the counting right.
The "count right" thing isn't actually all that hard. Every _single_
example of trouble has been of a very simple type: higher layers haven't
been able to look at lower layer counts. And every single of those
examples are trivial to fix, and the only real trouble is finding them and
bothering to fix them.
We have two cases:
- the high-level thing embeds the low-level thing in a 1:1
relationship (ie "struct pci_dev" vs "struct device")
the high-level entity should avoid having any reference counts itself,
and just use the low-level refcount directly, and the low-level thing
has a "release()" thing it calls when the refcount goes to zero.
This is the simple case.
In particular, the module case could be _made_ into the simple case by
just initializing the module counter pointer into the "struct driver"
counter (nothing says that the module count has to be inside the
"struct module", we could easily make that a pointer to an atomic_t
without breaking much code).
- the high-level thing has _multiple_ low-level things associated with
it, each of which have independent refcounts.
This is, for example the current "module vs struct pci_driver" case,
where a module might have multiple drivers, but even if it exports only
one driver, a 1:1 relationship can always be considered just a special
case of the more generic 1:n case.
Agreed? (Yeah, yeah, we also have the really messy m:n case, but if your
dependency tree isn't a tree but a DAG, then I think you have more
problems than you really want to have in the first place).
So let's see how we can solve the complex case, and realize that we
actually _have_ this very same complex case in several parts of the kernel
already. In particular, we have it in "struct dentry" - where the 1:m
relationship is the parent:child relationship.
The simplest way to handle it is to have each low-level entity increment
the "parent" count for as long as it exists, and then the "release()"
function de-allocates the memory _and_ it also decrements the count of the
parent (and releases that if the count goes to zero). We do exactly this
for dentries, for example (dput() - kill_it).
So in general this is _not_ a hard problem to solve in any way, shape, or
form. The problem with modules is really that the module count is badly
done, and module unloading is crap. This is why we've _always_ had
problems with module unloading, and why Rusty wants to disallow it
altogether.
But modules can already refuse to unload themselves, by just registering a
"can_unload()" function.
And that's the short-term answer: make modules that register a driver just
register a can_unload() function too, and make that function verify that
the driver count is 1.
You're done. Yes, module unloading is ugly, but don't blame driverfs for
that.
Linus
Hi!
> > > > _ALL_ buses that have driverfs support (IDE, SCSI, USB, PCI) have their
> > > > own rules for lifetimes of their structures. And that's not likely to
> > > > change - these objects belong to drivers and in some cases (IDE) are
> > > > not even allocated dynamically - they are reused if nothing is holding
> > > > them.
> > >
> > > IDE objects can also outlast the hardware - consider an active mount on
> > > an ejected pcmcia card. Right now we don't do the right stuff to
> > > reconnect that on re-insert but one day we may need to. As it is we keep
> > > the instance around to avoid crashes
> >
> > Ouch. That (reconnects) may require interesting things from queue-related
> > code. What behaviour do you want while card is disconnected? All requests
> > getting errors / all requests getting blocked / reads failing, writes blocking?
>
> This raises the interesting possibility of being able to refer to
> things like removable media directly, instead of the device the media
> is inserted in.
>
> The Amiga was doing this years ago. You could access floppy drives
> as, E.G. df0:, df1:, etc, but if you formatted a volume and called it
> foobar, you could access foobar: no matter which floppy drive you put
> it in to.
>
> Also, Plan 9 does similar interesting things - you can do the equivilent of:
>
> ls /internet/websites/kernel.org/
>
> and treat the website as a filesystem.
uservfs.sf.net, and you can do similar stuff on linux.
Pavel
--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]
Hi!
> [email protected], 2002-10-07 10:27:16-07:00, [email protected]
> IDE: register ide driver for all ide drives; not just for disk drives.
>
> This adds
> struct device_driver gen_driver;
>
> to ide_driver_t, which is filled in with necessary fields when an ide
> driver calls ide_register_driver(). That then registers the driver with
> the driver model core.
>
> As a result, this gives us the following output in driverfs:
>
> # tree -d /sys/bus/ide/drivers/
> /sys/bus/ide/drivers/
> |-- ide-cdrom
> `-- ide-disk
>
> The suspend/resume callbacks in ide-disk.c have been temporarily
> disabled until the ide core implements generic methods which forward
> the calls to the drive drivers.
>
> diff -Nru a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
> --- a/drivers/ide/ide-disk.c Mon Oct 7 12:19:14 2002
> +++ b/drivers/ide/ide-disk.c Mon Oct 7 12:19:14 2002
> @@ -1550,14 +1550,6 @@
> /* This is just a hook for the overall driver tree.
> */
>
> -static struct device_driver idedisk_devdrv = {
> - .bus = &ide_bus_type,
> - .name = "IDE disk driver",
> -
> - .suspend = idedisk_suspend,
> - .resume = idedisk_resume,
> -};
> -
Here you killed the only reference to idedisk_suspend, yet
idedisk_suspend/idedisk_resume is needed if you don't want to eat
data.
Pavel
--
When do you have heart between your knees?