2002-06-04 00:43:01

by Andrew Morton

[permalink] [raw]
Subject: [patch] PCI device matching fix

The new pci_device_probe() is always passing the zeroeth
entry in the id_table to the device's probe method. It
needs to scan that table for the correct ID first.

This fixes the recent 3c59x strangenesses.


--- 2.5.20/drivers/pci/pci-driver.c~pci-scan Mon Jun 3 17:37:59 2002
+++ 2.5.20-akpm/drivers/pci/pci-driver.c Mon Jun 3 17:38:03 2002
@@ -38,12 +38,19 @@ pci_match_device(const struct pci_device
static int pci_device_probe(struct device * dev)
{
int error = 0;
+ struct pci_driver *drv;
+ struct pci_dev *pci_dev;

- struct pci_driver * drv = list_entry(dev->driver,struct pci_driver,driver);
- struct pci_dev * pci_dev = list_entry(dev,struct pci_dev,dev);
+ drv = list_entry(dev->driver, struct pci_driver, driver);
+ pci_dev = list_entry(dev, struct pci_dev, dev);

- if (drv->probe)
- error = drv->probe(pci_dev,drv->id_table);
+ if (drv->probe) {
+ const struct pci_device_id *id;
+
+ id = pci_match_device(drv->id_table, pci_dev);
+ if (id)
+ error = drv->probe(pci_dev, id);
+ }
return error > 0 ? 0 : -ENODEV;
}


-


2002-06-04 01:52:57

by Anton Altaparmakov

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix

At 01:41 04/06/02, Andrew Morton wrote:
>The new pci_device_probe() is always passing the zeroeth
>entry in the id_table to the device's probe method. It
>needs to scan that table for the correct ID first.
>
>This fixes the recent 3c59x strangenesses.

Andrew,

Just as a heads up, this patch does indeed fix my 3c905 misdetection problem.

Thanks!!!

Anton



>--- 2.5.20/drivers/pci/pci-driver.c~pci-scan Mon Jun 3 17:37:59 2002
>+++ 2.5.20-akpm/drivers/pci/pci-driver.c Mon Jun 3 17:38:03 2002
>@@ -38,12 +38,19 @@ pci_match_device(const struct pci_device
> static int pci_device_probe(struct device * dev)
> {
> int error = 0;
>+ struct pci_driver *drv;
>+ struct pci_dev *pci_dev;
>
>- struct pci_driver * drv = list_entry(dev->driver,struct
>pci_driver,driver);
>- struct pci_dev * pci_dev = list_entry(dev,struct pci_dev,dev);
>+ drv = list_entry(dev->driver, struct pci_driver, driver);
>+ pci_dev = list_entry(dev, struct pci_dev, dev);
>
>- if (drv->probe)
>- error = drv->probe(pci_dev,drv->id_table);
>+ if (drv->probe) {
>+ const struct pci_device_id *id;
>+
>+ id = pci_match_device(drv->id_table, pci_dev);
>+ if (id)
>+ error = drv->probe(pci_dev, id);
>+ }
> return error > 0 ? 0 : -ENODEV;
> }
>
>
>-
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/

--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2002-06-04 03:18:40

by Patrick Mochel

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix


On Mon, 3 Jun 2002, Andrew Morton wrote:

> The new pci_device_probe() is always passing the zeroeth
> entry in the id_table to the device's probe method. It
> needs to scan that table for the correct ID first.

Arg. Ok. I've applied the patch, though I wonder if there's a better way.
The proper ID is already determined, though it happens when dealing with
generic struct devices and not PCI devices:

driver_register calls driver_bind, which calls the bus's bind() callback.
That compares the bus-specific IDs that the driver supports to the
device's ID. The format and semantics for comparing IDs is inherently
bus-specific, so I made sure to keep them in the bus-specific structs. So,
which ID the device really is, is lost when we return back to the core.

We could keep it as a void *, or a char *, but I'd rather not lose the
type information. Or, a union, but those get ugly and difficult to manage.
Does anyone have any other preference?

Along with your patch and the other fixes below (in which yours is
included), all the oddities should be resolved, including the OOPSes on
module unload. A patch is appended, after the changelog. You can also soon
pull from

bk://ldm.bkbits.net/linux-2.5

Could people could please test it, and let me know if anything is still
broken?

-pat

[email protected], 2002-06-03 19:53:50-07:00, [email protected]
PCI driver mgmt:
- Make sure proper pci id is passed to probe()
- make sure pci_dev->driver is set and reset on driver registration/unregistration
- call remove_driver to force unload of driver on unregistration

drivers/pci/pci-driver.c | 28 ++++++++++++++++++++--------
1 files changed, 20 insertions, 8 deletions


[email protected], 2002-06-03 19:51:12-07:00, [email protected]
Do manual traversing of drivers' devices list when unbinding the driver.

driver_unbind was called when drv->refcount == 0.
It would call driver_for_each_dev to do the unbinding
The first thing that would do was get_device, which...
BUG()'d if drv->refcount == 0.
Duh.

drivers/base/core.c | 39 ++++++++++++++++++++++++++++++++-------
1 files changed, 32 insertions, 7 deletions


[email protected], 2002-06-03 19:48:44-07:00, [email protected]
device model udpate:
- make sure drv->devices is initialized on registration (from Peter Osterlund)
- add remove_driver for forcing removal of driver

There was a potential race with the module unload code. When a pci driver was unloaded, it would call pci_unregister_driver, which would simply call put_driver.
If the driver's refcount wasn't 0, it wouldn't unbind it from devices, but the module unload would still continue.
If something tried to access the driver later (since everyone thinks its still there), Bad Things would happen.
This fixes it until there can be tighter integration between the device model and module unload code.

drivers/base/driver.c | 35 ++++++++++++++++++++++++++---------
include/linux/device.h | 1 +
2 files changed, 27 insertions, 9 deletions


diff -Nru a/drivers/base/core.c b/drivers/base/core.c
--- a/drivers/base/core.c Mon Jun 3 20:11:55 2002
+++ b/drivers/base/core.c Mon Jun 3 20:11:55 2002
@@ -5,6 +5,8 @@
* 2002 Open Source Development Lab
*/

+#define DEBUG 0
+
#include <linux/device.h>
#include <linux/module.h>
#include <linux/slab.h>
@@ -118,9 +120,8 @@
return bus_for_each_dev(drv->bus,drv,do_driver_bind);
}

-static int do_driver_unbind(struct device * dev, void * data)
+static int do_driver_unbind(struct device * dev, struct device_driver * drv)
{
- struct device_driver * drv = (struct device_driver *)data;
lock_device(dev);
if (dev->driver == drv) {
dev->driver = NULL;
@@ -134,7 +135,31 @@

void driver_unbind(struct device_driver * drv)
{
- driver_for_each_dev(drv,drv,do_driver_unbind);
+ struct device * next;
+ struct device * dev = NULL;
+ struct list_head * node;
+ int error = 0;
+
+ read_lock(&drv->lock);
+ node = drv->devices.next;
+ while (node != &drv->devices) {
+ next = list_entry(node,struct device,driver_list);
+ get_device(next);
+ read_unlock(&drv->lock);
+
+ if (dev)
+ put_device(dev);
+ dev = next;
+ if ((error = do_driver_unbind(dev,drv))) {
+ put_device(dev);
+ break;
+ }
+ read_lock(&drv->lock);
+ node = dev->driver_list.next;
+ }
+ read_unlock(&drv->lock);
+ if (dev)
+ put_device(dev);
}

/**
@@ -178,8 +203,8 @@
}
spin_unlock(&device_lock);

- DBG("DEV: registering device: ID = '%s', name = %s\n",
- dev->bus_id, dev->name);
+ pr_debug("DEV: registering device: ID = '%s', name = %s\n",
+ dev->bus_id, dev->name);

if ((error = device_make_dir(dev)))
goto register_done;
@@ -212,8 +237,8 @@
list_del_init(&dev->g_list);
spin_unlock(&device_lock);

- DBG("DEV: Unregistering device. ID = '%s', name = '%s'\n",
- dev->bus_id,dev->name);
+ pr_debug("DEV: Unregistering device. ID = '%s', name = '%s'\n",
+ dev->bus_id,dev->name);

/* Notify the platform of the removal, in case they
* need to do anything...
diff -Nru a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
--- a/drivers/pci/pci-driver.c Mon Jun 3 20:11:55 2002
+++ b/drivers/pci/pci-driver.c Mon Jun 3 20:11:55 2002
@@ -38,23 +38,35 @@
static int pci_device_probe(struct device * dev)
{
int error = 0;
+ struct pci_driver *drv;
+ struct pci_dev *pci_dev;

- struct pci_driver * drv = list_entry(dev->driver,struct pci_driver,driver);
- struct pci_dev * pci_dev = list_entry(dev,struct pci_dev,dev);
+ drv = list_entry(dev->driver, struct pci_driver, driver);
+ pci_dev = list_entry(dev, struct pci_dev, dev);
+
+ if (drv->probe) {
+ const struct pci_device_id *id;

- if (drv->probe)
- error = drv->probe(pci_dev,drv->id_table);
- return error > 0 ? 0 : -ENODEV;
+ id = pci_match_device(drv->id_table, pci_dev);
+ if (id)
+ error = drv->probe(pci_dev, id);
+ if (error >= 0) {
+ pci_dev->driver = drv;
+ error = 0;
+ }
+ }
+ return error;
}

static int pci_device_remove(struct device * dev)
{
struct pci_dev * pci_dev = list_entry(dev,struct pci_dev,dev);
+ struct pci_driver * drv = pci_dev->driver;

- if (dev->driver) {
- struct pci_driver * drv = list_entry(dev->driver,struct pci_driver,driver);
+ if (drv) {
if (drv->remove)
drv->remove(pci_dev);
+ pci_dev->driver = NULL;
}
return 0;
}
@@ -124,7 +136,7 @@
void
pci_unregister_driver(struct pci_driver *drv)
{
- put_driver(&drv->driver);
+ remove_driver(&drv->driver);
}

static struct pci_driver pci_compat_driver = {

2002-06-04 05:19:38

by Greg KH

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix

On Mon, Jun 03, 2002 at 08:14:25PM -0700, Patrick Mochel wrote:
>
> bk://ldm.bkbits.net/linux-2.5
>
> Could people could please test it, and let me know if anything is still
> broken?

These changesets seem to solve the USB controller problems I have been
seeing since 2.4.19, good job.

It looks like you didn't include all of the patch in your message. I've
attached it here for those without bk to test if they want to.

Now let's talk about why you took away pci_announce_device_to_drivers()...

thanks,

greg k-h


diff -Nru a/drivers/base/core.c b/drivers/base/core.c
--- a/drivers/base/core.c Mon Jun 3 22:27:50 2002
+++ b/drivers/base/core.c Mon Jun 3 22:27:50 2002
@@ -5,6 +5,8 @@
* 2002 Open Source Development Lab
*/

+#define DEBUG 0
+
#include <linux/device.h>
#include <linux/module.h>
#include <linux/slab.h>
@@ -118,9 +120,8 @@
return bus_for_each_dev(drv->bus,drv,do_driver_bind);
}

-static int do_driver_unbind(struct device * dev, void * data)
+static int do_driver_unbind(struct device * dev, struct device_driver * drv)
{
- struct device_driver * drv = (struct device_driver *)data;
lock_device(dev);
if (dev->driver == drv) {
dev->driver = NULL;
@@ -134,7 +135,31 @@

void driver_unbind(struct device_driver * drv)
{
- driver_for_each_dev(drv,drv,do_driver_unbind);
+ struct device * next;
+ struct device * dev = NULL;
+ struct list_head * node;
+ int error = 0;
+
+ read_lock(&drv->lock);
+ node = drv->devices.next;
+ while (node != &drv->devices) {
+ next = list_entry(node,struct device,driver_list);
+ get_device(next);
+ read_unlock(&drv->lock);
+
+ if (dev)
+ put_device(dev);
+ dev = next;
+ if ((error = do_driver_unbind(dev,drv))) {
+ put_device(dev);
+ break;
+ }
+ read_lock(&drv->lock);
+ node = dev->driver_list.next;
+ }
+ read_unlock(&drv->lock);
+ if (dev)
+ put_device(dev);
}

/**
@@ -178,8 +203,8 @@
}
spin_unlock(&device_lock);

- DBG("DEV: registering device: ID = '%s', name = %s\n",
- dev->bus_id, dev->name);
+ pr_debug("DEV: registering device: ID = '%s', name = %s\n",
+ dev->bus_id, dev->name);

if ((error = device_make_dir(dev)))
goto register_done;
@@ -212,8 +237,8 @@
list_del_init(&dev->g_list);
spin_unlock(&device_lock);

- DBG("DEV: Unregistering device. ID = '%s', name = '%s'\n",
- dev->bus_id,dev->name);
+ pr_debug("DEV: Unregistering device. ID = '%s', name = '%s'\n",
+ dev->bus_id,dev->name);

/* Notify the platform of the removal, in case they
* need to do anything...
diff -Nru a/drivers/base/driver.c b/drivers/base/driver.c
--- a/drivers/base/driver.c Mon Jun 3 22:27:50 2002
+++ b/drivers/base/driver.c Mon Jun 3 22:27:50 2002
@@ -3,6 +3,8 @@
*
*/

+#define DEBUG 0
+
#include <linux/device.h>
#include <linux/module.h>
#include <linux/errno.h>
@@ -67,6 +69,7 @@
get_bus(drv->bus);
atomic_set(&drv->refcount,2);
rwlock_init(&drv->lock);
+ INIT_LIST_HEAD(&drv->devices);
write_lock(&drv->bus->lock);
list_add(&drv->bus_list,&drv->bus->drivers);
write_unlock(&drv->bus->lock);
@@ -76,16 +79,8 @@
return 0;
}

-/**
- * put_driver - decrement driver's refcount and clean up if necessary
- * @drv: driver in question
- */
-void put_driver(struct device_driver * drv)
+static void __remove_driver(struct device_driver * drv)
{
- if (!atomic_dec_and_lock(&drv->refcount,&device_lock))
- return;
- spin_unlock(&device_lock);
-
if (drv->bus) {
pr_debug("Unregistering driver '%s' from bus '%s'\n",drv->name,drv->bus->name);

@@ -101,6 +96,28 @@
drv->release(drv);
}

+void remove_driver(struct device_driver * drv)
+{
+ spin_lock(&device_lock);
+ atomic_set(&drv->refcount,0);
+ spin_unlock(&device_lock);
+ __remove_driver(drv);
+}
+
+/**
+ * put_driver - decrement driver's refcount and clean up if necessary
+ * @drv: driver in question
+ */
+void put_driver(struct device_driver * drv)
+{
+ if (!atomic_dec_and_lock(&drv->refcount,&device_lock))
+ return;
+ spin_unlock(&device_lock);
+
+ __remove_driver(drv);
+}
+
EXPORT_SYMBOL(driver_for_each_dev);
EXPORT_SYMBOL(driver_register);
EXPORT_SYMBOL(put_driver);
+EXPORT_SYMBOL(remove_driver);
diff -Nru a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
--- a/drivers/pci/pci-driver.c Mon Jun 3 22:27:50 2002
+++ b/drivers/pci/pci-driver.c Mon Jun 3 22:27:50 2002
@@ -38,23 +38,35 @@
static int pci_device_probe(struct device * dev)
{
int error = 0;
+ struct pci_driver *drv;
+ struct pci_dev *pci_dev;

- struct pci_driver * drv = list_entry(dev->driver,struct pci_driver,driver);
- struct pci_dev * pci_dev = list_entry(dev,struct pci_dev,dev);
+ drv = list_entry(dev->driver, struct pci_driver, driver);
+ pci_dev = list_entry(dev, struct pci_dev, dev);
+
+ if (drv->probe) {
+ const struct pci_device_id *id;

- if (drv->probe)
- error = drv->probe(pci_dev,drv->id_table);
- return error > 0 ? 0 : -ENODEV;
+ id = pci_match_device(drv->id_table, pci_dev);
+ if (id)
+ error = drv->probe(pci_dev, id);
+ if (error >= 0) {
+ pci_dev->driver = drv;
+ error = 0;
+ }
+ }
+ return error;
}

static int pci_device_remove(struct device * dev)
{
struct pci_dev * pci_dev = list_entry(dev,struct pci_dev,dev);
+ struct pci_driver * drv = pci_dev->driver;

- if (dev->driver) {
- struct pci_driver * drv = list_entry(dev->driver,struct pci_driver,driver);
+ if (drv) {
if (drv->remove)
drv->remove(pci_dev);
+ pci_dev->driver = NULL;
}
return 0;
}
@@ -124,7 +136,7 @@
void
pci_unregister_driver(struct pci_driver *drv)
{
- put_driver(&drv->driver);
+ remove_driver(&drv->driver);
}

static struct pci_driver pci_compat_driver = {
diff -Nru a/include/linux/device.h b/include/linux/device.h
--- a/include/linux/device.h Mon Jun 3 22:27:50 2002
+++ b/include/linux/device.h Mon Jun 3 22:27:50 2002
@@ -118,6 +118,7 @@
}

extern void put_driver(struct device_driver * drv);
+extern void remove_driver(struct device_driver * drv);

extern int driver_for_each_dev(struct device_driver * drv, void * data,
int (*callback)(struct device * dev, void * data));

2002-06-04 12:47:08

by Sebastian Droege

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix

On Mon, 03 Jun 2002 17:41:40 -0700
Andrew Morton <[email protected]> wrote:

> The new pci_device_probe() is always passing the zeroeth
> entry in the id_table to the device's probe method. It
> needs to scan that table for the correct ID first.
>
> This fixes the recent 3c59x strangenesses.

This fixes the yamaha ymfpci misdetection, too... Thanks :)

Bye


Attachments:
(No filename) (189.00 B)

2002-06-05 23:16:59

by Kai Germaschewski

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix


> @@ -101,6 +96,28 @@
> drv->release(drv);
> }
>
> +void remove_driver(struct device_driver * drv)
> +{
> + spin_lock(&device_lock);
> + atomic_set(&drv->refcount,0);
> + spin_unlock(&device_lock);
> + __remove_driver(drv);
> +}

> @@ -124,7 +136,7 @@
> void
> pci_unregister_driver(struct pci_driver *drv)
> {
> - put_driver(&drv->driver);
> + remove_driver(&drv->driver);
> }
>
> static struct pci_driver pci_compat_driver = {

Now does that mean you have given up on getting the ref counting right?
Forcing the ref count to zero is obviously not a solution,
pci_unregister_driver is usually called at module unload time, which means
that struct pci_driver will go away immediately after the call returns. If
you know that there are no other users at this time (which I doubt), you
don't need to do the whole refcounting thing at all (since at all times
before the ref count is surely >= 1). If not, you're racy.

I think I brought up that issue before, though nobody seemed interested.
AFAICS there are two possible solutions:

o provide a destructor callback which is called from put_driver when we're
about to throw away the last reference which would let whoever registered
the thing in the beginning know that it's all his again now (If it was
kmalloced, that would be the time to kfree it). In this case it's
rather that the callback would tell us we can now finish
module_exit().
o Use a completion or something and do the above inside
pci_unregister_driver (or remove_driver). I.e. have it sleep until
all references are gone. That would fix the race for the typical

void my_module_exit(void) { pci_unregister_driver(&my_driver); }

case.

(The latter would be my preference, as I see no point in having driver
authors deal with the complication of waiting for completion of
pci_unregister_driver() themselves)

--Kai


2002-06-05 23:42:46

by Patrick Mochel

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix


[cc list trimmed]

> Now does that mean you have given up on getting the ref counting right?
> Forcing the ref count to zero is obviously not a solution,
> pci_unregister_driver is usually called at module unload time, which means
> that struct pci_driver will go away immediately after the call returns. If
> you know that there are no other users at this time (which I doubt), you
> don't need to do the whole refcounting thing at all (since at all times
> before the ref count is surely >= 1). If not, you're racy.

No, I haven't given up on it, and I admit it's fishy. However, I did it to
ensure that no one can access the driver since it is going away as soon as
we return to pci_unregister_driver. Though, I'll also admit that making
the next potential user hit the BUG() in get_driver() is not the nicest
thing to do...

> I think I brought up that issue before, though nobody seemed interested.
> AFAICS there are two possible solutions:
>
> o provide a destructor callback which is called from put_driver when we're
> about to throw away the last reference which would let whoever registered
> the thing in the beginning know that it's all his again now (If it was
> kmalloced, that would be the time to kfree it). In this case it's
> rather that the callback would tell us we can now finish
> module_exit().
> o Use a completion or something and do the above inside
> pci_unregister_driver (or remove_driver). I.e. have it sleep until
> all references are gone. That would fix the race for the typical
>
> void my_module_exit(void) { pci_unregister_driver(&my_driver); }
>
> case.
>
> (The latter would be my preference, as I see no point in having driver
> authors deal with the complication of waiting for completion of
> pci_unregister_driver() themselves)

There is a destructor: struct device_driver::release(). It's the last
thing called from __remove_driver().

No matter what, the module will get unloaded immediately. What would be
nice is if we could delay it until we said it was ok (from the driver's
release callback). We could even define a common release callback for all
drivers:

void driver_release(struct device_driver * drv)
{
struct module * module = drv->module;

/* do unload of module */
}

We wouldn't need a completion; we would just have to wait until the
refcount hit 0 naturally. But, I'm just making this stuff up. Is this at
all possible?

-pat

2002-06-06 00:08:08

by Kai Germaschewski

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix

On Wed, 5 Jun 2002, Patrick Mochel wrote:

> No, I haven't given up on it, and I admit it's fishy. However, I did it
> to ensure that no one can access the driver since it is going away as
> soon as we return to pci_unregister_driver. Though, I'll also admit that
> making the next potential user hit the BUG() in get_driver() is not the
> nicest thing to do...

There's two phases:
o remove_driver():
Make sure that now one else can find (and inc the refcount) your struct
anymore. That can and should be done immediately. Drop your reference.
o At some point later, everybody else will have dropped their references
and the destructor can be called.

Alternatively:
o remove_driver():
Make sure that now one else can find (and inc the refcount) your struct
anymore. That can and should be done immediately. Drop your reference.
Wait for everybody else to drop their reference. Then return to the
caller
o At some point later, everybody else will have dropped their references
and the remove_driver() should be woken up and you get to the return.

I claim the latter is more appropriate, since driver_register() and
driver_remove() (btw, I'd rather have the latter called
driver_unregister()) will always be called by the same user, normally
the driver module. That's different from some object which isn't
explicitly unregistered somewhere, but will be cleaned up e.g. due to
memory pressure some time later. You can always achieve the second
if you have the first, but I see no point in having driver authors deal
with it.

> There is a destructor: struct device_driver::release(). It's the last
> thing called from __remove_driver().
>
> No matter what, the module will get unloaded immediately. What would be
> nice is if we could delay it until we said it was ok (from the driver's
> release callback). We could even define a common release callback for all
> drivers:

That's not true. module_exit() is called from rmmod(8) and thus in process
context. It's perfectly fine to sleep in module_exit() until we're safe to
be deleted.

> void driver_release(struct device_driver * drv)
> {
> struct module * module = drv->module;
>
> /* do unload of module */
> }
>
> We wouldn't need a completion; we would just have to wait until the
> refcount hit 0 naturally. But, I'm just making this stuff up. Is this at
> all possible?

Now this is bad, what if the last reference went away from irq context?
Unloading a module from irq context doesn't sound like a good idea to me.

I'd suggest something like

pci_driver_release(pdrv)
{
complete(&pdrv->complete);
}

pci_register_driver(pdrv)
{
...
pdrv->driver.release = pci_driver_release;
pdrv->complete = NULL;
}

pci_unregister_driver(pdrv)
{
DECLARE_COMPLETION(complete);

pdrv->complete = &complete;
put_driver(&drv->driver);
wait_for_completion(&complete);
}

And make clear that pci_unregister_driver() can be called from process
context only.

Actually, as said above, I'd rather do the same thing in remove_driver(),
i.e. make that sleep as well - unless you can come up with an example
where anybody would want to call remove_driver() and cannot sleep.
Just replace ->release with the struct completion *.

--Kai




2002-06-06 00:39:54

by Patrick Mochel

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix


On Wed, 5 Jun 2002, Kai Germaschewski wrote:

> On Wed, 5 Jun 2002, Patrick Mochel wrote:
>
> > No, I haven't given up on it, and I admit it's fishy. However, I did it
> > to ensure that no one can access the driver since it is going away as
> > soon as we return to pci_unregister_driver. Though, I'll also admit that
> > making the next potential user hit the BUG() in get_driver() is not the
> > nicest thing to do...
>
> There's two phases:
> o remove_driver():
> Make sure that now one else can find (and inc the refcount) your struct
> anymore. That can and should be done immediately. Drop your reference.
> o At some point later, everybody else will have dropped their references
> and the destructor can be called.
>
> Alternatively:
> o remove_driver():
> Make sure that now one else can find (and inc the refcount) your struct
> anymore. That can and should be done immediately. Drop your reference.
> Wait for everybody else to drop their reference. Then return to the
> caller
> o At some point later, everybody else will have dropped their references
> and the remove_driver() should be woken up and you get to the return.

Alright, I can deal. I expanded on what you had, and moved it into the
core. It's not tested, but lemme know what you think.

-pat

===== drivers/base/driver.c 1.7 vs edited =====
--- 1.7/drivers/base/driver.c Wed Jun 5 15:59:31 2002
+++ edited/drivers/base/driver.c Wed Jun 5 17:29:55 2002
@@ -8,6 +8,7 @@
#include <linux/device.h>
#include <linux/module.h>
#include <linux/errno.h>
+#include <linux/completion.h>
#include "base.h"


@@ -70,6 +71,8 @@
atomic_set(&drv->refcount,2);
rwlock_init(&drv->lock);
INIT_LIST_HEAD(&drv->devices);
+ init_completion(&drv->complete);
+
write_lock(&drv->bus->lock);
list_add(&drv->bus_list,&drv->bus->drivers);
write_unlock(&drv->bus->lock);
@@ -84,18 +87,21 @@
pr_debug("Unregistering driver '%s' from bus '%s'\n",drv->name,drv->bus->name);
driver_detach(drv);
driverfs_remove_dir(&drv->dir);
- if (drv->release)
- drv->release(drv);
put_bus(drv->bus);
}

-void remove_driver(struct device_driver * drv)
+void driver_unregister(struct device_driver * drv)
{
+ /* make sure no one can get to it via the bus any more */
write_lock(&drv->bus->lock);
- atomic_set(&drv->refcount,0);
list_del_init(&drv->bus_list);
write_unlock(&drv->bus->lock);
- __remove_driver(drv);
+
+ /* dec the refcount */
+ put_driver(drv);
+
+ /* if we didn't get rid of the driver, wait for it to be done */
+ wait_for_completion(&drv->complete);
}

/**
@@ -112,9 +118,10 @@
list_del_init(&drv->bus_list);
write_unlock(&drv->bus->lock);
__remove_driver(drv);
+ complete(&drv->complete);
}

EXPORT_SYMBOL(driver_for_each_dev);
EXPORT_SYMBOL(driver_register);
+EXPORT_SYMBOL(driver_unregister);
EXPORT_SYMBOL(put_driver);
-EXPORT_SYMBOL(remove_driver);
===== include/linux/device.h 1.20 vs edited =====
--- 1.20/include/linux/device.h Wed Jun 5 14:43:23 2002
+++ edited/include/linux/device.h Wed Jun 5 17:33:21 2002
@@ -28,6 +28,7 @@
#include <linux/ioport.h>
#include <linux/list.h>
#include <linux/sched.h>
+#include <linux/completion.h>
#include <linux/driverfs_fs.h>

#define DEVICE_NAME_SIZE 80
@@ -91,6 +92,7 @@

rwlock_t lock;
atomic_t refcount;
+ struct completion complete;

list_t bus_list;
list_t devices;
@@ -102,13 +104,12 @@

int (*suspend) (struct device * dev, u32 state, u32 level);
int (*resume) (struct device * dev, u32 level);
-
- void (*release) (struct device_driver * drv);
};



extern int driver_register(struct device_driver * drv);
+extern void driver_unregister(struct device_driver * drv);

static inline struct device_driver * get_driver(struct device_driver * drv)
{
@@ -118,7 +119,6 @@
}

extern void put_driver(struct device_driver * drv);
-extern void remove_driver(struct device_driver * drv);

extern int driver_for_each_dev(struct device_driver * drv, void * data,
int (*callback)(struct device * dev, void * data));

2002-06-06 01:23:24

by Kai Germaschewski

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix

On Wed, 5 Jun 2002, Patrick Mochel wrote:

> Alright, I can deal. I expanded on what you had, and moved it into the
> core. It's not tested, but lemme know what you think.

Looks fine to me, though I didn't test it either ;-)

However, it's possible to put the struct completion onto the stack
as I suggested, it saves a couple of bytes in struct driver, and, more
importantly, it'll blow up if the refcount goes to zero before
driver_unregister() has been called, so it provides a bug trap for
messed-up refcounting.

Anyway, that's a question of taste, I suppose.

--Kai

2002-06-06 19:13:20

by Patrick Mochel

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix


On Wed, 5 Jun 2002, Kai Germaschewski wrote:

> On Wed, 5 Jun 2002, Patrick Mochel wrote:
>
> > Alright, I can deal. I expanded on what you had, and moved it into the
> > core. It's not tested, but lemme know what you think.
>
> Looks fine to me, though I didn't test it either ;-)
>
> However, it's possible to put the struct completion onto the stack
> as I suggested, it saves a couple of bytes in struct driver, and, more
> importantly, it'll blow up if the refcount goes to zero before
> driver_unregister() has been called, so it provides a bug trap for
> messed-up refcounting.

Actually, I don't think the completion is needed at all. For one, it
didn't work, and I found another bug. When we find a driver to a device,
we inc the refcount. But, when we detach the driver via driver_detach(),
we weren't decrementing it. The patch below does this. When you call
driver_unregister(), it detaches the driver, then decs the refcount once,
and Lo! it works.

Second, Greg brought up an interesting point: if something is using a
module, it should have already inc'd the refcount, and rmmod(8) won't let
you remove it. If we can remove a module, but something is still using it,
it's a bug. And, exposing those will get those fixed sooner...

That said, I'm an offender of that rule with driverfs. It's the topic of
another thread, and I haven't quite figured out the best solution. When we
open a file belonging to a driver, we want to inc the module refcount.
But, we don't currently have anyway to get at the module to do this.

The f_ops of the file has an owner, but the f_ops structure is shared by
all files. We could add an owner to struct driver_file_entry and struct
device_driver, but that would make the former twice as large. We could add
it only to the latter, but it would require a differnt open for each type
of structure (device, device_driver, bus_type) to deference and access the
owner.

I'm leading towards the last, and will see how clean it can be..

-pat


===== drivers/base/core.c 1.28 vs edited =====
--- 1.28/drivers/base/core.c Thu Jun 6 09:50:30 2002
+++ edited/drivers/base/core.c Thu Jun 6 11:23:50 2002
@@ -144,6 +144,7 @@
drv->remove(dev);
} else
unlock_device(dev);
+ put_driver(drv);
return 0;
}

===== drivers/base/driver.c 1.7 vs edited =====
--- 1.7/drivers/base/driver.c Wed Jun 5 15:59:31 2002
+++ edited/drivers/base/driver.c Thu Jun 6 12:07:34 2002
@@ -79,23 +79,18 @@
return 0;
}

-static void __remove_driver(struct device_driver * drv)
-{
- pr_debug("Unregistering driver '%s' from bus '%s'\n",drv->name,drv->bus->name);
- driver_detach(drv);
- driverfs_remove_dir(&drv->dir);
- if (drv->release)
- drv->release(drv);
- put_bus(drv->bus);
-}
-
-void remove_driver(struct device_driver * drv)
+void driver_unregister(struct device_driver * drv)
{
+ /* make sure no one can get to it via the bus any more */
write_lock(&drv->bus->lock);
- atomic_set(&drv->refcount,0);
list_del_init(&drv->bus_list);
write_unlock(&drv->bus->lock);
- __remove_driver(drv);
+
+ /* force removal from devices */
+ driver_detach(drv);
+
+ /* dec the refcount (this should clean it up) */
+ put_driver(drv);
}

/**
@@ -111,10 +106,14 @@
}
list_del_init(&drv->bus_list);
write_unlock(&drv->bus->lock);
- __remove_driver(drv);
+
+ pr_debug("Unregistering driver '%s' from bus '%s'\n",drv->name,drv->bus->name);
+ driverfs_remove_dir(&drv->dir);
+
+ put_bus(drv->bus);
}

EXPORT_SYMBOL(driver_for_each_dev);
EXPORT_SYMBOL(driver_register);
+EXPORT_SYMBOL(driver_unregister);
EXPORT_SYMBOL(put_driver);
-EXPORT_SYMBOL(remove_driver);
===== include/linux/device.h 1.20 vs edited =====
--- 1.20/include/linux/device.h Wed Jun 5 14:43:23 2002
+++ edited/include/linux/device.h Thu Jun 6 11:46:07 2002
@@ -102,13 +102,12 @@

int (*suspend) (struct device * dev, u32 state, u32 level);
int (*resume) (struct device * dev, u32 level);
-
- void (*release) (struct device_driver * drv);
};



extern int driver_register(struct device_driver * drv);
+extern void driver_unregister(struct device_driver * drv);

static inline struct device_driver * get_driver(struct device_driver * drv)
{
@@ -118,7 +117,6 @@
}

extern void put_driver(struct device_driver * drv);
-extern void remove_driver(struct device_driver * drv);

extern int driver_for_each_dev(struct device_driver * drv, void * data,
int (*callback)(struct device * dev, void * data));

2002-06-06 20:00:17

by Kai Germaschewski

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix

On Thu, 6 Jun 2002, Patrick Mochel wrote:

> Actually, I don't think the completion is needed at all. For one, it
> didn't work, and I found another bug. When we find a driver to a device,
> we inc the refcount. But, when we detach the driver via driver_detach(),
> we weren't decrementing it. The patch below does this. When you call
> driver_unregister(), it detaches the driver, then decs the refcount once,
> and Lo! it works.

Well, of course, when the refcounting is messed up, it doesn't work, but
at least it pointed out the error - I suppose after fixing that, it did
work?

> Second, Greg brought up an interesting point: if something is using a
> module, it should have already inc'd the refcount, and rmmod(8) won't let
> you remove it. If we can remove a module, but something is still using it,
> it's a bug. And, exposing those will get those fixed sooner...

That's not quite right. It's okay to be using a module while the module
refcount is zero, in fact it's very common. But you need to have a
module_exit() function which makes sure that the module is not used
anymore before it returns.

I think a good example are network drivers, since I think they get the
refcounting right. If you load your network driver, it'll register_netdev(),
but not inc the module use count. If someone ifconfig up's the interface,
only then the module use count is incremented and the ::open() method is
called. If you ifdown the interface, the module use count goes back to
zero after ::close() has been called. However, the struct net_device
is still registered with the network layer.

When you rmmod the module, which is possible since its use count is zero,
module_exit() will call unregister_netdev(), which initiate unregistering
and then sleep until noone else is using the struct net_device anymore.
At that point it's save to return to the module, which will kfree()
and exit.

(One thing is different there, though not really: struct net_device is
alloced dynamically, so the problem is to kfree() only after noone else
uses it, for struct driver it's most likely statically alloced and freed
after module_exit(), so that's why we have to wait for other users to go
away)

Anyway, the completion/wait is needed, there may be someone else using
struct driver at this moment (say someone doing driver_for_each_dev() on
it). Otherwise, the refcount is completely useless, between
driver_register() and driver_unregister() it will always be >= 1, so
inc'ing / dec'ing doesn't make any difference, the only point where it is
needed is exactly at driver_unregister() time.

> That said, I'm an offender of that rule with driverfs. It's the topic of
> another thread, and I haven't quite figured out the best solution. When we
> open a file belonging to a driver, we want to inc the module refcount.
> But, we don't currently have anyway to get at the module to do this.

Well, if you do the refcounting correctly, this should be doable, too.

I don't really have the time to go through drivers/base and fs/driverfs in
detail, but the general idea would be to following:

At driver_unregister() time, you need to unregister the dir_entry from its
parent, so that it won't be found anymore. However, the dir_entry still
is referenced from the dentry, so you hopefully inc'ed the
refcount for that. Eventually, the dentry will become unreferenced and
driverfs_d_delete_file() be called. You kfree() the dentry and put the
reference you had - if it was the last one, the usual mechanism
kicks in, your completion gets done and your module removed.

Alternatively, make it possible to dissociate the dentry and dir_entry
before, so you can do that, drop the ref count there and someone keeping
open a dir in /driversfs won't be able to delay the module unload
indefinitely. (The latter problem you'd surely have when you go
to driver::owner and use the to inc the use count)

You didn't select an easy problem to hack on, that's for sure. But I think
it's possible to get it right, just consistently make sure your refcount
is always right, and don't forget about indirect references e.g. via
list_entry in driversfs/inode.c.

--Kai


2002-06-06 22:02:09

by Patrick Mochel

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix


> That's not quite right. It's okay to be using a module while the module
> refcount is zero, in fact it's very common. But you need to have a
> module_exit() function which makes sure that the module is not used
> anymore before it returns.

Is that correct behavior? Shouldn't any use increment the refcount?

> I think a good example are network drivers, since I think they get the
> refcounting right. If you load your network driver, it'll register_netdev(),
> but not inc the module use count. If someone ifconfig up's the interface,
> only then the module use count is incremented and the ::open() method is
> called. If you ifdown the interface, the module use count goes back to
> zero after ::close() has been called. However, the struct net_device
> is still registered with the network layer.
>
> When you rmmod the module, which is possible since its use count is zero,
> module_exit() will call unregister_netdev(), which initiate unregistering
> and then sleep until noone else is using the struct net_device anymore.
> At that point it's save to return to the module, which will kfree()
> and exit.
>
> (One thing is different there, though not really: struct net_device is
> alloced dynamically, so the problem is to kfree() only after noone else
> uses it, for struct driver it's most likely statically alloced and freed
> after module_exit(), so that's why we have to wait for other users to go
> away)

One minor, though important, distinction is that there is one struct
net_device allocated for each device the driver is attached to. When
module_exit() calls pci_unregister_driver(), the driver's remove callback
is called for each of these devices, at which point the struct net_device
is freed.

The struct device_driver is statically allocated, since there is one per
driver/module.

> Anyway, the completion/wait is needed, there may be someone else using
> struct driver at this moment (say someone doing driver_for_each_dev() on
> it). Otherwise, the refcount is completely useless, between
> driver_register() and driver_unregister() it will always be >= 1, so
> inc'ing / dec'ing doesn't make any difference, the only point where it is
> needed is exactly at driver_unregister() time.

Yes, you're right. I think I owe you a beer for keeping me honest.

However, taking a step back... The only time the driver refcount will hit
0 is if it's going away. The only time it goes away is if the module is
removed. If someone is using the driver, you don't want the module to
unload. Instead of trying to keep the refcounts of the two objects
synchronized against each other, can't we just use the same one?

We can replace the refcount in struct device_driver with a struct module
pointer, which modular drivers will already have. get_driver and
put_driver can either go away, or simply become wrappers for touching the
module reference count.

> > That said, I'm an offender of that rule with driverfs. It's the topic of
> > another thread, and I haven't quite figured out the best solution. When we
> > open a file belonging to a driver, we want to inc the module refcount.
> > But, we don't currently have anyway to get at the module to do this.
>
> Well, if you do the refcounting correctly, this should be doable, too.
>
> I don't really have the time to go through drivers/base and fs/driverfs in
> detail, but the general idea would be to following:
...

I'm pretty sure the file/directory refcounting is ok. Famous last words,
but I'm doing something like you described. My concern was instead over
the validity of the callback pointers that a driver has registered for a
file.

A user opens a file. Before they read from it, the module is unloaded.
They try and read from it, the show() callback is referenced, and we
crash.

Currently, the driverfs open call increments the reference count of the
device that created it. There are currently no provisions for device
drivers or bus drivers. So, no matter what, I have to modify that call to
handle those types. If the refcounting is right, the module won't be
unloaded, and the callbacks will be valid.

-pat


2002-06-06 22:22:20

by Kai Germaschewski

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix

On Thu, 6 Jun 2002, Patrick Mochel wrote:

> > That's not quite right. It's okay to be using a module while the module
> > refcount is zero, in fact it's very common. But you need to have a
> > module_exit() function which makes sure that the module is not used
> > anymore before it returns.
>
> Is that correct behavior? Shouldn't any use increment the refcount?

It's safe, as in not racy. It's the only way to do it if you want to be
able to use "rmmod" to unregister your driver, see below. Well, there was
some talk about going to a two stage module unload in 2.5. I don't
remember how exactly that worked, but it didn't happen yet, anyway.

> > (One thing is different there, though not really: struct net_device is
> > alloced dynamically, so the problem is to kfree() only after noone else
> > uses it, for struct driver it's most likely statically alloced and freed
> > after module_exit(), so that's why we have to wait for other users to go
> > away)
>
> One minor, though important, distinction is that there is one struct
> net_device allocated for each device the driver is attached to. When
> module_exit() calls pci_unregister_driver(), the driver's remove callback
> is called for each of these devices, at which point the struct net_device
> is freed.

That's what I meant above, the way the memory is deallocated is different,
kfree() vs. module_unload(), but the way to handle it is the same - wait
until it's all ours, and the free it - explicitly with kfree() vs
implicitly by finishing module_exit()

> However, taking a step back... The only time the driver refcount will hit
> 0 is if it's going away. The only time it goes away is if the module is
> removed. If someone is using the driver, you don't want the module to
> unload. Instead of trying to keep the refcounts of the two objects
> synchronized against each other, can't we just use the same one?

Yes, you're basically right, whether you use the refcount in struct driver
or the module use count doesn't make any difference. Except for one thing:
You cannot call module_exit() when the module count is > 0. Since only
then unregister_driver() is called, there's no way to get the use count to
zero and thus unload the module. One could of course change the policy to
call unregister_driver() not from module_exit(), but e.g. by
"echo remove > /driversfs/../my_driver", but it's surely not that I'm
suggesting this.

> We can replace the refcount in struct device_driver with a struct module
> pointer, which modular drivers will already have. get_driver and
> put_driver can either go away, or simply become wrappers for touching the
> module reference count.

They would still need to be wrappers for touching the module count
instead. So it doesn't buy anything.

> I'm pretty sure the file/directory refcounting is ok. Famous last words,
> but I'm doing something like you described. My concern was instead over
> the validity of the callback pointers that a driver has registered for a
> file.
>
> A user opens a file. Before they read from it, the module is unloaded.
> They try and read from it, the show() callback is referenced, and we
> crash.
>
> Currently, the driverfs open call increments the reference count of the
> device that created it. There are currently no provisions for device
> drivers or bus drivers. So, no matter what, I have to modify that call to
> handle those types. If the refcounting is right, the module won't be
> unloaded, and the callbacks will be valid.

Yeah, that seems right. Maybe it's better to not install callbacks
directly, but a reference to the struct driver, where the struct driver
contains the callbacks. This way it's explicit you still reference the
struct driver (and need to get/put it). But I didn't even look how you do
it currently, so maybe you're doing it that way already.

--Kai


2002-06-06 23:01:41

by Patrick Mochel

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix


> Yes, you're basically right, whether you use the refcount in struct driver
> or the module use count doesn't make any difference. Except for one thing:
> You cannot call module_exit() when the module count is > 0. Since only
> then unregister_driver() is called, there's no way to get the use count to
> zero and thus unload the module. One could of course change the policy to
> call unregister_driver() not from module_exit(), but e.g. by
> "echo remove > /driversfs/../my_driver", but it's surely not that I'm
> suggesting this.

We can keep the same policy as modules now - keep the usage count at 0
when not in use. That way the module can unload at any time. module_exit()
calls driver_unregister(), which, down the line, calls the drivers'
remove() for each device attached to it.

That seems a lot cleaner than anything so far; we leverage the existing
module infrastructure as much as possible. I'll see about codifying the
concept in the next day or so and sending it out..

-pat

2002-06-06 23:28:02

by Patrick Mochel

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix


> That seems a lot cleaner than anything so far; we leverage the existing
> module infrastructure as much as possible. I'll see about codifying the
> concept in the next day or so and sending it out..

Actually, it appears to be as easy as the patch below. Or, I just could
just not know what the hell I'm talking about.

Things appear to be working (with the eepro100 as a module)....

-pat

===== drivers/base/driver.c 1.7 vs edited =====
--- 1.7/drivers/base/driver.c Wed Jun 5 15:59:31 2002
+++ edited/drivers/base/driver.c Thu Jun 6 16:13:35 2002
@@ -67,9 +67,9 @@
pr_debug("Registering driver '%s' with bus '%s'\n",drv->name,drv->bus->name);

get_bus(drv->bus);
- atomic_set(&drv->refcount,2);
rwlock_init(&drv->lock);
INIT_LIST_HEAD(&drv->devices);
+ SET_MODULE_OWNER(drv);
write_lock(&drv->bus->lock);
list_add(&drv->bus_list,&drv->bus->drivers);
write_unlock(&drv->bus->lock);
@@ -79,42 +79,41 @@
return 0;
}

-static void __remove_driver(struct device_driver * drv)
+void driver_unregister(struct device_driver * drv)
{
- pr_debug("Unregistering driver '%s' from bus '%s'\n",drv->name,drv->bus->name);
+ struct bus_type * bus;
+
+ write_lock(&drv->lock);
+ bus = drv->bus;
+ drv->bus = NULL;
+ write_unlock(&drv->lock);
+
+ /* make sure no one can get to it via the bus any more */
+ write_lock(&bus->lock);
+ list_del_init(&drv->bus_list);
+ write_unlock(&bus->lock);
+
+ /* force removal from devices */
driver_detach(drv);
+
+ pr_debug("Unregistering driver '%s' from bus '%s'\n",drv->name,drv->bus->name);
driverfs_remove_dir(&drv->dir);
- if (drv->release)
- drv->release(drv);
- put_bus(drv->bus);
+
+ put_bus(bus);
}

-void remove_driver(struct device_driver * drv)
+struct device_driver * get_driver(struct device_driver * drv)
{
- write_lock(&drv->bus->lock);
- atomic_set(&drv->refcount,0);
- list_del_init(&drv->bus_list);
- write_unlock(&drv->bus->lock);
- __remove_driver(drv);
+ __MOD_INC_USE_COUNT(drv->owner);
+ return drv;
}

-/**
- * put_driver - decrement driver's refcount and clean up if necessary
- * @drv: driver in question
- */
void put_driver(struct device_driver * drv)
{
- write_lock(&drv->bus->lock);
- if (!atomic_dec_and_test(&drv->refcount)) {
- write_unlock(&drv->bus->lock);
- return;
- }
- list_del_init(&drv->bus_list);
- write_unlock(&drv->bus->lock);
- __remove_driver(drv);
+ __MOD_DEC_USE_COUNT(drv->owner);
}

EXPORT_SYMBOL(driver_for_each_dev);
EXPORT_SYMBOL(driver_register);
+EXPORT_SYMBOL(driver_unregister);
EXPORT_SYMBOL(put_driver);
-EXPORT_SYMBOL(remove_driver);
===== include/linux/device.h 1.20 vs edited =====
--- 1.20/include/linux/device.h Wed Jun 5 14:43:23 2002
+++ edited/include/linux/device.h Thu Jun 6 16:13:19 2002
@@ -90,7 +90,7 @@
struct bus_type * bus;

rwlock_t lock;
- atomic_t refcount;
+ struct module * owner;

list_t bus_list;
list_t devices;
@@ -102,23 +102,15 @@

int (*suspend) (struct device * dev, u32 state, u32 level);
int (*resume) (struct device * dev, u32 level);
-
- void (*release) (struct device_driver * drv);
};



extern int driver_register(struct device_driver * drv);
+extern void driver_unregister(struct device_driver * drv);

-static inline struct device_driver * get_driver(struct device_driver * drv)
-{
- BUG_ON(!atomic_read(&drv->refcount));
- atomic_inc(&drv->refcount);
- return drv;
-}
-
+extern struct device_driver * get_driver(struct device_driver * drv);
extern void put_driver(struct device_driver * drv);
-extern void remove_driver(struct device_driver * drv);

extern int driver_for_each_dev(struct device_driver * drv, void * data,
int (*callback)(struct device * dev, void * data));

2002-06-07 00:02:56

by Kai Germaschewski

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix

On Thu, 6 Jun 2002, Patrick Mochel wrote:

> Actually, it appears to be as easy as the patch below. Or, I just could
> just not know what the hell I'm talking about.
>
> Things appear to be working (with the eepro100 as a module)....
>
> -pat
>
> ===== drivers/base/driver.c 1.7 vs edited =====
> --- 1.7/drivers/base/driver.c Wed Jun 5 15:59:31 2002
> +++ edited/drivers/base/driver.c Thu Jun 6 16:13:35 2002
> @@ -67,9 +67,9 @@
> pr_debug("Registering driver '%s' with bus '%s'\n",drv->name,drv->bus->name);
>
> get_bus(drv->bus);
> - atomic_set(&drv->refcount,2);
> rwlock_init(&drv->lock);
> INIT_LIST_HEAD(&drv->devices);
> + SET_MODULE_OWNER(drv);
> write_lock(&drv->bus->lock);
> list_add(&drv->bus_list,&drv->bus->drivers);
> write_unlock(&drv->bus->lock);

drivers/base is always compiled in. So SET_MODULE_OWNER will set the owner
to NULL.

> -void remove_driver(struct device_driver * drv)
> +struct device_driver * get_driver(struct device_driver * drv)
> {
> - write_lock(&drv->bus->lock);
> - atomic_set(&drv->refcount,0);
> - list_del_init(&drv->bus_list);
> - write_unlock(&drv->bus->lock);
> - __remove_driver(drv);
> + __MOD_INC_USE_COUNT(drv->owner);
> + return drv;
> }
>
> -/**
> - * put_driver - decrement driver's refcount and clean up if necessary
> - * @drv: driver in question
> - */
> void put_driver(struct device_driver * drv)
> {
> - write_lock(&drv->bus->lock);
> - if (!atomic_dec_and_test(&drv->refcount)) {
> - write_unlock(&drv->bus->lock);
> - return;
> - }
> - list_del_init(&drv->bus_list);
> - write_unlock(&drv->bus->lock);
> - __remove_driver(drv);
> + __MOD_DEC_USE_COUNT(drv->owner);
> }

So the __MOD_{INC,DEC}_USE_COUNT() should oops right here.
If you tested it and it doesn't oops, I don't understand why.

So, for one, if you want to go that road, you should use fops_get()/put()
(you can use just these, or rename them appropriately), they'll do the
right thing if a thing is modular as opposed to built-in (owner == NULL).

And, you need to set the owner from the module which you want to protect.

So in the your driver:

struct pci_driver my_drv {
probe: ...
driver : {
owner: THIS_MODULE,
}
}

which certainly is ugly since owner is in a sub-struct. But it's not
really a possibility anyway, since in this case pci_register_driver will
do a MOD_INC_USE_COUNT and you never have a chance to call
pci_unregister_driver() to decrement it again, since you need to have it
decremented before you can call pci_unregister_driver() (because
that's called from your module_exit() function).

--Kai



2002-06-10 14:21:40

by Patrick Mochel

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix


Hi. Sorry about the long delay in replying, I've been out of town, and am
actually about to leave again for a week+ in a few hours...

> So the __MOD_{INC,DEC}_USE_COUNT() should oops right here.
> If you tested it and it doesn't oops, I don't understand why.
>
> So, for one, if you want to go that road, you should use fops_get()/put()
> (you can use just these, or rename them appropriately), they'll do the
> right thing if a thing is modular as opposed to built-in (owner == NULL).

Ah yes; patch appended which does just that.

> And, you need to set the owner from the module which you want to protect.
>
> So in the your driver:
>
> struct pci_driver my_drv {
> probe: ...
> driver : {
> owner: THIS_MODULE,
> }
> }

This is certainly not the prettiest, and I've run into it when setting
other generic driver fields. Is there a cleaner way to do this?

> which certainly is ugly since owner is in a sub-struct. But it's not
> really a possibility anyway, since in this case pci_register_driver will
> do a MOD_INC_USE_COUNT and you never have a chance to call
> pci_unregister_driver() to decrement it again, since you need to have it
> decremented before you can call pci_unregister_driver() (because
> that's called from your module_exit() function).

pci_register_driver() doesn't inc the refcount. The refcount is held at 1
by the module loading code, IIUC, until module_init() is done. After that,
the refcount stays at 0 until someone uses it.

-pat

===== drivers/base/driver.c 1.7 vs edited =====
--- 1.7/drivers/base/driver.c Wed Jun 5 15:59:31 2002
+++ edited/drivers/base/driver.c Mon Jun 10 06:54:00 2002
@@ -67,54 +67,55 @@
pr_debug("Registering driver '%s' with bus '%s'\n",drv->name,drv->bus->name);

get_bus(drv->bus);
- atomic_set(&drv->refcount,2);
rwlock_init(&drv->lock);
INIT_LIST_HEAD(&drv->devices);
+ SET_MODULE_OWNER(drv);
write_lock(&drv->bus->lock);
list_add(&drv->bus_list,&drv->bus->drivers);
write_unlock(&drv->bus->lock);
driver_make_dir(drv);
driver_attach(drv);
- put_driver(drv);
return 0;
}

-static void __remove_driver(struct device_driver * drv)
+void driver_unregister(struct device_driver * drv)
{
- pr_debug("Unregistering driver '%s' from bus '%s'\n",drv->name,drv->bus->name);
+ struct bus_type * bus;
+
+ write_lock(&drv->lock);
+ bus = drv->bus;
+ drv->bus = NULL;
+ write_unlock(&drv->lock);
+
+ /* make sure no one can get to it via the bus any more */
+ write_lock(&bus->lock);
+ list_del_init(&drv->bus_list);
+ write_unlock(&bus->lock);
+
+ /* force removal from devices */
driver_detach(drv);
+
+ pr_debug("Unregistering driver '%s' from bus '%s'\n",drv->name,drv->bus->name);
driverfs_remove_dir(&drv->dir);
- if (drv->release)
- drv->release(drv);
- put_bus(drv->bus);
+
+ put_bus(bus);
}

-void remove_driver(struct device_driver * drv)
+struct device_driver * get_driver(struct device_driver * drv)
{
- write_lock(&drv->bus->lock);
- atomic_set(&drv->refcount,0);
- list_del_init(&drv->bus_list);
- write_unlock(&drv->bus->lock);
- __remove_driver(drv);
+ if (drv && drv->owner)
+ if (!try_inc_mod_count(drv->owner))
+ return NULL;
+ return drv;
}

-/**
- * put_driver - decrement driver's refcount and clean up if necessary
- * @drv: driver in question
- */
void put_driver(struct device_driver * drv)
{
- write_lock(&drv->bus->lock);
- if (!atomic_dec_and_test(&drv->refcount)) {
- write_unlock(&drv->bus->lock);
- return;
- }
- list_del_init(&drv->bus_list);
- write_unlock(&drv->bus->lock);
- __remove_driver(drv);
+ if (drv && drv->owner)
+ __MOD_DEC_USE_COUNT(drv->owner);
}

EXPORT_SYMBOL(driver_for_each_dev);
EXPORT_SYMBOL(driver_register);
+EXPORT_SYMBOL(driver_unregister);
EXPORT_SYMBOL(put_driver);
-EXPORT_SYMBOL(remove_driver);
===== include/linux/device.h 1.20 vs edited =====
--- 1.20/include/linux/device.h Wed Jun 5 14:43:23 2002
+++ edited/include/linux/device.h Thu Jun 6 16:13:19 2002
@@ -90,7 +90,7 @@
struct bus_type * bus;

rwlock_t lock;
- atomic_t refcount;
+ struct module * owner;

list_t bus_list;
list_t devices;
@@ -102,23 +102,15 @@

int (*suspend) (struct device * dev, u32 state, u32 level);
int (*resume) (struct device * dev, u32 level);
-
- void (*release) (struct device_driver * drv);
};



extern int driver_register(struct device_driver * drv);
+extern void driver_unregister(struct device_driver * drv);

-static inline struct device_driver * get_driver(struct device_driver * drv)
-{
- BUG_ON(!atomic_read(&drv->refcount));
- atomic_inc(&drv->refcount);
- return drv;
-}
-
+extern struct device_driver * get_driver(struct device_driver * drv);
extern void put_driver(struct device_driver * drv);
-extern void remove_driver(struct device_driver * drv);

extern int driver_for_each_dev(struct device_driver * drv, void * data,
int (*callback)(struct device * dev, void * data));

2002-06-10 14:39:49

by Kai Germaschewski

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix

On Mon, 10 Jun 2002, Patrick Mochel wrote:

> > And, you need to set the owner from the module which you want to protect.
> >
> > So in the your driver:
> >
> > struct pci_driver my_drv {
> > probe: ...
> > driver : {
> > owner: THIS_MODULE,
> > }
> > }
>
> This is certainly not the prettiest, and I've run into it when setting
> other generic driver fields. Is there a cleaner way to do this?

Depends on your definition of clean ;) The only thing I can think of is
adding some new macro which hides it. Which is still ugly. Or, of course,
unnamed struct members, which however I think is not supported by gcc < 3.
It may be something to think about.

> pci_register_driver() doesn't inc the refcount. The refcount is held at 1
> by the module loading code, IIUC, until module_init() is done. After that,
> the refcount stays at 0 until someone uses it.

Sounds okay. Well, I'm generally lost in too many patches, if you put
the whole thing onto bkbits.net or so, I'll try to give it a look.

--Kai

2002-06-10 16:22:23

by Patrick Mochel

[permalink] [raw]
Subject: Re: [patch] PCI device matching fix


> Sounds okay. Well, I'm generally lost in too many patches, if you put
> the whole thing onto bkbits.net or so, I'll try to give it a look.

Heh, sure. You can pull from bk://ldm.bkbits.net/linux-2.5-module.

-pat