2002-07-08 18:42:16

by Patrick Mochel

[permalink] [raw]
Subject: Driverfs updates


Quick Summary

- Add struct module * owner field to struct device_driver
- Change {get,put}_driver to use it

- Add void * object to struct driver_file_entry that points to owner of
file.
- Add struct driverfs_subsys with {get,put}_object callbacks to do generic
reference counting on file open/release

- Convert driverfs interface to struct devices to use above mechnanism
- Add 'bindings' between struct device_driver and struct bus_type, using
above mechanism to do proper reference counting on objects.

Pull from bk://ldm.bkbits.net/linux-2.5-driverfs

Patch also available from:

http://kernel.org/pub/linux/kernel/people/mochel/patches/driverfs-patch-08072002.gz


Changelog and diffstat appended.

Explanation:

driverfs currently does reference counting on struct device on file open()
and release(). It accesses the device by doing a list_entry on the
parent directory entry of the driver_file_entry (which is in the inode's
u.generic_ip).

This works fine, but is not very extensible to other objects that we want
to export files for, like struct device_driver. One way to do it is create
separate open/release callbacks for each object type that does a similar
list_entry. But, I considered that bulky and instead genericized a sole
pair of open and release calls.

To make it work for all objects, I added

void * object;

to struct driver_file_entry, and created

struct driverfs_subsys {
int (*get_object)(void * obj);
void (*put_object)(void * obj);
};

and
struct driverfs_subsys * subsys;

in struct driver_dir_entry to do reference counting on that object.

Wait, come back; it should work.


Each type of object that is represented in driverfs gets a directory that
is created when registered with the subsystem they belong to. When this
directory is created, the subsystem is responsible for setting the
subsys pointer in the directory entry.

To create files, there should be wrapper functions for each type of object
that take an explicit pointer to that object. E.g.

device_create_file
driver_create_file
etc.

The subsystem is responsibile for setting the object pointer on file
creation.

During open(), we get the driver_file_entry from the inode's u.generic_ip.
>From that, we get the parent driver_dir_entry and the subsys structure. We
pass the driver_file_entry::object to the subsys's get_object() callback.
The subsystem pins the object in memory, and life is a bit happier.

On release, we do the same in order to call put_object().

Having get/set functions that take opaque pointers can appear to
completely sacrifice type-safety. However, the fact that the only thing we
have to work with initially is an opaque pointer in u.generic_ip makes
that point, IMHO.

Plus, the driverfs code never relies on the validity of that pointer. It
simply passes it back to the subsystem to verify it and refcount. It's the
sole responsibility of the subsystem to not do something stupid.

It is possible to change the contents of the object pointer, the
driver_file_entry, or the inode after the file has been created. But, if
you aim the gun at your foot, and your finger slips off the trigger, it's
not my fault...


Anyway, it is now really easy to extend driverfs to support any type of
object. To add 'bindings' for a particular object type, it is literally
about 45 lines (excluding comments). See drivers/base/fs/*.c for examples.


Concerning reference counting on objects that aren't devices; i.e.
drivers: Drivers can be modular, and the refcount wasn't tied to the
module refcount. I went around a few times with Kai about this, about a
month ago. What I ended doing was adding an owner field to the
device_driver structure, and using that to do refcounting.

During normal operation, the refcount stays at 0, so the module can be
removed. During any access to the driver, it bumps up the refcount. (This
likely still suffers from the same module race conditions, but it's a step
in the right direction.)


I've also updated the driverfs documentation to reflect the recent changes
and removed most gross inaccuracies.


-pat


[email protected], 2002-06-10 09:12:20-07:00, [email protected]
driver refcounting:
Make {get,put}_driver simply wrappers for touching module reference count
Get rid of driver's release callback
Rename remove_driver to driver_unregister

diffstat results:
drivers/base/driver.c | 53 +++++++++++++++++++++++------------------------
drivers/pci/pci-driver.c | 2 -
include/linux/device.h | 14 ++----------
3 files changed, 31 insertions, 38 deletions

[email protected], 2002-07-03 13:40:09-07:00, [email protected]
Don't set module owner in driver_register, that's just dumb. Drivers need to do it explicitly.

diffstat results:
drivers/base/driver.c | 1 -
1 files changed, 1 deletion

[email protected], 2002-07-03 13:59:55-07:00, [email protected]
driverfs: Break out file creation functions from inode.c into lib.c

diffstat results:
fs/driverfs/Makefile | 4
fs/driverfs/inode.c | 278 -------------------------------------------------
fs/driverfs/lib.c | 289 +++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 292 insertions, 279 deletions

[email protected], 2002-07-03 14:21:45-07:00, [email protected]
driverfs:
- typedef show and store callbacks in driver_file_entry
- Add an object pointer that points to the object that owns the file

diffstat results:
include/linux/driverfs_fs.h | 9 +++++++--
1 files changed, 7 insertions, 2 deletions

[email protected], 2002-07-03 14:22:41-07:00, [email protected]
driverfs file creation for devices:
Set the object pointer in the struct driver_file_entry when creating a file for the device

diffstat results:
drivers/base/fs.c | 2 ++
1 files changed, 2 insertions

[email protected], 2002-07-03 14:24:04-07:00, [email protected]
driverfs:
- Use the object field of struct driver_file_entry when calling the owner's show and store callbacks
- Use the object field on open and release, instead of using list_entry

diffstat results:
fs/driverfs/inode.c | 14 ++++----------
1 files changed, 4 insertions, 10 deletions

[email protected], 2002-07-03 15:31:56-07:00, [email protected]
driverfs:
- add driverfs_subsys to describe a subsystem that owns a class of objects that are represented in driverfs
Previously, all directories in driverfs mapped to struct device's. This assumption was used to do reference
counting on the devices. They were also passed to the show and store callbacks of the file owners.

However, now we want to be able to add files for other types of objects. We want to do reference counting on these
objects, and pass them downstream. The show and store callbacks were modified to take void *, instead of creating
a different type of show and store for each type of object that can have files.

Reference counting takes place in file open and release. Instead of defining a completely separate open and release
for each subsystem or type of object, I've implemented only what we need - callbacks to increment and decrement
the reference count of the target objects.

It's assumed that each object that has presence in driverfs has a directory. When this directory is created (by the
subsystem that owns it), it the subsys field needs to be set appropriately.


diffstat results:
include/linux/driverfs_fs.h | 6 ++++++
1 files changed, 6 insertions

[email protected], 2002-07-03 15:37:54-07:00, [email protected]
driverfs:
Use the driverfs_subsys types in open and release to do reference counting on the object that owns the file
It tries to be stringent about the whole situation:
- Return an error if
- no subsys is present
- get_object() fails (returns 0)
- Don't return error if
- get_object() isn't implemented (no ref counting on objects)
- get_object() returns 1

On release, it's assumed the subsys pointer is valid, since we wouldn't have allowed file open if it wasn't.

diffstat results:
fs/driverfs/inode.c | 31 ++++++++++++++++++++++---------
1 files changed, 22 insertions, 9 deletions

[email protected], 2002-07-03 15:39:08-07:00, [email protected]
device <-> driverfs interface:
Declare a driverfs_device_subsys, complete with get_object and put_object callbacks
Set the pointer in device's directory entry when their directory is created

diffstat results:
drivers/base/fs.c | 18 ++++++++++++++++++
1 files changed, 18 insertions

[email protected], 2002-07-03 16:58:18-07:00, [email protected]
device model:
Move driverfs interface (fs.c) to drivers/base/fs/

diffstat results:
drivers/base/fs.c | 217 -----------------------------------------------
drivers/base/Makefile | 4
drivers/base/fs/Makefile | 5 +
drivers/base/fs/fs.c | 217 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 225 insertions, 218 deletions

[email protected], 2002-07-03 17:04:29-07:00, [email protected]
device model - driverfs bindings:
Move device <-> driverfs binding to separate file; pave way for new bindings to come

diffstat results:
drivers/base/fs/Makefile | 4 -
drivers/base/fs/device.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++
drivers/base/fs/fs.c | 116 ------------------------------------------
drivers/base/fs/fs.h | 5 +
4 files changed, 134 insertions, 118 deletions

[email protected], 2002-07-03 17:42:02-07:00, [email protected]
device model:
- move driverfs <-> bus bindings into drivers/base/fs/bus.c
- add helpers to create/remove files for buses

diffstat results:
drivers/base/base.h | 3 +
drivers/base/bus.c | 38 -----------------
drivers/base/fs/Makefile | 4 -
drivers/base/fs/bus.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 111 insertions, 39 deletions

[email protected], 2002-07-03 17:54:20-07:00, [email protected]
device model driverfs bindings:
Implement device_dup_file for use by various specific bindings:
- allocate new driver_file_entry
- copy template in
- create driverfs file
- return error to caller

diffstat results:
drivers/base/fs/bus.c | 22 +++++-----------------
drivers/base/fs/device.c | 24 ++++++------------------
drivers/base/fs/fs.c | 18 ++++++++++++++++++
drivers/base/fs/fs.h | 2 ++
4 files changed, 31 insertions, 35 deletions

[email protected], 2002-07-05 13:55:01-07:00, [email protected]
device model <-> driverfs bindings
Rename device_dup_file to common_create_file
Make it inc/dec reference count on object that's passed in
Add common_remove_file that touches reference count of object and calls driverfs to remove file

diffstat results:
drivers/base/fs/fs.c | 35 +++++++++++++++++++++++++----------
drivers/base/fs/fs.h | 3 ++-
2 files changed, 27 insertions, 11 deletions

[email protected], 2002-07-05 13:57:30-07:00, [email protected]
Add device driver <-> driverfs bindings

diffstat results:
drivers/base/base.h | 3 ++
drivers/base/driver.c | 10 -------
drivers/base/fs/driver.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions, 10 deletions

[email protected], 2002-07-05 13:58:13-07:00, [email protected]
add driver.o target in drivers/base/fs/

diffstat results:
drivers/base/fs/Makefile | 4 ++--
1 files changed, 2 insertions, 2 deletions

[email protected], 2002-07-05 13:59:26-07:00, [email protected]
device model, bus drivers
Convert bus_create_file to use common_create_file and bus_remove_file to use common_remove_file (and let them handle reference counting)
Export bus_{create,remove}_file

diffstat results:
drivers/base/fs/bus.c | 19 +++++--------------
1 files changed, 5 insertions, 14 deletions

[email protected], 2002-07-05 14:00:42-07:00, [email protected]
Convert device_{create,remove}_file to use common_{create,remove}_file

diffstat results:
drivers/base/fs/device.c | 24 +++++++-----------------
1 files changed, 7 insertions, 17 deletions

[email protected], 2002-07-05 14:01:40-07:00, [email protected]
Add prototypes for {bus,driver}_{create,remove}_file to include/linux/device.h

diffstat results:
include/linux/device.h | 8 ++++++++
1 files changed, 8 insertions

[email protected], 2002-07-05 15:54:13-07:00, [email protected]
Rewrite driverfs documentation

diffstat results:
Documentation/filesystems/driverfs.txt | 366 ++++++++++++++++++++++-----------
1 files changed, 251 insertions, 115 deletions



2002-07-08 23:30:56

by Keith Owens

[permalink] [raw]
Subject: Re: Driverfs updates

On Mon, 8 Jul 2002 11:41:52 -0700 (PDT),
Patrick Mochel <[email protected]> wrote:
>- Add struct module * owner field to struct device_driver
>- Change {get,put}_driver to use it

struct device_driver * get_driver(struct device_driver * drv)
{
if (drv && drv->owner)
if (!try_inc_mod_count(drv->owner))
return NULL;
return drv;
}

is racy. The module can be unloaded after if (drv->owner) and before
try_inc_mod_count. To prevent that race, drv itself must be locked
around calls to get_driver().

The "normal" method is to have a high level lock that controls the drv
list and to take that lock in the register and unregister routines and
around the call to try_inc_mod_count. drv->bus->lock is no good,
anything that relies on reading drv without a lock or module reference
count is racy. I suggest you add a global driverfs_lock.

2002-07-08 23:49:41

by Thunder from the hill

[permalink] [raw]
Subject: Re: Driverfs updates

Hi,

On Tue, 9 Jul 2002, Keith Owens wrote:
> struct device_driver * get_driver(struct device_driver * drv)
> {
+ struct device_driver *ret = NULL;
+
+ if (!drv)
+ goto out;
+ lock_somehow(drv->lock);
+ if (drv->owner)
> if (!try_inc_mod_count(drv->owner))
+ goto out;
+
+ ret = drv;
+ out:
+ unlock_somehow(drv->lock);
+ return ret;
> }
>
> I suggest you add a global driverfs_lock.

Better than locking all kernel threads, isn't it?

Regards,
Thunder
--
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y-
------END GEEK CODE BLOCK------

2002-07-09 00:07:09

by Keith Owens

[permalink] [raw]
Subject: Re: Driverfs updates

On Mon, 8 Jul 2002 17:52:13 -0600 (MDT),
Thunder from the hill <[email protected]> wrote:
>Hi,
>
>On Tue, 9 Jul 2002, Keith Owens wrote:
>> struct device_driver * get_driver(struct device_driver * drv)
>> {
> + struct device_driver *ret = NULL;
> +
> + if (!drv)
> + goto out;
> + lock_somehow(drv->lock);
> + if (drv->owner)
>> if (!try_inc_mod_count(drv->owner))
> + goto out;
> +
> + ret = drv;
> + out:
> + unlock_somehow(drv->lock);
> + return ret;
>> }
>>
>> I suggest you add a global driverfs_lock.
>
>Better than locking all kernel threads, isn't it?

What protects drv in that code? drv is a dynamically registered object
and can be dynamically unregistered and freed at any time from another
cpu, or even the same cpu with preempt. Any reference to drv without
an external lock or a reference count on the module that registered drv
is racy. In particular, you cannot use drv->anything to protect drv!

The global driverfs_lock is required to protect the bus/drv list
against changes while you are processing an entry on the list AND that
entry is in a module with a use count of 0. In that state, you have an
uncounted reference to module data which must be protected until you
can set the use count, at which point the use count will take over and
protect the structure.

Did I mention that this method is complex and fragile?

2002-07-09 08:51:44

by Oliver Neukum

[permalink] [raw]
Subject: Re: Driverfs updates


> > I suggest you add a global driverfs_lock.
>
> Better than locking all kernel threads, isn't it?

No, it is not, not by far.

-It means that modules are not transparent.
-Everybody is punished, module or no module.
-It limits modules to providing open/use/close APIs.
-It is slow.
-Modules can only be used on APIs that provide for them.

By freezing, which happens only on module removal,
only users of modules are punished. Handling of
module usage counts can be encapsulated in the modules
themselves. And alternative methods of determining removability
are possible.

Face it, SMP and module unloading have some fundamental problems.
Therefore you switch the box to pseudo-UP for unloading,
that's what freeze effectively does. You just have to disable preempt
on all CPUs and wait for the tasks running to leave kernel.

Cleanly killing a kernel thread of a module is duty of the module's
cleanup function. Independent kernel threads which use a module
must be allowed to sleep voluntarily before they are frozen.
In this case the old rule of "INC before you sleep" is valid again.

Regards
Oliver

2002-07-09 11:05:34

by Thunder from the hill

[permalink] [raw]
Subject: Re: Driverfs updates

Hi,

On Tue, 9 Jul 2002, Oliver Neukum wrote:
> -It is slow.

I wouldn't call it any fast when I think about the idea that 31 of my CPUs
on Hawkeye shall be stopped because I unload a module. Sometimes at high
noon my server (Hawkeye) can hardly keep up all the traffic. Just imagine
a module would be unloaded then! That's the problem I'm having with it.

What should make a lock for parts of the kernel slower than a lock for
the _whole_ kernel?

Regards,
Thunder
--
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y-
------END GEEK CODE BLOCK------

2002-07-09 11:41:34

by Richard B. Johnson

[permalink] [raw]
Subject: Re: Driverfs updates

On Tue, 9 Jul 2002, Thunder from the hill wrote:

> Hi,
>
> On Tue, 9 Jul 2002, Oliver Neukum wrote:
> > -It is slow.
>
> I wouldn't call it any fast when I think about the idea that 31 of my CPUs
> on Hawkeye shall be stopped because I unload a module. Sometimes at high
> noon my server (Hawkeye) can hardly keep up all the traffic. Just imagine
> a module would be unloaded then! That's the problem I'm having with it.
>
> What should make a lock for parts of the kernel slower than a lock for
> the _whole_ kernel?
>
> Regards,
> Thunder

The module unload is to be used only during module development (so you
don't have to re-boot), as was the very first conjecture in this thread.

The current 'auto-unload' in some distributions like RedHat will go away.
The only way a module will be unloaded is if you, as root, unload it.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

2002-07-09 12:18:01

by David D. Hagood

[permalink] [raw]
Subject: Re: Driverfs updates

I've been following this thread for some time, and one aspect of it
disturbs me - the principle of symmetry.

I've found that generally, in design thing should be symmetric - if you
can turn a thing on, you could be able to turn it off, if you can heat a
thing, you should be able to cool it, and if you can load a thing, you
should be able to unload it.

In the old days, a computer was "complete" when it booted - all things
that ever would be in the machine during that run were present at boot,
and the only way something could be added would be to turn the machine
off. In such an environment, a monolithic kernel loaded at boot made sense.

Now, we have things like Firewire, USB, Bluetooth, PCMCIA, Hot-Plug PCI
and TCP/IP attached devices, all of which can come and go as they
please. Loadable modules made supporting such things easy - witness the
trouble WinNT had dealing with PCMCIA vs. Linux.

However, if you cannot unload modules, then the kernel grows without
bound - the mere fact that a Bluetooth camera came into range causes the
kernel to grow as the driver gets loaded. True, you could go in as root
and clean up, but it seems to me that requiring root to do that sort of
periodic maintainance prevents Linux from being able to be the Energizer
Bunny OS - "it keeps going and going...." without much diddling.

It seems to me the problem is in designing modules to unload, and saying
"Then don't unload them" is not even a band-aid - it is willful
ignorance. If there is a potential race condition unloading a module,
then the module is BROKEN.

2002-07-09 12:31:10

by Thunder from the hill

[permalink] [raw]
Subject: Re: Driverfs updates

Hi,

On Tue, 9 Jul 2002, David D. Hagood wrote:
> Now, we have things like Firewire, USB, Bluetooth, PCMCIA, Hot-Plug PCI
> and TCP/IP attached devices, all of which can come and go as they
> please. Loadable modules made supporting such things easy - witness the
> trouble WinNT had dealing with PCMCIA vs. Linux.
>
> However, if you cannot unload modules, then the kernel grows without
> bound - the mere fact that a Bluetooth camera came into range causes the
> kernel to grow as the driver gets loaded. True, you could go in as root
> and clean up, but it seems to me that requiring root to do that sort of
> periodic maintainance prevents Linux from being able to be the Energizer
> Bunny OS - "it keeps going and going...." without much diddling.

If you plug any hotplug devices, the kernel allocates some space for the
control structures, and if you unplug it, structures get unallocated. No
need to do it as a module. Kernel grows _and_ shrinks on runtime, even
with CONFIG_MODULE=n.

> It seems to me the problem is in designing modules to unload, and saying
> "Then don't unload them" is not even a band-aid - it is willful
> ignorance. If there is a potential race condition unloading a module,
> then the module is BROKEN.

Our discussion is not _whether_ to support module unloading, but how to
support it sensibly on SMP systems.

Regards,
Thunder
--
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y-
------END GEEK CODE BLOCK------

2002-07-09 14:36:02

by jbradford

[permalink] [raw]
Subject: Re: Driverfs updates

> It seems to me the problem is in designing modules to unload, and saying
> "Then don't unload them" is not even a band-aid - it is willful
> ignorance. If there is a potential race condition unloading a module,
> then the module is BROKEN.

Agreed. Unloading is as fundamental as loading - especially as a lot of users load and unload modules as a, (bad), way to use two incompatible devices on one port. Once you introude a bloatule (I.E. module that can't be unloaded), that stops working. As more and more people start relying on the behavior, it gets to be more of a problem.

John.

2002-07-09 16:57:11

by Patrick Mochel

[permalink] [raw]
Subject: Re: Driverfs updates


On Tue, 9 Jul 2002, Keith Owens wrote:

> On Mon, 8 Jul 2002 11:41:52 -0700 (PDT),
> Patrick Mochel <[email protected]> wrote:
> >- Add struct module * owner field to struct device_driver
> >- Change {get,put}_driver to use it
>
> struct device_driver * get_driver(struct device_driver * drv)
> {
> if (drv && drv->owner)
> if (!try_inc_mod_count(drv->owner))
> return NULL;
> return drv;
> }
>
> is racy. The module can be unloaded after if (drv->owner) and before
> try_inc_mod_count. To prevent that race, drv itself must be locked
> around calls to get_driver().
>
> The "normal" method is to have a high level lock that controls the drv
> list and to take that lock in the register and unregister routines and
> around the call to try_inc_mod_count. drv->bus->lock is no good,
> anything that relies on reading drv without a lock or module reference
> count is racy. I suggest you add a global driverfs_lock.

This race really sucks.

Adding a high level lock is no big deal, but I don't think it will solve
the problem. Hopefully you can educate me a bit more.

If you add a driver_lock, you might have something like:

struct device_driver * d = NULL;

spin_lock(&driver_lock);
if (drv && drv->owner)
if (try_inc_mod_count(drv->owner))
d = drv;

spin_unlock(&driver_lock):
return d;

...but, what if someone has unloaded the module before you get to the if
statement? The memory for the module has been freed, including drv itself.

How do you protect against that? The simplest solutions, given the current
infrastructure, are:

- The BKL
- Not allowing module unload
- Ignoring it, and hoping it goes away

None of those solutions are ideal, though I don't have any bright ideas
off the top of my head.

-pat

2002-07-09 17:01:11

by Oliver Neukum

[permalink] [raw]
Subject: Re: Driverfs updates

Am Dienstag, 9. Juli 2002 13:08 schrieb Thunder from the hill:
> Hi,
>
> On Tue, 9 Jul 2002, Oliver Neukum wrote:
> > -It is slow.
>
> I wouldn't call it any fast when I think about the idea that 31 of my
> CPUs on Hawkeye shall be stopped because I unload a module. Sometimes at
> high noon my server (Hawkeye) can hardly keep up all the traffic. Just
> imagine a module would be unloaded then! That's the problem I'm having
> with it.
>
> What should make a lock for parts of the kernel slower than a lock for
> the _whole_ kernel?

Because you unload modules rarely, but you'd take the lock millions of times
in vain.

Regards
Oliver

2002-07-09 23:27:17

by Keith Owens

[permalink] [raw]
Subject: Re: Driverfs updates

On Tue, 9 Jul 2002 09:56:55 -0700 (PDT),
Patrick Mochel <[email protected]> wrote:
>On Tue, 9 Jul 2002, Keith Owens wrote:
>> struct device_driver * get_driver(struct device_driver * drv)
>> {
>> if (drv && drv->owner)
>> if (!try_inc_mod_count(drv->owner))
>> return NULL;
>> return drv;
>> }
>>
>> is racy. The module can be unloaded after if (drv->owner) and before
>> try_inc_mod_count. To prevent that race, drv itself must be locked
>> around calls to get_driver().
>>
>> The "normal" method is to have a high level lock that controls the drv
>> list and to take that lock in the register and unregister routines and
>> around the call to try_inc_mod_count. drv->bus->lock is no good,
>> anything that relies on reading drv without a lock or module reference
>> count is racy. I suggest you add a global driverfs_lock.
>
>This race really sucks.
>
>Adding a high level lock is no big deal, but I don't think it will solve
>the problem. Hopefully you can educate me a bit more.
>
>If you add a driver_lock, you might have something like:
>
> struct device_driver * d = NULL;
>
> spin_lock(&driver_lock);
> if (drv && drv->owner)
> if (try_inc_mod_count(drv->owner))
> d = drv;
>
> spin_unlock(&driver_lock):
> return d;

That code is not quite correct, you need something like this.

spin_lock(&driver_lock);
drv = scan_driver_list();
if (drv && drv->owner)
if (!try_inc_mod_count(drv->owner))
drv = NULL;
spin_unlock(&driver_lock); /* either failed or protected by use count */
if (drv && drv->open)
drv->open();

>...but, what if someone has unloaded the module before you get to the if
>statement? The memory for the module has been freed, including drv itself.

It is assumed that the module unload routine will call
driver_unregister() which will also take the driver_lock. The
interaction between driver_lock in your code and the unregister routine
via module unload, together with the interaction between unload_lock in
sys_delete_module and try_inc_mod_count will prevent races, provided
you code it right. And provided that your code that does
try_inc_mod_count is in built in code, not in the module itself.

An alternative to driver_lock is BKL, provided you do not have high
activity on your open routine.

open:
lock_kernel();
drv = scan_driver_list();
if (drv && drv->owner)
if (!try_inc_mod_count(drv->owner))
drv = NULL;
unlock_kernel();
if (drv && drv->open)
drv->open();

That works because sys_delete_module(), including all the module clean
up code, runs under BKL. The module_cleanup routine will unregister
from driver_list under BKL so it cannot race with the open code.

use:
drv->func(); /* protected by non-zero mod use count */

You only need to lock drv and do try_inc_mod_count() if the use count
might be 0 at the start of the call. IOW, you only need that code on a
drv->open() or equivalent function. Once the use count is non-zero,
the module is locked down; it is assumed that drv belongs to the module
and is also protected.

close:
if (drv->owner) /* protected by non-zero mod use count */
__MOD_INC_USE_COUNT(drv->owner);
drv->close();
if (drv->owner)
__MOD_DEC_USE_COUNT(drv->owner);

There is a very small race on close where the module does
MOD_DEC_USE_COUNT() which can take the count to 0 but the close routine
has not returned from the module yet. Bumping the use count around the
close call closes that race.

Preventing module unload races with the current infrastructure is
complex and fragile. But I've said that before :).

ps. I am going to be off the net for a few days.

2002-07-10 07:12:48

by jw schultz

[permalink] [raw]
Subject: Re: Driverfs updates

On Tue, Jul 09, 2002 at 07:20:40AM -0500, David D. Hagood wrote:
> I've been following this thread for some time, and one aspect of it
> disturbs me - the principle of symmetry.
>
> I've found that generally, in design thing should be symmetric - if you
> can turn a thing on, you could be able to turn it off, if you can heat a
> thing, you should be able to cool it, and if you can load a thing, you
> should be able to unload it.

I hope you know the difference between a woman and a light bulb...

Apologies to the linux chix and humor impaired. It was such
a big set-up it cried out for the punchline.

I actually agree somewhat with you. If you can't unload
modules the only reasons to have them are to get out of
building kernels, or build and reboot when you get new
hot-plug hardware. Laziness is a virtue, sloth is a sin,
the line betwen is hard to define.

--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2002-07-10 19:36:36

by Pavel Machek

[permalink] [raw]
Subject: Re: Driverfs updates

Hi!

> > -It is slow.
>
> I wouldn't call it any fast when I think about the idea that 31 of my CPUs
> on Hawkeye shall be stopped because I unload a module. Sometimes at high
> noon my server (Hawkeye) can hardly keep up all the traffic. Just imagine
> a module would be unloaded then! That's the problem I'm having with it.
>
> What should make a lock for parts of the kernel slower than a lock for
> the _whole_ kernel?

Lock for the whole kernel has less impact over overall code, I
believe.
Pavel
--
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?

2002-07-10 20:02:41

by Patrick Mochel

[permalink] [raw]
Subject: Re: Driverfs updates


> That code is not quite correct, you need something like this.
>
> spin_lock(&driver_lock);
> drv = scan_driver_list();
> if (drv && drv->owner)
> if (!try_inc_mod_count(drv->owner))
> drv = NULL;
> spin_unlock(&driver_lock); /* either failed or protected by use count */
> if (drv && drv->open)
> drv->open();

Ok, I'll buy that that works. However, it's terribly inefficient.

Taking a step back, there are at least three cases when you want to pin
the driver in memory:

- A user process is opening a driverfs file that the driver owns
- A subsystem is about to call into the driver
- The list of drivers is being iterated over (i.e. when a device gets
inserted and you need to attach a driver to it)

[ General speaking, this is true for all types of drivers (device, bus,
and class). Even more generally, for objects of any subsystem that can be
modular. To keep it rooted in reality, I'm just talking about device
drivers. ]

In each case, you have a pointer to the driver, which lives in the module.
You want to pin the driver, and hence the module, in memory. But, first
you have to verify that the pointer you have is still a valid pointer.

The method above requires the driver to be inserted in a global list, and
scan_driver_list() to look something like:

struct device_driver * scan_driver_list(struct device_driver * drv)
{
struct list_head * node;
list_for_each(node,&driver_list) {
struct device_driver * d = list_entry(node,struct device_driver,global_list);
if (d == drv)
return d;
}
return NULL;
}

That should work fine, and performance won't matter at all for driverfs
file open or individual call-ins. But, I'm concerned about the case when
we iterate over the list of drivers, like during device discovery. Esp. in
the case when we're discovering a lot of devices.

It would even affect us when we're iterating over the list of devices for
suspend, resume, and shutdown. Before we call any of the driver's
callbacks, we want to pin it, causing the entire driver list to be
iterated over.

So, while I agree with your solution, I wonder if there is a better way to
do it. The one idea that I've been flirting with in the last couple of
days to keep a persistant data structure in the kernel for every driver
that is loaded, modular or not.

When a module is loaded, and the driver is registered for the first time,
a small structure is allocated, which includes a name, a refcount maybe,
and a pointer to the driver itself.

Even if the driver is unregistered, this structure stays around. If there
are any any in-flight calls to increment the reference count on the
driver, they will always have a valid pointer to operate on, making
validation O(1) instead of O(n).

If the driver is loaded again, the same structure would be reused.

This means that for every driver loaded, memory usage would increase by
~20 bytes + strlen(name). This memory wouldn't be freed. However, given
the usage model I typically see, it shouldn't be much of a problem - when
I use modular drivers, I typically load them once and never unload them;
or if I do unload them, I reload the same ones later on. So, though we're
leaking memory, it's a small amount at a slow rate.

Besides, as drivers are removed, these structures could be placed on some
inactive list, which could be purged later on.

> You only need to lock drv and do try_inc_mod_count() if the use count
> might be 0 at the start of the call. IOW, you only need that code on a
> drv->open() or equivalent function. Once the use count is non-zero,
> the module is locked down; it is assumed that drv belongs to the module
> and is also protected.

But, you can't assume that it's non-zero in any case, unless you're in a
file operation in which you own the entire fops structure; i.e. you get
some open call in which you're guaranteed that the refcount is bumped up.
For driverfs files, there is one fops structure shared for _all_ files.
The only thing the drivers get are callbacks to read/write data.

> close:
> if (drv->owner) /* protected by non-zero mod use count */
> __MOD_INC_USE_COUNT(drv->owner);
> drv->close();
> if (drv->owner)
> __MOD_DEC_USE_COUNT(drv->owner);
>
> There is a very small race on close where the module does
> MOD_DEC_USE_COUNT() which can take the count to 0 but the close routine
> has not returned from the module yet. Bumping the use count around the
> close call closes that race.

That's not a problem, for the same reason - there is a shared release()
call for all files, and that exists outside the module.

-pat







2002-07-11 00:37:51

by John Alvord

[permalink] [raw]
Subject: Re: Driverfs updates

On Tue, 9 Jul 2002 09:56:55 -0700 (PDT), Patrick Mochel
<[email protected]> wrote:

>
>On Tue, 9 Jul 2002, Keith Owens wrote:
>
>> On Mon, 8 Jul 2002 11:41:52 -0700 (PDT),
>> Patrick Mochel <[email protected]> wrote:
>> >- Add struct module * owner field to struct device_driver
>> >- Change {get,put}_driver to use it
>>
>> struct device_driver * get_driver(struct device_driver * drv)
>> {
>> if (drv && drv->owner)
>> if (!try_inc_mod_count(drv->owner))
>> return NULL;
>> return drv;
>> }
>>
>> is racy. The module can be unloaded after if (drv->owner) and before
>> try_inc_mod_count. To prevent that race, drv itself must be locked
>> around calls to get_driver().
>>
>> The "normal" method is to have a high level lock that controls the drv
>> list and to take that lock in the register and unregister routines and
>> around the call to try_inc_mod_count. drv->bus->lock is no good,
>> anything that relies on reading drv without a lock or module reference
>> count is racy. I suggest you add a global driverfs_lock.
>
>This race really sucks.
>
>Adding a high level lock is no big deal, but I don't think it will solve
>the problem. Hopefully you can educate me a bit more.
>
>If you add a driver_lock, you might have something like:
>
> struct device_driver * d = NULL;
>
> spin_lock(&driver_lock);
> if (drv && drv->owner)
> if (try_inc_mod_count(drv->owner))
> d = drv;
>
> spin_unlock(&driver_lock):
> return d;
>
>...but, what if someone has unloaded the module before you get to the if
>statement? The memory for the module has been freed, including drv itself.
>
>How do you protect against that? The simplest solutions, given the current
>infrastructure, are:
>
>- The BKL
>- Not allowing module unload
>- Ignoring it, and hoping it goes away
>
>None of those solutions are ideal, though I don't have any bright ideas
>off the top of my head.

The only idea I can see is to have a single kernel-thread process
which would do each load/unload request serially on a single
processor.

john alvord