LinuxLists.cc - PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, 23 Jan 2004, Alan Stern wrote:
>
> Since I haven't seen any progress towards implementing the
> class_device_unregister_wait() and platform_device_unregister_wait()
> functions, here is my attempt.

So why would this not deadlock?

The reason we don't wait on things like this is that it's basically
impossible not to deadlock.

There are damn good reasons why the kernel uses reference counting
everywhere. Any other approach is broken.

Linus

2004-01-23 18:03:35

by Alan Stern

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, 23 Jan 2004, Linus Torvalds wrote:

> On Fri, 23 Jan 2004, Alan Stern wrote:
> >
> > Since I haven't seen any progress towards implementing the
> > class_device_unregister_wait() and platform_device_unregister_wait()
> > functions, here is my attempt.
>
> So why would this not deadlock?

Kind of a general question, so I'll give a general answer. This wouldn't
deadlock for the same reason as anything else: People use it properly!

> The reason we don't wait on things like this is that it's basically
> impossible not to deadlock.

That's an exaggeration. There are places where it's _necessary_ to
wait. For example, consider this extract from a recent patch written by
Greg KH:

+ /* FIXME change this when the driver core gets the
+ * class_device_unregister_wait() call */
+ init_completion(&bus->released);
class_device_unregister(&bus->class_dev);
+ wait_for_completion(&bus->released);

For the full patch, see
http://marc.theaimsgroup.com/?l=linux-usb-devel&m=107109069106188&w=2

The general context is that a module is trying to unload, but it can't
until the release() callback for its device has finished.

> There are damn good reasons why the kernel uses reference counting
> everywhere. Any other approach is broken.

And sometimes a part of the kernel has to wait until the reference count
drops to 0. Instead of making everyone who needs to wait for a
class_device or a platform_device roll their own completions, this
provides a central facility.

By the way, adding class_device_unregister_wait() has an excellent
precedent. The driver model core _already_ includes
device_unregister_wait(), merged by Pat Mochel. Are you saying that
function shouldn't exist either?

Alan Stern

2004-01-23 18:11:55

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, Jan 23, 2004 at 09:42:09AM -0800, Linus Torvalds wrote:
>
>
> On Fri, 23 Jan 2004, Alan Stern wrote:
> >
> > Since I haven't seen any progress towards implementing the
> > class_device_unregister_wait() and platform_device_unregister_wait()
> > functions, here is my attempt.
>
> So why would this not deadlock?

It will deadlock if the user does something braindead like:
rmmod foo < /sys/class/foo_class/foo1/file

Now I know the network code can handle something like that, but they
have their own thread to handle issues like this... It's not sane to
make every driver subsystem do that...

So in short, it's used to make sure that all references are dropped,
before allowing the module to be unloaded.

And Alan, I think Pat already has this in his tree, if only he would
send that to Linus one of these days...

thanks,

greg k-h

2004-01-23 18:10:13

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, Jan 23, 2004 at 01:03:33PM -0500, Alan Stern wrote:
> The general context is that a module is trying to unload, but it can't
> until the release() callback for its device has finished.

... and if I redirect rmmod stdin from sysfs, we get what? Exactly.

2004-01-23 18:18:42

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, Jan 23, 2004 at 06:10:04PM +0000, [email protected] wrote:
> On Fri, Jan 23, 2004 at 01:03:33PM -0500, Alan Stern wrote:
> > The general context is that a module is trying to unload, but it can't
> > until the release() callback for its device has finished.
>
> ... and if I redirect rmmod stdin from sysfs, we get what? Exactly.

You get what you deserve :)

2004-01-23 18:15:57

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, 23 Jan 2004, Alan Stern wrote:
> >
> > So why would this not deadlock?
>
> Kind of a general question, so I'll give a general answer. This wouldn't
> deadlock for the same reason as anything else: People use it properly!

No.

> > The reason we don't wait on things like this is that it's basically
> > impossible not to deadlock.
>
> That's an exaggeration.

Not by much.

> There are places where it's _necessary_ to
> wait. For example, consider this extract from a recent patch written by
> Greg KH:
>
> + /* FIXME change this when the driver core gets the
> + * class_device_unregister_wait() call */
> + init_completion(&bus->released);
> class_device_unregister(&bus->class_dev);
> + wait_for_completion(&bus->released);
>
> For the full patch, see
> http://marc.theaimsgroup.com/?l=linux-usb-devel&m=107109069106188&w=2
>
> The general context is that a module is trying to unload, but it can't
> until the release() callback for its device has finished.

And in just about any circumstance, with the _possible_ exception of
module unload, things like this can be fooled into deadlocking by a
non-root user.

To the point where root can no longer fix things up.

We've had these bugs before. It's a mistake to make interfaces that
positively _encourage_ bugs like this.

The canonical example of a bug like this is when a regular user can
trigger an event that causes the wait, and then makes sure that it holds a
reference count on the event - by opening a file descriptor.

> And sometimes a part of the kernel has to wait until the reference count
> drops to 0.

Not likely. The rest of the kernel never does it.

What is it with USB that makes people think so? Remember all the USB bugs
early on that were due to _exactly_ that thinking.

It's wrong. YOU SHOULD NEVER WAIT FOR THE REFERNCE COUNT TO DROP TO ZERO!

You just ignore it. With proper memory management it doesn't matter.

For module unload, it's likely better to return an error than to wait.
Tell the super-user that you're busy.

> By the way, adding class_device_unregister_wait() has an excellent
> precedent. The driver model core _already_ includes
> device_unregister_wait(), merged by Pat Mochel. Are you saying that
> function shouldn't exist either?

Quite probably.

Linus

2004-01-23 18:19:22

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, 23 Jan 2004, Greg KH wrote:
> >
> > So why would this not deadlock?
>
> It will deadlock if the user does something braindead like:
> rmmod foo < /sys/class/foo_class/foo1/file

I don't much worry about things like that, since only root can rmmod
anyway.

HOWEVER - I do worry when people start exporting interfaces that are
basically _designed_ to deadlock. It's a bad interface. Don't export it.
There is possibly just _one_ place that can do it, and it's the module
unload part. Everything else would be a bug.

So do it in the one place. Don't make a function that does it and that
others will start using because it's "simple".

Linus

2004-01-23 18:27:13

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, Jan 23, 2004 at 10:19:15AM -0800, Linus Torvalds wrote:
>
>
> On Fri, 23 Jan 2004, Greg KH wrote:
> > >
> > > So why would this not deadlock?
> >
> > It will deadlock if the user does something braindead like:
> > rmmod foo < /sys/class/foo_class/foo1/file
>
> I don't much worry about things like that, since only root can rmmod
> anyway.

Yeah, that's why I didn't care very much either.

> HOWEVER - I do worry when people start exporting interfaces that are
> basically _designed_ to deadlock. It's a bad interface. Don't export it.
> There is possibly just _one_ place that can do it, and it's the module
> unload part. Everything else would be a bug.
>
> So do it in the one place. Don't make a function that does it and that
> others will start using because it's "simple".

Ok, fair enough. I'll make up a patch to remove that other
"unregister_wait" function in the driver core.

thanks,

greg k-h

2004-01-23 18:31:13

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, Jan 23, 2004 at 10:15:36AM -0800, Linus Torvalds wrote:
>
> What is it with USB that makes people think so? Remember all the USB bugs
> early on that were due to _exactly_ that thinking.

Oh yeah, I remember :(

> It's wrong. YOU SHOULD NEVER WAIT FOR THE REFERNCE COUNT TO DROP TO ZERO!
>
> You just ignore it. With proper memory management it doesn't matter.
>
> For module unload, it's likely better to return an error than to wait.
> Tell the super-user that you're busy.

That patch that Alan pointed to is wrong anyway, it dies a horrible
death, and was only a hack to try something.

I think it might be a better idea to just prevent the module unload of
the USB host controller until all drivers bound to devices owned by it
are also unbound in order to fix these kinds of problems.

thanks,

greg k-h

2004-01-23 19:45:52

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, Jan 23, 2004 at 10:11:06AM -0800, Greg KH wrote:
> On Fri, Jan 23, 2004 at 09:42:09AM -0800, Linus Torvalds wrote:
> >
> >
> > On Fri, 23 Jan 2004, Alan Stern wrote:
> > >
> > > Since I haven't seen any progress towards implementing the
> > > class_device_unregister_wait() and platform_device_unregister_wait()
> > > functions, here is my attempt.
> >
> > So why would this not deadlock?
>
> It will deadlock if the user does something braindead like:
> rmmod foo < /sys/class/foo_class/foo1/file
>
> Now I know the network code can handle something like that, but they
> have their own thread to handle issues like this... It's not sane to
> make every driver subsystem do that...

Network code does *NOT* wait for references to kobject to disappear.
->release() for those buggers is not in a module and freeing can
happen way after the rmmod. No threads involved.

What it does wait for is different - in effect, net_device has a special-cased
rwsem in it, heavily optimised for down_read(). dev_hold() is an equivalent
of down_read(). dev_put() - up_read(). And unregister_netdev() does an
equivalent of down_write() after removing from all search structures.

That's what it is waiting for - it wants all readers (semaphore is unfair)
to go away. It's not a refcounting issue at all - it's an exclusion around
the net_device shutdown, freeing resources, etc.

There was an optimistic attempt to turn that animal into sysfs refcount.
It didn't work - we had to add real refcounting there and make sysfs code
do essentially an equivalent of down_try_read() before accessing the guts
of net_device(). And we have ->release() for those buggers in core kernel.

2004-01-25 17:32:19

by Alan Stern

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, 23 Jan 2004, Linus Torvalds wrote:

> HOWEVER - I do worry when people start exporting interfaces that are
> basically _designed_ to deadlock. It's a bad interface. Don't export it.
> There is possibly just _one_ place that can do it, and it's the module
> unload part. Everything else would be a bug.

You know, this problem and other similar ones related to module unloading
would go away if modules behaved more like other kernel objects:

if they could be unregistered (rmmod) at any time, regardless
of their refcount;

if they would persist in an unregistered state, until they were
released (unloaded from memory) when the refcount dropped to 0.

Then it wouldn't be necessary for a module's exit routine to wait when
unregistering devices. A device could simply hold a reference to the
module; when the device was released and things were safe the module would
be unloaded. The one exception would no longer exist.

Is there some reason why modules don't work like this?

Alan Stern

2004-01-25 19:03:06

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Sun, 25 Jan 2004, Alan Stern wrote:
>
> Is there some reason why modules don't work like this?

There's a few:

- pain. pain. pain.

- doing proper refcounting of modules is _really_ really hard. The reason
is that proper refcounting is a "local" issue: you reference count a
single data structure. It's basically impossible to make a "global"
reference count without jumping through hoops.

- lack of testing. Unloading a module happens once in a blue moon, if
even then.

The proper thing to do (and what we _have_ done) is to say "unloading of
modules is not supported". It's a debugging feature, and you literally
shouldn't do it unless you are actively developing that module.

Sadly, some modules are broken. Old 16-bit PCMCIA in particular _depends_
on unloading modules, since the old PCMCIA layer doesn't do hotplug: it
literally thinks of module load/unload as the "plug/unplug" event.

But it basically boils down to: don't think of module unload as a "normal
event". It isn't. Getting it truly right is (a) too painful and (b) would
be too slow, so we're not even going to try.

(As an example of "too painful, too slow", think of something like a
packet filter module. You'd literally have to increment the count in every
part that gets a packet, and decrement the count at every point where it
lets the packet go. And since it would have to be SMP-safe, it would have
to be a locked cycle, or we'd have to have per-CPU counters - at which
point you now also have to worry about things like preemption and
sleeping, which just means that it would be a _lot_ of very fragile code).

Linus

2004-01-25 20:21:43

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Sun, Jan 25, 2004 at 11:02:58AM -0800, Linus Torvalds wrote:
> The proper thing to do (and what we _have_ done) is to say "unloading of
> modules is not supported". It's a debugging feature, and you literally
> shouldn't do it unless you are actively developing that module.
>
> Sadly, some modules are broken. Old 16-bit PCMCIA in particular _depends_
> on unloading modules, since the old PCMCIA layer doesn't do hotplug: it
> literally thinks of module load/unload as the "plug/unplug" event.
>
> But it basically boils down to: don't think of module unload as a "normal
> event". It isn't. Getting it truly right is (a) too painful and (b) would
> be too slow, so we're not even going to try.
>
> (As an example of "too painful, too slow", think of something like a
> packet filter module. You'd literally have to increment the count in every
> part that gets a packet, and decrement the count at every point where it
> lets the packet go. And since it would have to be SMP-safe, it would have
> to be a locked cycle, or we'd have to have per-CPU counters - at which
> point you now also have to worry about things like preemption and
> sleeping, which just means that it would be a _lot_ of very fragile code).

Packet filter is hardly a normal module. For absolute majority of modules
it's nowhere near that bad.

HOWEVER, module unload is not the real problem. We have objects with
limited lifetimes. Always had, always will. Whether we remove pieces
of .text from the in-core kernel or not, we must be able to deal with
that. Even if methods themselves are present, they won't do you any
good when data structures belonging to object are destroyed.

If that is handled right, rmmod is trivial for 99% of modules. The
rest (including Rusty's stuff) can simply say "we can't be unloaded
at all" and be done with that.

Basically, "protect the module" is wrong - it should be "protect specific
object" and we need that anyway. We already have that for the largest
class of modules - 300-odd netdev ones. We have that for filesystems.
We have that for block devices. We have infrastructure for doing that
to character devices, ttys and ldiscs. Which leaves us with truly weird
stuff that doesn't work in such terms and yes, there your arguments
apply.

Frankly, I'm much more concerned about the stuff that _can't_ be disabled.
You can disable module unloading; hell, you can disable modules completely.
You can't disable ifdown(8). Which currently means very bad things for
stuff in /proc/sys/net/ipv4/{conf,neigh}/*. And all arguments re rmmod
deadlocks apply to that sort of situations...

2004-01-25 23:13:32

by Steve Youngs

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

Attachments:

(No filename) (2.89 kB)
(No filename) (256.00 B)
Download all attachments

2004-01-26 03:13:00

by Adam Kropelin

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Mon, Jan 26, 2004 at 09:12:58AM +1000, Steve Youngs wrote:
> * Linus Torvalds <[email protected]> writes:
>
> > - doing proper refcounting of modules is _really_ really
> > hard. The reason is that proper refcounting is a "local"
> > issue: you reference count a single data structure. It's
> > basically impossible to make a "global" reference count
> > without jumping through hoops.
>
> Please understand that I coming from an _extremely_ naive perspective,
> but why do refcounting at all? Couldn't the refcount be a simple
> boolean?

A boolean is just a one-bit reference count. If the maximum number of
simultaneous 'users' for a given module is one, then a boolean will work.
If there is potential for more than one simultaneous user then you need
more bits.

Either way, it doesn't simplify the problem.

> I see the process working along these lines: When a module is loaded
> into the kernel it (the module) exports a symbol (a function) that the
> kernel can use for determining whether or not the module is still in
> use.

And how will the module know when it is or is not "in use"? By keeping
a count of the number of current users, of course.

> > - lack of testing.
>
> A moot point once the kernel can safely and efficiently do module
> unloading.

I don't follow your logic. Once it works we don't have to test it so
therefore we never need to test it? I don't buy the premise or the
conclusion.

> > Unloading a module happens once in a blue moon, if even then.
>
> We get an awful lot of blue moons here.

This moon's not worth barking at.

> > But it basically boils down to: don't think of module unload as a "normal
> > event". It isn't. Getting it truly right is (a) too painful and (b) would
> > be too slow, so we're not even going to try.
>
> Now there's a cop out if ever I saw one. Surely, Linus, you've
> overcome _much_ bigger problems than this at different times.

Linus can of course speak for himself but from my perspective it's just
a simple cost/benefit analysis. This one's just not worth any more toil.
Several extremely bright people have tackled the problem and eventually
concluded it's extremely hard to solve and generally not worth the
trouble. It's time to let go.

--Adam

2004-01-26 05:08:11

by Steve Youngs

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

Attachments:

(No filename) (3.85 kB)
(No filename) (256.00 B)
Download all attachments

2004-01-26 05:21:40

by Valdis Klētnieks

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Mon, 26 Jan 2004 15:06:48 +1000, Steve Youngs <[email protected]> said:

> > A boolean is just a one-bit reference count. If the maximum number of
> > simultaneous 'users' for a given module is one, then a boolean will work.
> > If there is potential for more than one simultaneous user then you need
> > more bits.
>
> Why? A module is either being used or it isn't, the number of uses
> shouldn't even come into it.

OK. There's 2 users of the module. The first one exits. How does it (or
anything else) know that it's NOT safe to just clear the in-use bit and clean
it up?

And how does the second one know it IS safe to clean up?

Attachments:

(No filename) (226.00 B)

2004-01-26 05:55:21

by Steve Youngs

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

Attachments:

(No filename) (1.35 kB)
(No filename) (256.00 B)
Download all attachments

2004-01-26 06:09:44

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Fri, 23 Jan 2004 10:11:06 -0800
Greg KH <[email protected]> wrote:

> On Fri, Jan 23, 2004 at 09:42:09AM -0800, Linus Torvalds wrote:
> >
> >
> > On Fri, 23 Jan 2004, Alan Stern wrote:
> > >
> > > Since I haven't seen any progress towards implementing the
> > > class_device_unregister_wait() and platform_device_unregister_wait()
> > > functions, here is my attempt.
> >
> > So why would this not deadlock?
>
> It will deadlock if the user does something braindead like:
> rmmod foo < /sys/class/foo_class/foo1/file

Um, if module foo exports a class, then why doesn't opening the class file
bump the owner count so the above will fail?

If you want to safely remove parts of the kernel, you have to maintain
reference counts. At least with any sane scheme I've seen.

I know, I should go read the code...
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy

2004-01-26 06:25:19

by Valdis Klētnieks

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Mon, 26 Jan 2004 15:55:06 +1000, Steve Youngs <[email protected]> said:

> Because the 2nd user is still using the module so its in-use bit
> should still be set. Remember that when the module was first loaded
> it registered a function with the kernel for testing whether the
> module is in use.

Anybody who's ever been in a bathroom stall and somebody turned off the
lights on their way out will intuitively understand why you need a reference
count and not an in-use bit,

Attachments:

(No filename) (226.00 B)

2004-01-26 08:35:42

by Helge Hafting

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

Steve Youngs wrote:
> * Valdis Kletnieks <[email protected]> writes:
>
> > On Mon, 26 Jan 2004 15:06:48 +1000, Steve Youngs <[email protected]> said:
> >> > A boolean is just a one-bit reference count. If the maximum number of
> >> > simultaneous 'users' for a given module is one, then a boolean will work.
> >> > If there is potential for more than one simultaneous user then you need
> >> > more bits.
> >>
> >> Why? A module is either being used or it isn't, the number of uses
> >> shouldn't even come into it.
>
> > OK. There's 2 users of the module. The first one exits. How does
> > it (or anything else) know that it's NOT safe to just clear the
> > in-use bit and clean it up?
>
> Because the 2nd user is still using the module so its in-use bit
> should still be set. Remember that when the module was first loaded
> it registered a function with the kernel for testing whether the
> module is in use.

Are you talking about an in-use bit _per module_ or an in-use bit _per user_ ???
Either way has some problems:

In-use bit per module:
This is what everybody thought yopu were talking about up until now.
The problem above is real: There are 2 (or more) users. One exits.
How does the code know that it can't clear the module's in-use bit?
This one user doesn't know about the other users - how could it?
And there is no "number of users" anywhere because you said you
didn't want that. (The number would be the refcount you tries to get rid of)
Problems, and you haven't provided a solution yet.

Possible solution: Let each user know about all the others so it
can check wether it is ok to clear the in-use bit. This is unworkable!

In-use bit per user:
Your comment about the second users in-use bit being set seems to imply
that you want an in-use bit per module user. (instead of per module).
The rule now becomes "no unloading as long as there is at least one
in-use bit set for this module." There may be several such bits.
This seems straightforward enough until you try to manage such
a set of bits. I might try to unload a module: The unload code checks
the first user bit - it is not set. So that user isn't using the module
now. Fine. Check the next bit - not in use there either. And so on.
It is a lengthy procedure, which is dangerous. Think SMP or preempt:
The unload function are checking the last in-use bits (after first checking the first ones
and finding them all zeroed), and then the first
module user sets his bit because he starts using the module again.
This is what we call a "race". The module unloader comes to the wrong
conclusion - that the module isn't in use - because a user came in and
used it _while_ the bits were being counted.

Solutions:
1. Use locks for manipulating the multiple in-use bits. Too slow.
2. Use refcounting instead. Better, because the count can be changed and tested
atomically, avoiding the above mentioned race _and_ the slow locking.
There are some cases where even this is too slow, but such a module could
be made "not unloadable". To take Linus' example of a packet filter
module: It cannot be unloaded on its own because an interrupt can
make use of it anytime, making refcounting too expensive.
Unloading it is possible however, as part
of unloading the entire TCP/IP module as a whole. You may then
reload tcp/ip without packet filtering if you so wish...

>
> I must be overlooking something
Sure.

> because I see the answer so clearly.
> Maybe if someone could give me a real world example of a situation
> where it'd be hard/impossible/unsafe to unload a module and I'll see
> if my ideas can be applied.

Try the above. Or look at actual code people have problems with. Run tests
(on SMP) and get some surprises. Basically, the surprises revolve
around several things happening simultaneously.

Note that many modules (but not all of the existing ones) _can_ be made safely unloadable.
Those that can do it simply don't bother - because they have more important
and/or interesting work to do. Feel free to volunteer for fixing things . . .

Helge Hafting

2004-01-26 15:40:56

by Adam Kropelin

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Mon, Jan 26, 2004 at 03:06:48PM +1000, Steve Youngs wrote:
> * Adam Kropelin <[email protected]> writes:
>
> > On Mon, Jan 26, 2004 at 09:12:58AM +1000, Steve Youngs wrote:
> >> * Linus Torvalds <[email protected]> writes:
> >>
> >> > - doing proper refcounting of modules is _really_ really
> >> > hard. The reason is that proper refcounting is a "local"
> >>
> >> Please understand that I coming from an _extremely_ naive perspective,
> >> but why do refcounting at all? Couldn't the refcount be a simple
> >> boolean?
>
> > A boolean is just a one-bit reference count. If the maximum number of
> > simultaneous 'users' for a given module is one, then a boolean will work.
> > If there is potential for more than one simultaneous user then you need
> > more bits.
>
> Why? A module is either being used or it isn't, the number of uses
> shouldn't even come into it.

I think others in the thread have adequately explained the details
you're missing.

> I'm suggesting that the responsibility for determining when it is safe
> to unload a particular module should lay with the module itself and
> not with the kernel.

That does not change anything at all. The same problems apply and the
same pool of solutions is still available. The easy cases are still easy
and the hard cases are still *really* hard.

> >> > - lack of testing.
> >>
> >> A moot point once the kernel can safely and efficiently do module
> >> unloading.
>
> > I don't follow your logic. Once it works we don't have to test it so
> > therefore we never need to test it?
>
> Possibly a poor choice of words on my part. What I meant was that
> once the functionality goes into the kernel testing will happen on
> every single Linux box in the land that has this future kernel. Some
> of those users will report bugs if there are any. And some of those
> users may even help to fix those bugs.

Hello? 2.4, etc. support removing modules. Linus was speaking from
experience. One of the reasons module removal is perpetually broken in
subtle ways on those kernels is that it simply does not receive enough
testing. Having some new implementation on 2.6 doesn't change that.

> Also what I meant is that you can't test something that doesn't
> exist.

As I pointed out above, it does exist.

> >> We get an awful lot of blue moons here.
>
> > This moon's not worth barking at.
>
> There are an awful lot of users out there who would disagree with you
> on that. _One_ of the reasons that I believe that this moon is worth
> barking at is:
>
> If a module should never, in the normal course of events, be unloaded,
> then there isn't _any_ point to being able to load them in the first
> place. Not being able to unload them _totally_ defeats the purpose of
> modules.

Think a little harder and you'll see why that's a ridiculous conclusion.
Hint #1: Distributions. Hint #2: SILO.

> Tell me that this problem is _impossible_ to solve and providing you
> can show me _why_ it is impossible I'll speak no more on this
> matter.

No one yet has claimed this is "impossible" to solve, only that it's not
worth solving. Clearly it's important to you, so have at it. Further
discussion of how you "see the answer so clearly" and the rest of us are
"not trying hard enough" should come in the form of patches.

--Adam

2004-01-26 15:51:18

by Alan Stern

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Mon, 26 Jan 2004, Rusty Russell wrote:

> If you want to safely remove parts of the kernel, you have to maintain
> reference counts. At least with any sane scheme I've seen.
>
> I know, I should go read the code...
> Rusty.

The problem I raised originally is that in many cases the reference count
can't be maintained properly without preventing the module from ever
exiting at all. The difficulty is that the reference count won't go
down to 0 until the module code deregisters itself from some list. But
the deregistration only happens within the module's exit routine!

A two-step exit process, like that used for kobjects, would avoid this
difficulty. During the first step the module would deregister itself.
The second step, unloading from memory, would occur when the reference
count was 0.

Contrary what Linus said, in many modules it's not necessary to update the
module's reference count with every single transaction (packet or
whatever) that goes through. Usually a single registration event, or just
a small handful, is critical for unloading.

A good case in point is the one that led to the start of this thread. A
USB host controller driver registers a USB bus that it will manage, and
from then on it handles lots of USB packets. It's not necessary to update
the driver module's reference count with every packet; all that matters is
that the module has to wait for the bus to be released before it can be
unloaded from memory. (That's because existing mechanisms cause each
packet to hold a reference to a USB device and the device holds a
reference to the bus.) For lack of any other way to avoid exiting early
we simply have to wait for the bus's release callback to finish -- not
waiting will cause a kernel panic if the module unloads before the release
method runs.

Furthermore, in other cases where it _is_ necessary to update a reference
count with every packet, it's not necessary that doing so involve a lot of
additional overhead such as acquiring a lock of some sort. If some
driver, like a network interface driver, is managing lots of packets then
it must _already_ be using a lock to keep track of things like the total
number of outstanding packets. Any extra work could be done under the
protection of this pre-existing lock and would involve minimal overhead.

One aspect of what Linus wrote is absolutely right, however: Getting this
to work right, for all the loadable kernel modules, would be quite
difficult. Here's one way to attack that, an incremental approach.

Create a new module entry point, the module_unreg routine. For all
existing modules this entry point would be undefined and hence not used.
The module_unreg routine is called to start the deregistration process,
invoked say by some special flag to rmmod. The module_exit routine would
then be called when an unregistered module's reference count dropped to 0.
Existing modules would experience exactly the same sequence of events as
they do now, and newly-written modules could take advantage of the extra
mechanism.

Admittedly, this is just a theoretical exercise. Linus said that module
unloading should basically be unsupported. I have no doubt that making it
even more complicated, like this, is not something he would approve of.

Alan Stern

2004-01-26 16:23:03

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

Hi,

On Sun, 25 Jan 2004, Linus Torvalds wrote:

> - doing proper refcounting of modules is _really_ really hard. The reason
> is that proper refcounting is a "local" issue: you reference count a
> single data structure. It's basically impossible to make a "global"
> reference count without jumping through hoops.

IOW module reference counts are evil, no matter what you do it will always
add more overhead. I fully agree with Al, once we have proper per object
protection it's rather easy to build the module infrastructure on top of
it. I only want to add a bit what it means practically.
For example pci drivers currently do something like:

int init()
{
if (pci_register_driver(drv) < 0)
pci_unregister_driver(drv);
}

void exit()
{
pci_unregister_driver(drv);
}

All this is done without a module count, this means that
pci_unregister_driver() cannot return before the last reference is gone.
For network devices this is not that much of a problem, as they can be
rather easily deconfigured automatically, but that's not that easy for
mounted block devices, so one has to be damned careful when to call the
exit function.
To prevent module unloading we usually added a reference to the module, so
the generic module code knows, when it's safe to unload the module. OTOH
with the driver object we can easily tell without any additional overhead
whether the driver is busy, we only have to find a way to tell that the
generic module code.
Technically there are a number of ways to do this, but practically we have
to decide which module unload semantics we want to have and how easy it
should be to upgrade the drivers to this.
We are now in the situation, where we should decide whether we want to
continue the status quo and module unloading stays a PITA or we fix it
properly once and for all, but that requires changing every single driver.
For a large number of drivers (all which Al mentioned) the updates should
be rather staightforward, the rest would simply be not unloadable, unless
they fix their interfaces.

bye, Roman

2004-01-27 06:56:45

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Sun, 25 Jan 2004 11:02:58 -0800 (PST)
Linus Torvalds <[email protected]> wrote:

> On Sun, 25 Jan 2004, Alan Stern wrote:
> >
> > Is there some reason why modules don't work like this?
>
> There's a few:
>
> - pain. pain. pain.
>
> - doing proper refcounting of modules is _really_ really hard. The reason
> is that proper refcounting is a "local" issue: you reference count a
> single data structure. It's basically impossible to make a "global"
> reference count without jumping through hoops.
>
> - lack of testing. Unloading a module happens once in a blue moon, if
> even then.

And modules do work like you proposed, if you use "rmmod --wait".

Doing proper refcounting is actually fairly easy: every function pointer
has an associated reference count (or pointer to the module containing
the refcount).

But how much pain are you prepared to put up with to have a pseudo-pagable
kernel?

> (As an example of "too painful, too slow", think of something like a
> packet filter module. You'd literally have to increment the count in every
> part that gets a packet, and decrement the count at every point where it
> lets the packet go. And since it would have to be SMP-safe, it would have
> to be a locked cycle, or we'd have to have per-CPU counters - at which
> point you now also have to worry about things like preemption and
> sleeping, which just means that it would be a _lot_ of very fragile code).

Actually, this is already handled. The module reference counts are per-cpu
and don't contain any barriers. We go to an *awful* lot of pain on remove
to synchronize, but as Linus says, it's not the normal case.

Since we hit the (atomic_t) ref to the devices on every packet, bumping
the refcount on the module is lost in the noise.

But Dave doesn't want to do it: it makes the code uglier and painful.

Cheers,
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy

2004-01-27 06:56:41

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Sun, 25 Jan 2004 20:21:37 +0000
[email protected] wrote:

> Basically, "protect the module" is wrong - it should be "protect specific
> object" and we need that anyway.

Agreed. You're oversimplifying a little, though.

In this model, the object here is the function text. So if you hand out
a pointer to the function text, you need to hold a refcount.

BUT since the module itself is the only one which can hand these out,
and it unregisters everything it has registered, and all those references
fall to zero, it's trivial to prove that there are no more references to
the module functions.

This (as Al points out by referring to lifetime) is the same problem if you
want to kfree() the thing you've registered: either deregistration is
synchronous or it supplies a callback which does the actual kfree. And most
registration interfaces in the kernel are headed towards this model, and
it can be pressed into service for module removal as well.

Hope that clarifies,
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy

2004-01-27 13:56:43

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

Hi,

On Tue, 27 Jan 2004, Rusty Russell wrote:

> [email protected] wrote:
>
> > Basically, "protect the module" is wrong - it should be "protect specific
> > object" and we need that anyway.
>
> Agreed. You're oversimplifying a little, though.
>
> In this model, the object here is the function text. So if you hand out
> a pointer to the function text, you need to hold a refcount.
>
> BUT since the module itself is the only one which can hand these out,
> and it unregisters everything it has registered, and all those references
> fall to zero, it's trivial to prove that there are no more references to
> the module functions.

There is a very simple rule: If two or more objects have the same
lifetime, they all only need to be protected once.
So if you have two static objects (e.g. a static structure with a
reference to a function), the module count is indeed enough to protect
both objects. The problem starts when you mix dynamic and static objects,
then the module count is not enough anymore and you have to use proper
refcounts, but the generic module code currently has no decent way to get
to that information, so we have to maintain the module count additionally
to the refcount. Now we only have to find a way to utilize the object
protection to also protect the module it refers to and you can get rid of
the module count.
Rusty, you need to get away from this idea, that every single function
pointer must be protected by itself, it can also be protected via another
object, the real problem is that the generic module code doesn't know
anything about this. Fixing this requires changing every single module,
but in the end it would be worth it, as it avoids the duplicated
protection and we had decent module unload semantics.

bye, Roman

2004-01-27 19:33:18

by Russell King

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Mon, Jan 26, 2004 at 05:22:41PM +0100, Roman Zippel wrote:
> For example pci drivers currently do something like:
>
> int init()
> {
> if (pci_register_driver(drv) < 0)
> pci_unregister_driver(drv);
> }
>
> void exit()
> {
> pci_unregister_driver(drv);
> }

I'd like to take this opportunity to mention that the above is buggy
as written. If pci_register_driver() fails, the device_driver structure
is not registered, and therefore pci_unregister_driver() may cause
Bad Things(tm) to happen.

(and yes, pci_module_init() is buggy as it currently stands, and I
believe GregKH has a patch in his queue from the stability freeze
from yours truely to fix it.)

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2004-01-27 20:29:56

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Tue, Jan 27, 2004 at 07:32:28PM +0000, Russell King wrote:
>
> (and yes, pci_module_init() is buggy as it currently stands, and I
> believe GregKH has a patch in his queue from the stability freeze
> from yours truely to fix it.)

Yes I still have it, I need to dig that out and send it onward... Sorry
about the delay.

greg k-h

2004-01-27 20:30:00

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Mon, Jan 26, 2004 at 05:22:41PM +0100, Roman Zippel wrote:
>
> All this is done without a module count, this means that
> pci_unregister_driver() cannot return before the last reference is gone.
> For network devices this is not that much of a problem, as they can be
> rather easily deconfigured automatically, but that's not that easy for
> mounted block devices, so one has to be damned careful when to call the
> exit function.

Um, not anymore. I can yank out a mounted block device and watch the
scsi core recover just fine. The need to make everything hotpluggable
has fixed up a lot of issues like this (as well as made more
problems...)

thanks,

greg k-h

2004-01-28 00:57:56

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

In message <Pine.LNX.4.58.0401271142510.7855@serv> you write:
> Hi,

Hi Roman!

> Fixing this requires changing every single module, but in the end it
> would be worth it, as it avoids the duplicated protection and we had
> decent module unload semantics.

And I still disagree. <shrug>

If it's any consolation, I don't plan any significant module work in
2.7. If you want to work on this, you're welcome to it. Perhaps you
can convince Linus et al that it's worth the pain?

Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2004-01-28 00:57:54

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

In message <[email protected]> you write:
> Create a new module entry point, the module_unreg routine. For all
> existing modules this entry point would be undefined and hence not used.

Just use the notifier, which already exists, just needs a few more
points.

Cheers!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: More Module Notifiers
Author: Rusty Russell
Status: Trivial

D: Put in the rest of the module notifiers, esp. one when a module is being
D: removed.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.6.0-test5-bk2/include/linux/module.h working-2.6.0-test5-bk2-nat-expect/include/linux/module.h
--- linux-2.6.0-test5-bk2/include/linux/module.h 2003-07-31 01:50:19.000000000 +1000
+++ working-2.6.0-test5-bk2-nat-expect/include/linux/module.h 2003-09-21 15:24:09.000000000 +1000
@@ -180,6 +180,7 @@ enum module_state
MODULE_STATE_LIVE,
MODULE_STATE_COMING,
MODULE_STATE_GOING,
+ MODULE_STATE_GONE, /* Only for notifier: module about to be freed */
};

struct module
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.6.0-test5-bk2/kernel/module.c working-2.6.0-test5-bk2-nat-expect/kernel/module.c
--- linux-2.6.0-test5-bk2/kernel/module.c 2003-09-09 10:35:05.000000000 +1000
+++ working-2.6.0-test5-bk2-nat-expect/kernel/module.c 2003-09-21 15:22:36.000000000 +1000
@@ -83,6 +83,13 @@ int unregister_module_notifier(struct no
}
EXPORT_SYMBOL(unregister_module_notifier);

+static void module_notify(struct module *mod, enum module_state state)
+{
+ down(&notify_mutex);
+ notifier_call_chain(&module_notify_list, state, mod);
+ up(&notify_mutex);
+}
+
/* We require a truly strong try_module_get() */
static inline int strong_try_module_get(struct module *mod)
{
@@ -723,12 +730,15 @@ sys_delete_module(const char __user *nam
mod->state = MODULE_STATE_GOING;
restart_refcounts();

+ module_notify(mod, MODULE_STATE_GOING);
+
/* Never wait if forced. */
if (!forced && module_refcount(mod) != 0)
wait_for_zero_refcount(mod);

/* Final destruction now noone is using it. */
mod->exit();
+ module_notify(mod, MODULE_STATE_GONE);
free_module(mod);

out:
@@ -1718,9 +1728,7 @@ sys_init_module(void __user *umod,
/* Drop lock so they can recurse */
up(&module_mutex);

- down(&notify_mutex);
- notifier_call_chain(&module_notify_list, MODULE_STATE_COMING, mod);
- up(&notify_mutex);
+ module_notify(mod, MODULE_STATE_COMING);

/* Start the module */
ret = mod->init();
@@ -1728,12 +1736,14 @@ sys_init_module(void __user *umod,
/* Init routine failed: abort. Try to protect us from
buggy refcounters. */
mod->state = MODULE_STATE_GOING;
+ module_notify(mod, MODULE_STATE_GOING);
synchronize_kernel();
if (mod->unsafe)
printk(KERN_ERR "%s: module is now stuck!\n",
mod->name);
else {
module_put(mod);
+ module_notify(mod, MODULE_STATE_GONE);
down(&module_mutex);
free_module(mod);
up(&module_mutex);
@@ -1751,6 +1761,7 @@ sys_init_module(void __user *umod,
mod->init_size = 0;
mod->init_text_size = 0;
up(&module_mutex);
+ module_notify(mod, MODULE_STATE_LIVE);

return 0;
}

2004-01-28 02:03:49

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

Hi,

On Tue, 27 Jan 2004, Greg KH wrote:

> > All this is done without a module count, this means that
> > pci_unregister_driver() cannot return before the last reference is gone.
> > For network devices this is not that much of a problem, as they can be
> > rather easily deconfigured automatically, but that's not that easy for
> > mounted block devices, so one has to be damned careful when to call the
> > exit function.
>
> Um, not anymore. I can yank out a mounted block device and watch the
> scsi core recover just fine. The need to make everything hotpluggable
> has fixed up a lot of issues like this (as well as made more
> problems...)

Recovery of the scsi core is IMO one the smallest problems, but how do you
recover at the block layer? The point is that you have here theoretically
more one recovery strategy, but simply pulling out the module leaves you
not much room for a controlled recovery.

bye, Roman

2004-01-28 02:17:23

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

On Wed, Jan 28, 2004 at 03:03:31AM +0100, Roman Zippel wrote:
> Hi,
>
> On Tue, 27 Jan 2004, Greg KH wrote:
>
> > > All this is done without a module count, this means that
> > > pci_unregister_driver() cannot return before the last reference is gone.
> > > For network devices this is not that much of a problem, as they can be
> > > rather easily deconfigured automatically, but that's not that easy for
> > > mounted block devices, so one has to be damned careful when to call the
> > > exit function.
> >
> > Um, not anymore. I can yank out a mounted block device and watch the
> > scsi core recover just fine. The need to make everything hotpluggable
> > has fixed up a lot of issues like this (as well as made more
> > problems...)
>
> Recovery of the scsi core is IMO one the smallest problems, but how do you
> recover at the block layer? The point is that you have here theoretically
> more one recovery strategy, but simply pulling out the module leaves you
> not much room for a controlled recovery.

Block layer is not too big issue. We have almost everything in the tree
already - the main problem is to get check_disk_change() use regularized.
Now, sound and character devices in general...

2004-01-28 02:36:19

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

Hi Rusty,

On Wed, 28 Jan 2004, Rusty Russell wrote:

> > Fixing this requires changing every single module, but in the end it
> > would be worth it, as it avoids the duplicated protection and we had
> > decent module unload semantics.
>
> And I still disagree. <shrug>

And I still don't know why. :(

> If it's any consolation, I don't plan any significant module work in
> 2.7. If you want to work on this, you're welcome to it. Perhaps you
> can convince Linus et al that it's worth the pain?

Well, the problem is that this won't be an one man show, it requires that
a number of kernel hackers understand the problem and the possible
solutions are discussed beforehand. I can understand that a lot here are
scared of such big change, but either we either continue complaining about
module unloading or we do something about it and this requires exploring
the various possibilities.
Rusty, you are the modules maintainer, you are supposed to understand
these issues, if you already block a discussion like that, what am I
supposed to expect from others? I really don't claim to have all the
answers and I really would like to discuss the various possible approaches
to solve this problem, so people also know what they can expect.

bye, Roman

2004-01-28 02:53:48

[permalink] [raw]

Subject: Re: PATCH: (as177) Add class_device_unregister_wait() and platform_device_unregister_wait() to the driver model core

Hi,

On Wed, 28 Jan 2004 [email protected] wrote:

> > Recovery of the scsi core is IMO one the smallest problems, but how do you
> > recover at the block layer? The point is that you have here theoretically
> > more one recovery strategy, but simply pulling out the module leaves you
> > not much room for a controlled recovery.
>
> Block layer is not too big issue. We have almost everything in the tree
> already - the main problem is to get check_disk_change() use regularized.
> Now, sound and character devices in general...

Hmm, I more meant "user controlled recovery", the simplest strategy is of
course to throw everything away and that should be indeed not too
difficult, but I don't really think that this is user prefered strategy if
he accidentally unplugs/plugs a device. OTOH that the simple strategy
works reliably is of course a prerequisite to even think about an any more
advanced recovery.
But with the current module infrastructure the user has not much choice
anyway, without any indication of module usage state the user can only
guess what will happen when he tries to unload a module, so that currently
the best advice is indeed: don't do it.

bye, Roman

2004-01-29 00:32:14