LinuxLists.cc - RE2: [OKS] Module removal

2002-07-01 20:20:03

Subject: RE2: [OKS] Module removal

Hello,

> >
> > The suggestion was made that kernel module removal be deprecated or
> > removed. I'd like to note that there are two common uses for this
> > capability, and the problems addressed by module removal should be
> > kept in mind. These are in addition to the PCMCIA issue raised.

I saw this mail flash through the reflector. It is worrying to know
that this great feature is on the discussion table for removal.

My humble two cents: the kernel module feature is very much appreciated
in embedded Linux development. We, in embedded development, write
"Kernel Modules/Drivers" to run our hardware. The kernel module feature
allows us to segregate the hardware-specific code from Linux itself, and
it also allows us to upgrade the module code without reloading Linux.
This approach is very efficient for us in embedded products.

I hope others can share this comment, and help keep this feature as is.

Regards,
Michael.

2002-07-02 01:33:27

by Werner Almesberger

Subject: Re: RE2: [OKS] Module removal

Michael Nguyen wrote:
> I saw this mail flash through the reflector. It is worrying to know
> that this great feature is on the discussion table for removal.

One work-around that was suggested was to allow modules to be
superseded, i.e. the old module stays forever, but a new
version can be loaded in parallel. I must say that I'm very
sceptical about this idea, as it seems likely to just mask more
severe problems.

If I remember right, the main arguments why module removal can
race with references were:

- buggy modules that don't even know themselves if they still
  serve a purpose or not (solution: fix 'em)
- likewise, but with the excuse that correctness was
  sacrificed on the altar of performance
- references getting copied without the module knowing. Looks
  like a problem in the subsystem managing the references.
  (This was discussed mainly in the context of automating
  reference tracking.)
- removal happening immediately after module usage count is
  decremented to zero may unload module before module has
  executed "return" instruction

While I can accept the theoretical possibility that some code
may indeed not be able to afford handling the module usage
count, I kind of doubt that such conditions exist in real life.

For the removal-before-return problem, I thought a bit about it
on my return flight. It would seem to me that an "atomic"
"decrement_module_count_and_return" function would do the trick.

That function would prepare to return from its caller, then
decrement the module count, and finally do the return. That way,
no resources of the caller would be used after the module usage
counter drops to zero. Obviously, any related cleanup would have
to happen before this. Also, you couldn't call
decrement_module_count_and_return from a function that gets
called from another function in the same module.

Not sure if such a solution for removal-before-return has been
considered/rejected yet. It would seem obvious enough.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://icapeople.epfl.ch/almesber/_____________________________________/

2002-07-02 02:23:27

by Keith Owens

Subject: Re: RE2: [OKS] Module removal

On Mon, 1 Jul 2002 22:40:34 -0300,
Werner Almesberger <[email protected]> wrote:
>If I remember right, the main arguments why module removal can
>race with references were:
>....
> - removal happening immediately after module usage count is
> decremented to zero may unload module before module has
> executed "return" instruction
>For the removal-before-return problem, I thought a bit about it
>on my return flight. It would seem to me that an "atomic"
>"decrement_module_count_and_return" function would do the trick.

This is just one symptom of the overall problem, which is module code
that adjusts its use count by executing code that belongs to the
module. The same problem exists on entry to a module function, the
module can be removed before MOD_INC_USE_COUNT is reached.

Apart from abandoning module removal, there are only two viable fixes:

1) Do the reference counting outside the module, before it is entered.

   This is why Al Viro added the owner: __THIS_MODULE; line to various
   structures. The problem is that it spreads like a cancer. Every
   structure that contains function pointers needs an owner field.
   Every bit of code that dereferences a function pointer must first
   bump the owner's use count (using atomic ops) and must cope with the
   owner no longer existing.

   Not only does this pollute all structures that contain function
   pointers, it introduces overhead on every function dereference. All
   of this just to cope with the relatively low possibility that a
   module will be removed.

2) Introduce a delay after unregistering a module's services and before
   removing the code from memory.

   This puts all the penalty and complexity where it should be, in the
   unload path. However it requires a two stage rmmod process (check
   use count, unregister, delay, recheck use count, remove if safe)
   so all module cleanup routines need to be split into unregister and
   final remove routines.

   This is relatively easy to do without preemption, it is
   significantly harder with preempt. There are also unsolved problems
   with long running device commands with callbacks (e.g. CD-R fixate)
   and with kernel threads started from a module (must wait until
   zombies have been reaped).

Rusty and I agree that option (2) is the only sane way to do module
unload, assuming that we retain module unloading. First decide if the
extra work is justified.
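
The two stage rmmod sequence from option (2) can be sketched as a small
userspace model. This is only an illustration under stated assumptions:
every name below (toy_module, toy_unload, the quiesce callback) is
hypothetical and not a kernel API, and the thread does not say what
should happen when the recheck fails, so the sketch simply leaves the
module unregistered and reports that removal was deferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct toy_module {
    int  use_count;   /* references held by active users */
    bool registered;  /* operations still visible in outside tables */
    bool loaded;      /* module code still in memory */
};

enum unload_result { UNLOAD_BUSY, UNLOAD_DEFERRED, UNLOAD_DONE };

/* Test helper: models a caller that slipped in during the delay window. */
static void toy_late_entrant(struct toy_module *m)
{
    m->use_count++;
}

/*
 * Two stage unload: check use count, unregister, delay, recheck use
 * count, remove if safe. The quiesce callback stands in for the delay
 * (assumption: it returns once in-flight callers have either taken a
 * reference or left the module). It may be NULL.
 */
static enum unload_result toy_unload(struct toy_module *m,
                                     void (*quiesce)(struct toy_module *))
{
    assert(m->loaded);

    /* Stage 1: refuse if busy, otherwise stop new callers arriving. */
    if (m->use_count != 0)
        return UNLOAD_BUSY;
    m->registered = false;

    /* Delay between unregistering and freeing the code. */
    if (quiesce)
        quiesce(m);

    /* Stage 2: free the code only if nobody entered in the meantime. */
    if (m->use_count != 0)
        return UNLOAD_DEFERRED;
    m->loaded = false;
    return UNLOAD_DONE;
}
```

Calling toy_unload(&m, NULL) on an idle module returns UNLOAD_DONE;
passing toy_late_entrant as the quiesce callback exercises the recheck
and returns UNLOAD_DEFERRED with the code still loaded.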

2002-07-02 03:04:58

by Werner Almesberger

Subject: Re: RE2: [OKS] Module removal

Keith Owens wrote:
> This is just one symptom of the overall problem, which is module code
> that adjusts its use count by executing code that belongs to the
> module. The same problem exists on entry to a module function, the
> module can be removed before MOD_INC_USE_COUNT is reached.

Ah yes, now I remember, thanks. I filed that under "improper reference
tracking". After all, why would anybody hold an uncounted reference in
the first place ?

I can understand that the exact number of references may be unknown,
e.g. if I pass a reference to some registration function, which may in
turn hand it to third parties, but why wouldn't I know that there is
at least one reference ?

If some other module B hands out uncounted references on behalf of
some module A, it would seem natural for B to make sure that it
collects them before getting unloaded (and thereby releasing A).

> 1) Do the reference counting outside the module, before it is entered.

Evil, agreed ;-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://icapeople.epfl.ch/almesber/_____________________________________/

2002-07-02 03:39:57

by Keith Owens

Subject: Re: RE2: [OKS] Module removal

On Tue, 2 Jul 2002 00:11:52 -0300,
Werner Almesberger <[email protected]> wrote:
>Keith Owens wrote:
>> This is just one symptom of the overall problem, which is module code
>> that adjusts its use count by executing code that belongs to the
>> module. The same problem exists on entry to a module function, the
>> module can be removed before MOD_INC_USE_COUNT is reached.
>
>Ah yes, now I remember, thanks. I filed that under "improper reference
>tracking". After all, why would anybody hold an uncounted reference in
>the first place ?

All functions passed to registration routines by modules are uncounted
references. A module is loaded, registers its operations and exits
from the init routine. At that point its use count is 0, even
though there are references to the module from tables outside the
module.

When the open routine (or its equivalent) is called, then the use count
is incremented from within the module. The executing code between

	if (ops->open)
		ops->open();

and MOD_INC_USE_COUNT in the module's open routine is racy, there is no
lock that prevents the module being removed while the start of the open
routine is being executed.

Incrementing the use count at registration time is no good, it stops
the module being unloaded. Operations are deregistered at rmmod time.
Setting the use count at registration prevents rmmod from removing the
module, so you cannot deregister the operations. Catch 22.

Module unload is not racy on UP without preempt. It is racy on SMP or
with preempt. It used to be safe on SMP because almost everything was
under the BKL, but that protection no longer exists.
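
The window described above can be replayed deterministically in a
userspace model. This is a sketch under assumptions, not kernel code:
the names toy_mod, toy_rmmod, toy_open and toy_call_open are
hypothetical, the "loaded" flag stands in for the module text being
present in memory, and a single thread replays the interleaving instead
of two CPUs actually racing.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the entry race. */
struct toy_mod {
    int  use_count;  /* what MOD_INC_USE_COUNT would bump */
    bool loaded;     /* false once rmmod has freed the code */
};

/* rmmod side: only the use count is consulted before freeing. */
static bool toy_rmmod(struct toy_mod *m)
{
    if (m->use_count != 0)
        return false;
    m->loaded = false;
    return true;
}

/*
 * The module's open routine: the increment happens inside module code.
 * Returns false if it was entered after the code had been freed.
 */
static bool toy_open(struct toy_mod *m)
{
    if (!m->loaded)
        return false;   /* in a real kernel: a jump into freed memory */
    m->use_count++;     /* stands in for MOD_INC_USE_COUNT */
    return true;
}

/*
 * Replays one interleaving. The caller has already tested ops->open;
 * rmmod_first says whether rmmod runs before the open routine is
 * entered. The use count is still 0 at that point, so rmmod succeeds.
 */
static bool toy_call_open(struct toy_mod *m, bool rmmod_first)
{
    if (rmmod_first)
        toy_rmmod(m);
    return toy_open(m);
}
```

With rmmod_first set to false the open succeeds and a later toy_rmmod is
refused; with rmmod_first set to true the call lands in code that is no
longer there, which is the race in question.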

2002-07-02 04:04:51

by Werner Almesberger

Subject: Re: RE2: [OKS] Module removal

Keith Owens wrote:
> Incrementing the use count at registration time is no good, it stops
> the module being unloaded. Operations are deregistered at rmmod time.
> Setting the use count at registration prevents rmmod from removing the
> module, so you cannot deregister the operations. Catch 22.

But those references go through the module exit function, which
acts like an implicit reference counter. So as long as

- module exit de-registers all of them (if it doesn't, we're
  screwed anyhow), and
- the registry itself isn't racy (if it is, this is likely to
  surface in other circumstances too, e.g. if a driver destroys
  internal state immediately after de-registration)

they should be safe, shouldn't they ?

- Werner

P.S. mail.ocs.com.au thinks I'm a spammer :-(

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://icapeople.epfl.ch/almesber/_____________________________________/

2002-07-02 04:11:45

by Brian Gerst

Subject: Re: RE2: [OKS] Module removal

Keith Owens wrote:
> On Mon, 1 Jul 2002 22:40:34 -0300,
> Werner Almesberger <[email protected]> wrote:
>
>>If I remember right, the main arguments why module removal can
>>race with references were:
>>....
>>- removal happening immediately after module usage count is
>> decremented to zero may unload module before module has
>> executed "return" instruction
>>For the removal-before-return problem, I thought a bit about it
>>on my return flight. It would seem to me that an "atomic"
>>"decrement_module_count_and_return" function would do the trick.
>
>
> This is just one symptom of the overall problem, which is module code
> that adjusts its use count by executing code that belongs to the
> module. The same problem exists on entry to a module function, the
> module can be removed before MOD_INC_USE_COUNT is reached.
>
> Apart from abandoning module removal, there are only two viable fixes:
>
> 1) Do the reference counting outside the module, before it is entered.
>
> This is why Al Viro added the owner: __THIS_MODULE; line to various
> structures. The problem is that it spreads like a cancer. Every
> structure that contains function pointers needs an owner field.
> Every bit of code that dereferences a function pointer must first
> bump the owner's use count (using atomic ops) and must cope with the
> owner no longer existing.
>
> Not only does this pollute all structures that contain function
> pointers, it introduces overhead on every function dereference. All
> of this just to cope with the relatively low possibility that a
> module will be removed.

Only "first use" (i.e. ->open) functions need guarding against unloads.
Any subsequent functions are guaranteed to have a reference to the
module, and don't need to bother with the refcount. I have a few ideas
to optimize the refcounting better than it is now.
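
The idea of guarding only the first use can be sketched in userspace C
with C11 atomics. Everything here is an assumption made for
illustration: toy_owner, toy_ops, toy_try_get and the other names are
hypothetical and are not the kernel's structures or helpers, and the
convention that a count of -1 means "being unloaded" is just one
possible encoding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical owner record: refs >= 0 means live, -1 means unloading. */
struct toy_owner {
    atomic_int refs;
};

struct toy_ops {
    struct toy_owner *owner;
    int (*open)(void);
    int (*read)(void);
};

/* Take a reference unless the owner is already being unloaded. */
static bool toy_try_get(struct toy_owner *o)
{
    int old = atomic_load(&o->refs);
    while (old >= 0) {
        /* On failure, old is refreshed and the loop rechecks it. */
        if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
            return true;
    }
    return false;
}

static void toy_put(struct toy_owner *o)
{
    int old = atomic_fetch_sub(&o->refs, 1);
    assert(old > 0);
    (void)old;
}

/* Unload side: succeeds only if the count moves 0 -> -1 in one step. */
static bool toy_try_unload(struct toy_owner *o)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&o->refs, &expected, -1);
}

/* First use: the reference is taken outside the module, before entry. */
static int toy_do_open(const struct toy_ops *ops)
{
    if (!toy_try_get(ops->owner))
        return -1;              /* owner is going away */
    int rc = ops->open ? ops->open() : 0;
    if (rc != 0)
        toy_put(ops->owner);    /* a failed open keeps no reference */
    return rc;
}

/* Later calls rely on the reference taken at open time. */
static int toy_do_read(const struct toy_ops *ops)
{
    return ops->read();
}

static void toy_do_release(const struct toy_ops *ops)
{
    toy_put(ops->owner);
}

/* Sample module operations used to exercise the sketch. */
static int toy_sample_open(void) { return 0; }
static int toy_sample_read(void) { return 42; }
```

In this model only toy_do_open pays for the atomic operation;
toy_do_read touches no counter at all, and toy_try_unload keeps failing
until toy_do_release has dropped the reference taken at open time.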

> 2) Introduce a delay after unregistering a module's services and before
> removing the code from memory.
>
> This puts all the penalty and complexity where it should be, in the
> unload path. However it requires a two stage rmmod process (check
> use count, unregister, delay, recheck use count, remove if safe)
> so all module cleanup routines need to be split into unregister and
> final remove routines.
>
> This is relatively easy to do without preemption, it is
> significantly harder with preempt. There are also unsolved problems
> with long running device commands with callbacks (e.g. CD-R fixate)
> and with kernel threads started from a module (must wait until
> zombies have been reaped).

The callbacks should hold references that would not allow the module to
unload. Other than that, this is the same problem the RCU folks are
working on.

> Rusty and I agree that option (2) is the only sane way to do module
> unload, assuming that we retain module unloading. First decide if the
> extra work is justified.

Freeing up the limited vmalloc address space should be justification enough.

--
Brian Gerst

2002-07-02 04:50:56

by Keith Owens

Subject: RE2: [OKS] Module removal
