2002-08-05 04:53:36

by Kingsley Cheung

[permalink] [raw]
Subject: Possible Bug in "sys_init_module"?

Greetings,

Please cc me since I'm not on the mailing list.

While debugging several proprietary modules at work with a dual SMP x86
box running a 2.4.18 kernel, I noticed that when attempting to
concurrently execute two scripts that loaded and unloaded a stack of
modules, the box kept on crashing. In my search for the problem I noticed
that the function "sys_init_module" in kernel/module.c may have a
possible bug.

Assume that one script invokes modprobe which calls "sys_init_module"
first. The big kernel lock is taken and then plenty of sanity checks
done. After dependencies are checked and updated, the "init_module"
function of the module is invoked. Now if this function happens to block,
the kernel lock is dropped. A call to "sys_init_module" by modprobe in
the other script to initialise a second module dependent on the first
could then take the big kernel lock, check the dependencies and find them
okay, and then have its "init_module" function invoked. And if this
second module relies on the first module being properly initialised
before it is loaded, this can break.

Is this an issue that requires attention? Or am I overlooking something
in the code?

Thanks,
Kingsley




2002-08-05 05:32:41

by Keith Owens

[permalink] [raw]
Subject: Re: Possible Bug in "sys_init_module"?

On Mon, 5 Aug 2002 14:57:07 +1000 (EST),
Kingsley Cheung <[email protected]> wrote:
>Please cc me since I'm not on the mailing list.
>
>Assume that one script invokes modprobe which calls "sys_init_module"
>first. The big kernel lock is taken and then plenty of sanity checks
>done. After dependencies are checked and updated, the "init_module"
>function of the module is invoked. Now if this function happens to block,
>the kernel lock is dropped. A call to "sys_init_module" by modprobe in
>the other script to initialise a second module dependent on the first
>could then take the big kernel lock, check the dependencies and find them
>okay, and then have its "init_module" function invoked. And if this
>second module relies on the first module being properly initialised
>before it is loaded, this can break.

This is a trade off between two conflicting requirements. If a module
fails during initialization then we want the module symbols to debug
the module. But those same symbols should not be considered valid when
doing insmod. The query_module() interface does not have the
flexibility to distinguish between the two types of user space query.

In any case the problem is bigger than module symbols. What happens
when a module_init breaks after registering some functions? The
functions are registered and can be called, but the module is stuffed.
insmod symbols are just one instance of the wider problem - if a module
fails during init or exit and does not recover then the kernel is in an
unreliable state. It is broken, you get to keep the pieces.

On my todo list for modutils 2.5 is to invoke init_module() from a
separate task. That task will be killed by the kernel oops (there is
no way for userspace to recover from oops) but the parent insmod will
detect the failure and say

init_module() for foo failed. The kernel is in an unreliable state.

2002-08-05 07:22:01

by Kingsley Cheung

[permalink] [raw]
Subject: Re: Possible Bug in "sys_init_module"?

On Mon, 5 Aug 2002, Keith Owens wrote:

> Kingsley Cheung <[email protected]> wrote:
> >Please cc me since I'm not on the mailing list.
> >
> >Assume that one script invokes modprobe which calls "sys_init_module"
> >first. The big kernel lock is taken and then plenty of sanity checks
> >done. After dependencies are checked and updated, the "init_module"
> >function of the module is invoked. Now if this function happens to block,
> >the kernel lock is dropped. A call to "sys_init_module" by modprobe in
> >the other script to initialise a second module dependent on the first
> >could then take the big kernel lock, check the dependencies and find them
> >okay, and then have its "init_module" function invoked. And if this
> >second module relies on the first module being properly initialised
> >before it is loaded, this can break.
>
> This is a trade off between two conflicting requirements. If a module
> fails during initialization then we want the module symbols to debug
> the module. But those same symbols should not be considered valid when
> doing insmod. The query_module() interface does not have the
> flexibility to distinguish between the two types of user space query.
>
> In any case the problem is bigger than module symbols. What happens
> when a module_init breaks after registering some functions? The
> functions are registered and can be called, but the module is stuffed.
> insmod symbols are just one instance of the wider problem - if a module
> fails during init or exit and does not recover then the kernel is in an
> unreliable state. It is broken, you get to keep the pieces.
>

Keith,

Thanks for the quick reply. I understand the need for having the module
symbols in early for debugging purposes. However, I'm not sure I see what
the invalid state of the module symbols has to do with the problem.

Now I'm assuming you're saying the module symbols are invalid because the
initialisation function has yet to complete. Please correct me if I'm
wrong. For this reason shouldn't another concurrent call to
"sys_init_module" wait or be prevented from invoking the "init_module"
function of a module that is dependent on the first module until the
first module has its initialisation complete? For debugging purposes
"query_module" should still be able to see the symbols and other relevant
data of the first module but no module dependent on the first
uninitialised module should ever be loaded.

Anyway, at this point I'm still not certain I've completely grasped your
reply. Maybe I didn't make my first email clear. If this is so, then
what I was saying is that the invocation of "init_module" functions of
*dependent* modules needs be appropriately serialised. Right now
"sys_init_module" is relying on the big kernel lock to completely
serialise these calls but if any "init_module" function blocks, the
kernel lock is released and this serialisation can be broken. Maybe one
way to avoid this is to check the flags of modules depended on during
"sys_init_module". So if the MOD_RUNNING flag is not set for the module
we depend on, then the module currently being loaded must have its
invocation of "sys_init_module" wait or return an appropriate error
indicating why.

Kingsley

2002-10-16 04:26:06

by Kingsley Cheung

[permalink] [raw]
Subject: [PATCH] Bug in "sys_init_module", kernel/module.c, 2.4.19

Hi,

I've a trivial patch against kernel/module.c for 2.4.19 below that
fixes a problem with the initialisation of a stack of modules.
Though it rarely occurs, without this the kernel can crash when a module
is being initialised and before this initialisation finishes another
*dependent* module is loaded. This occurs when the first module blocks
in init_module and releases the big kernel lock. I discussed this earlier
with Keith Owens but it looks like its been forgotten:

On Mon, 5 Aug 2002, Keith Owens wrote:

> On Mon, 5 Aug 2002 17:25:21 +1000 (EST),
> Kingsley Cheung <[email protected]> wrote:
> >Anyway, at this point I'm still not certain I've completely grasped your
> >reply. Maybe I didn't make my first email clear. If this is so, then
> >what I was saying is that the invocation of "init_module" functions of
> >*dependent* modules needs be appropriately serialised. Right now
> >"sys_init_module" is relying on the big kernel lock to completely
> >serialise these calls but if any "init_module" function blocks, the
> >kernel lock is released and this serialisation can be broken. Maybe one
> >way to avoid this is to check the flags of modules depended on during
> >"sys_init_module". So if the MOD_RUNNING flag is not set for the module
> >we depend on, then the module currently being loaded must have its
> >invocation of "sys_init_module" wait or return an appropriate error
> >indicating why.
>
> Agreed, this is a bigger problem than a failed module_init(). I am
> going to think about this overnight.
>

Cheers,
Kingsley


--- module.c.old Wed Oct 16 10:46:10 2002
+++ module.c Wed Oct 16 10:50:13 2002
@@ -528,6 +528,10 @@
"(no longer?) a module.\n");
goto err3;
}
+
+ /* Dependents must be initialised and running */
+ if (!(d->flags & MOD_RUNNING) || (d->flags & MOD_DELETED))
+ goto err3;
}

/* Update module references. */