Commit c9a3ba55 (module: wait for dependent modules doing init.)
didn't quite work because the waiter holds the module lock, meaning
that the state of the module it's waiting for cannot change.
Fortunately, it's fairly simple to update the state outside the lock
and do the wakeup.
Thanks to Jan Glauber <[email protected]> for tracking this down
and testing (qdio and qeth).
Signed-off-by: Rusty Russell <[email protected]>
---
kernel/module.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff -r 1198dd206438 kernel/module.c
--- a/kernel/module.c Tue Mar 04 11:18:59 2008 +1100
+++ b/kernel/module.c Tue Mar 04 13:23:22 2008 +1100
@@ -2172,9 +2172,11 @@ sys_init_module(void __user *umod,
return ret;
}
- /* Now it's a first class citizen! */
+ /* Now it's a first class citizen! Wake up anyone waiting for it. */
+ mod->state = MODULE_STATE_LIVE;
+ wake_up(&module_wq);
+
mutex_lock(&module_mutex);
- mod->state = MODULE_STATE_LIVE;
/* Drop initial reference. */
module_put(mod);
unwind_remove_table(mod->unwind_info, 1);
@@ -2183,7 +2185,6 @@ sys_init_module(void __user *umod,
mod->init_size = 0;
mod->init_text_size = 0;
mutex_unlock(&module_mutex);
- wake_up(&module_wq);
return 0;
}
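The ordering in the hunk above can be modelled in user space. This is only a rough sketch under stated assumptions, not the kernel code: pthreads stand in for kernel threads, a condition variable stands in for module_wq, and wq_lock, waiter(), initializer() and run_demo() are hypothetical names invented for the illustration. (The kernel patch itself sets mod->state with no lock at all; the small wq_lock here exists only because a pthread condition variable requires one.)

```c
#include <assert.h>
#include <pthread.h>

enum module_state { MODULE_STATE_COMING, MODULE_STATE_LIVE };

struct module {
	enum module_state state;
};

/* Stand-in for the kernel's module_mutex (the "module lock"). */
static pthread_mutex_t module_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Hypothetical: models the wait queue's own internal locking. */
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t module_wq = PTHREAD_COND_INITIALIZER;

/* The waiter keeps module_mutex held for the whole wait. */
static void *waiter(void *arg)
{
	struct module *dep = arg;

	pthread_mutex_lock(&module_mutex);
	pthread_mutex_lock(&wq_lock);
	while (dep->state != MODULE_STATE_LIVE)
		pthread_cond_wait(&module_wq, &wq_lock);
	pthread_mutex_unlock(&wq_lock);
	pthread_mutex_unlock(&module_mutex);
	return NULL;
}

/* The initializer flips the state and wakes waiters BEFORE it
 * tries to take module_mutex. Doing the state change under
 * module_mutex instead would block forever behind the waiter. */
static void *initializer(void *arg)
{
	struct module *mod = arg;

	pthread_mutex_lock(&wq_lock);
	mod->state = MODULE_STATE_LIVE;
	pthread_cond_broadcast(&module_wq);
	pthread_mutex_unlock(&wq_lock);

	/* The remaining cleanup still runs under the module lock. */
	pthread_mutex_lock(&module_mutex);
	pthread_mutex_unlock(&module_mutex);
	return NULL;
}

/* Returns 0 if both threads finished and the module went live. */
int run_demo(void)
{
	struct module mod = { MODULE_STATE_COMING };
	pthread_t w, i;
	int rc;

	if (pthread_create(&w, NULL, waiter, &mod) != 0)
		return -1;
	if (pthread_create(&i, NULL, initializer, &mod) != 0)
		initializer(&mod);	/* fall back to running it inline */
	else {
		rc = pthread_join(i, NULL);
		assert(rc == 0);
	}
	rc = pthread_join(w, NULL);
	assert(rc == 0);
	(void)rc;
	return mod.state == MODULE_STATE_LIVE ? 0 : -1;
}
```

Calling run_demo() should return 0 whichever thread gets scheduled first, since the initializer never needs module_mutex to publish the new state.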
Subject: Whine about suspicious return values from module's ->init() hook
Date: Mon, 11 Feb 2008 01:09:06 +0300
From: Alexey Dobriyan <[email protected]>
The return value convention for a module's init function is 0/-E. Sometimes,
e.g. during forward-porting, mistakes happen and a buggy module is created in
which the result of the comparison "workqueue != NULL" is propagated all the
way up to sys_init_module. What happened was that some other module had
created the workqueue in question, our module created it again, and the module
was loaded successfully.
Or it could be some other bug.
Let's make such mistakes much more visible. In retrospect, such messages
would have noticeably shortened some of my head-scratching sessions.
Note that dump_stack() is just a way to get the user's attention.
Sample message:
sys_init_module: 'foo'->init suspiciously returned 1, it should follow 0/-E convention
sys_init_module: loading module anyway...
Pid: 4223, comm: modprobe Not tainted 2.6.24-25f666300625d894ebe04bac2b4b3aadb907c861 #5
Call Trace:
[<ffffffff80254b05>] sys_init_module+0xe5/0x1d0
[<ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
---
kernel/module.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff -u
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2174,6 +2174,14 @@ sys_init_module(void __user *umod,
wake_up(&module_wq);
return ret;
}
+ if (ret > 0) {
+ printk(KERN_WARNING "%s: '%s'->init suspiciously returned %d, "
+ "it should follow 0/-E convention\n"
+ KERN_WARNING "%s: loading module anyway...\n",
+ __func__, mod->name, ret,
+ __func__);
+ dump_stack();
+ }
/* Now it's a first class citizen! */
mutex_lock(&module_mutex);
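To illustrate the kind of mistake the warning is aimed at, here is a minimal user-space sketch. It is not the kernel code: create_workqueue_stub(), buggy_init(), fixed_init() and load_module() are hypothetical names, and the loader only mimics the shape of the check added in the hunk above (negative fails, positive warns but still loads).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*initcall_t)(void);

/* Hypothetical stand-in for a workqueue allocation that succeeds. */
static int fake_workqueue;
static void *create_workqueue_stub(void)
{
	return &fake_workqueue;
}

/* Buggy: returns the result of a comparison, i.e. 1 on success. */
static int buggy_init(void)
{
	void *wq = create_workqueue_stub();

	return wq != NULL;
}

/* Follows the 0/-E convention. */
static int fixed_init(void)
{
	void *wq = create_workqueue_stub();

	if (!wq)
		return -ENOMEM;
	return 0;
}

/* Returns 0 if the "module" was loaded, or the negative errno from
 * init. *warned is set to 1 when init returned a positive value. */
static int load_module(const char *name, initcall_t init, int *warned)
{
	int ret;

	assert(name != NULL && init != NULL && warned != NULL);
	*warned = 0;
	ret = init();
	if (ret < 0)
		return ret;
	if (ret > 0) {
		fprintf(stderr, "%s: '%s'->init suspiciously returned %d, "
			"it should follow 0/-E convention\n"
			"%s: loading module anyway...\n",
			__func__, name, ret, __func__);
		*warned = 1;
	}
	return 0;
}
```

With these definitions, load_module("foo", buggy_init, &warned) loads the module but sets warned, while fixed_init loads silently.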
On Tue, Mar 04, 2008 at 11:22:26PM +1100, Rusty Russell wrote:
> Subject: Whine about suspicious return values from module's ->init() hook
> Date: Mon, 11 Feb 2008 01:09:06 +0300
> From: Alexey Dobriyan <[email protected]>
>
> The return value convention for a module's init function is 0/-E. Sometimes,
> e.g. during forward-porting, mistakes happen and a buggy module is created in
> which the result of the comparison "workqueue != NULL" is propagated all the
> way up to sys_init_module. What happened was that some other module had
> created the workqueue in question, our module created it again, and the
> module was loaded successfully.
>
> Or it could be some other bug.
>
> Let's make such mistakes much more visible. In retrospect, such messages
> would have noticeably shortened some of my head-scratching sessions.
>
> Note that dump_stack() is just a way to get the user's attention.
> Sample message:
>
> sys_init_module: 'foo'->init suspiciously returned 1, it should follow 0/-E convention
> sys_init_module: loading module anyway...
>...
While I agree with Andrew that a BUG() would not be appropriate here, I'm
wondering why the module should be loaded at all.
We do know that something in the module is buggy.
And not loading the module also seems to be a good compromise between
making the user notice the problem and not doing a panic().
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
On Thursday 06 March 2008 10:04:45 Adrian Bunk wrote:
> On Tue, Mar 04, 2008 at 11:22:26PM +1100, Rusty Russell wrote:
> > Subject: Whine about suspicious return values from module's ->init() hook
> > Date: Mon, 11 Feb 2008 01:09:06 +0300
> > From: Alexey Dobriyan <[email protected]>
> >
> > The return value convention for a module's init function is 0/-E.
> > Sometimes, e.g. during forward-porting, mistakes happen and a buggy module
> > is created in which the result of the comparison "workqueue != NULL" is
> > propagated all the way up to sys_init_module. What happened was that some
> > other module had created the workqueue in question, our module created it
> > again, and the module was loaded successfully.
> >
> > Or it could be some other bug.
> >
> > Let's make such mistakes much more visible. In retrospect, such
> > messages would have noticeably shortened some of my head-scratching sessions.
> >
> > Note that dump_stack() is just a way to get the user's attention.
> > Sample message:
> >
> > sys_init_module: 'foo'->init suspiciously returned 1, it should follow 0/-E convention
> > sys_init_module: loading module anyway...
> >...
>
> While I agree with Andrew that a BUG() would not be appropriate here, I'm
> wondering why the module should be loaded at all.
>
> We do know that something in the module is buggy.
Unfortunately not: it's a semantic change. Previously, as per the Unix
standard, >= 0 was good and < 0 was bad. This resulted in some confusion, so
Alexey proposed tightening the rules to allow only values <= 0.
So we don't know that the module is buggy. It's possible that it's returning
ENODEV instead of -ENODEV, but it's also possible that it's returning 1 to
mean "ok".
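The two conventions can be put side by side in a small sketch. The enum and function names below are hypothetical and exist only for illustration; the point is that a positive return carries no hint of whether the author forgot a minus sign or meant "ok".

```c
#include <assert.h>
#include <errno.h>

enum verdict { INIT_OK, INIT_FAILED, INIT_SUSPICIOUS };

/* Old rule as described above: >= 0 is good, < 0 is bad. */
static enum verdict old_rule(int ret)
{
	return ret >= 0 ? INIT_OK : INIT_FAILED;
}

/* Tightened rule: only 0 is clean success, < 0 is failure, and
 * anything positive is flagged (but, per the patch, still loaded). */
static enum verdict new_rule(int ret)
{
	if (ret < 0)
		return INIT_FAILED;
	if (ret > 0)
		return INIT_SUSPICIOUS;
	return INIT_OK;
}

/* Self-check covering the cases mentioned in this thread. */
void check_init_rules(void)
{
	/* A forgotten minus sign used to pass silently as success. */
	assert(old_rule(ENODEV) == INIT_OK);
	assert(old_rule(1) == INIT_OK);

	/* Under the new rule both are flagged, and nothing in the
	 * value says which of the two was a real bug. */
	assert(new_rule(ENODEV) == INIT_SUSPICIOUS);
	assert(new_rule(1) == INIT_SUSPICIOUS);

	/* Correctly signed values behave the same under both rules. */
	assert(new_rule(-ENODEV) == INIT_FAILED);
	assert(old_rule(-ENODEV) == INIT_FAILED);
	assert(new_rule(0) == INIT_OK);
}
```

Since new_rule() returns the same verdict for ENODEV and for 1, refusing to load on a positive value would also reject modules that were correct under the old rule.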
> And not loading the module also seems to be a good compromise between
> making the user notice the problem and not doing a panic().
Depends which way the failure goes. Andrew said not to break existing systems.
To refuse the load without breaking them, we'd need an audit of existing
modules, which no one wants to do, so we're being lazy.
Hope that clarifies my thinking,
Rusty.