On Tue, 1 Jun 2010 12:54:18 pm Linus Torvalds wrote:
> I'm sorry, but that's simply not good enough. We do not do ad-hoc locking
> "just becuse it works". Locking is too damn easy to get wrong, and "one
> person knows how the locking works" is not an excuse for anything at all.
You asked who the hell knew what the results were.
You couldn't figure it out, so you shotgunned an innocent patch, which papered
over the real problem.
> > See, this I agree with, but you could have said this in far fewer words and
> > much more politely.
>
> Why? Why does "locking is crap" need any politeness? And why is it wrong
> to explain at length exactly _why_ our module locking is a piece of sh*t?
>
> So explain why I should be more polite, or more terse?
How about this:
"I hate this patch. It makes already-ugly locking in module.c awful, and I
can't see that it's correct. Why not just reduce the locking to cover
the minimum needed?"
Getting email from you is the least fun part of kernel hacking, and that
sucks. It's never "this code sucks, let's make it better", and always
"your code sucks and you're wrong" :(
> > As posted, I had a patch to clean up the locking. Seems you ignored it.
>
> Umm. That patch was not the original one that I objected to, is it?
Nope, but you didn't respond to it either.
> I do agree with your 1/2, btw, the one you posted under protest after I
> pointed out that the locking was crap. I was just explaining _why_ the
> locking was crap, and what the problem was to Andrew.
Ugly as dropping the lock was, it was in some ways less ambitious than
either patch. I understand that you disagree.
> I happen to also think that my solution to the problem is actually better
> and more straightforward than your one is.
I don't think so, for several reasons.
(1) You no longer print out which module you gave up waiting for.
(2) You now need that code complexity for !CONFIG_MODULE_UNLOAD.
(3) It's obviously *not* straightforward otherwise you'd see why it's buggy.
(4) The fast booting nutters want more parallelism in module loading anyway.
(5) The locking *is* getting hairy (we drop it once already), and my patch
makes that more explicit and clearer.
> > No, those modules could still fail init.
>
> Umm. Which I took care of.
No, you didn't.
Prior to your patch, noone could get a reference on a module before it had
done init. So when init fails, we simply free the module.
Now, you've changed that with your grab-and-check-later code. And you can't
fix it for the !CONFIG_MODULE_UNLOAD, because there are no reference counts
for that case.
> I posted the damn series. Your next email even claims that my first one in
> the series (the bigger one) was a nice cleanup.
I was being polite. In practice, it added lines of code for no real win,
but hey, you obviously needed some positive feedback or something.
> You didn't comment on the
> second one, apparently because you're too embarrassed to admit you were
> wrong, and it wasn't all that 'wrong, complex, and fundamentally broken'
> to begin with.
No, Linus. Difficult as it is, I always try to frankly and publicly admit
when I was wrong. I'm sure you've experienced this in the past.
I didn't respond to it because you'd said "if we can't fix the
locking, we should try...".
I fixed the locking, so what gain in pointing out your mistakes?
Hope that helps,
Rusty.
On Tue, 1 Jun 2010, Rusty Russell wrote:
>
> > So explain why I should be more polite, or more terse?
>
> How about this:
>
> "I hate this patch. It makes already-ugly locking in module.c awful, and I
> can't see that it's correct. Why not just reduce the locking to cover
> the minimum needed?"
Umm. Did you read the first few emails I sent out on this thread
originally? They were actually _politer_ than your suggestion above (no
crap mentioned), and did exactly what you ask for. Let me quote the one
where I suggested tightening the lock, for example (along with saying that
I don't want to see pointless semantic changes):
I'm not re-applying it with the pointless semantic changes that are
visible to modules. It doesn't matter if they were informed, if it means
that they'll then just have some odd version dependency and add crazy
"#if LINUX_VERSION" tests that aren't even exact.
I also wonder exactly what that module_mutex() actually ends up
protecting. 99% of load_module() seems to be stuff that is purely about
local issues. Maybe we could tighten the actual lock section to just the
parts that actually expose the vmalloc'ed area to others?
It's generally pointless releasing a lock in the middle - that just makes
the lock even less valid. If it's valid to just release the lock (without
some retry logic starting everything from the beginning), it likely the
lock shouldn't have been held there in the first place.
See? How bad was that?
> Getting email from you is the least fun part of kernel hacking, and that
> sucks. It's never "this code sucks, let's make it better", and always
> "your code sucks and you're wrong" :(
See above. That wasn't what I said. That "you're wrong" was what I got to
when I got frustrated with you for complaining about me actually trying to
fix it.
And "crap locking" only came up in the long explanation to Andrew about
what the original problem was, and why the EBUSY error wasn't a problem.
And we all know the module locking was/is crap. You must have known it
too, since you apparently had a "fix up locking" patch from before.
So what was so bad about calling the locking crap? Really?
So yes, we both get frustrated here. But read the thread again, Rusty, and
look who it is that actually starts complaining about other peoples code.
I called the locking crap, you're the one who called something "wrong,
complex and fundamentally broken".
Kettle. Pot. Black.
> > I do agree with your 1/2, btw, the one you posted under protest after I
> > pointed out that the locking was crap. I was just explaining _why_ the
> > locking was crap, and what the problem was to Andrew.
>
> Ugly as dropping the lock was, it was in some ways less ambitious than
> either patch. I understand that you disagree.
I agree with "less ambitious". But I'd also call it "papering over bad
code", or "making bad locking much worse" or "likely introduce new and
even more subtle problems".
In fact, I might even go so far as using "crap".
Because it really is fundamentally wrong to drop a lock in the middle of
an operation. It may _work_, but it certainly doesn't make the code
better.
The thing is, when we have a problem in some code, we should try to make
sure that the fixed code is _better_ than the old code. Not just fix the
symptoms. Fix the underlying _reason_ for why the bug happened in the
first place.
And the underlying reason the bug happened is (I think we both agree)
simply that the locking was broken. Your patch fixed the symptoms, yes,
but it did so by making the locking _worse_. That's why I hated it. I
really don't think it was a good approach.
And I do think you agree in the end, and know that's true.
> > I happen to also think that my solution to the problem is actually better
> > and more straightforward than your one is.
>
> I don't think so, for several reasons.
>
> (1) You no longer print out which module you gave up waiting for.
> (2) You now need that code complexity for !CONFIG_MODULE_UNLOAD.
> (3) It's obviously *not* straightforward otherwise you'd see why it's buggy.
> (4) The fast booting nutters want more parallelism in module loading anyway.
> (5) The locking *is* getting hairy (we drop it once already), and my patch
> makes that more explicit and clearer.
I agree with 1-2, and even had a comment about #2 pointing that out, and
thinking that it would be good for other reasons (ie /proc/modules).
And as to #4, the "wait for everything after the fact" is actually the
faster approach, although in practice it probably doesn't matter (you'd
hope that modules depending on other modules that are still busy loading
is such a rare case that it does _not_ matter whether you wait for them to
load serially or batched up).
And as to #5, I would not actually suggest that we do "wait later _or_ fix
the locking", I'd do both. It really isn't a either-or situation, Rusty.
So I don't think those ones matter.
HOWEVER.
But as to #3, I think you bring up an interesting issue:
> Prior to your patch, noone could get a reference on a module before it had
> done init. So when init fails, we simply free the module.
>
> Now, you've changed that with your grab-and-check-later code. And you can't
> fix it for the !CONFIG_MODULE_UNLOAD, because there are no reference counts
> for that case.
I agree. And you're right, we'd have to move even more code from the
MODULE_UNLOAD case into !MODULE_UNLOAD to fix it. At which point I do
agree that if we want to keep !MODULE_UNLOD (and I do think we do), my
approach really doesn't work. Or we'd basically make !MODULE_UNLOAD a
pointless exercise where unloading is not allowed, but we still do
everything else the same way.
So I do agree that your later two-patch series is the way to go.
I would like to note that your original "fix things by dropping the lock"
patch that I hated so much had this exact bug too. Making this a good
example of _why_ it's basically always wrong to drop locks in the middle -
even if you claimed you knew and understood the locking.
So I do hope we can agree to call the module locking "total and utter
crap", ok?
And it really wasn't a personal complaint about your prowess. Crap locking
(or code in general) is crap, but calling the code crap shouldn't be
something you take personally. Especially not when there are various valid
historical reasons _why_ it is all crap.
And yes, my code was crap too. I wrote it for the MODULE_UNLOAD case, and
only added a comment saying the !unload case was broken, but didn't look
at just _how_ broken it was. My bad.
Linus
On Tue, 1 Jun 2010, Linus Torvalds wrote:
>
> So I do agree that your later two-patch series is the way to go.
Oh, btw, did we ever get that version tested by the only person who seems
to see the problem? Brandon?
I'd like to have a Tested-By: line too.
Linus
On 10:53 Tue 01 Jun 2010, Linus Torvalds wrote:
> On Tue, 1 Jun 2010, Linus Torvalds wrote:
> > So I do agree that your later two-patch series is the way to go.
>
> Oh, btw, did we ever get that version tested by the only person who seems
> to see the problem? Brandon?
Sorry for the delay. I broke the fuse on the remote power switch I was
using.
This is the subset of patches that everyone agreed, right?
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6 modules
Otherwise what patches should I be applying to 2.6.35-rc1?
When I tested a Kernel with Rusty's modules branch pulled onto
2.6.35-rc1 I got duplicate sysfs path errors:
[ 8.985303] WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0xb2/0xd0()
[ 8.998161] Hardware name: Dinar
[ 9.004734] sysfs: cannot create duplicate filename '/module/libcrc32c'
[ 9.018068] Modules linked in: libcrc32c(+) button ohci_hcd
ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan thermal
processor thermal_sys hwmon ide_pci_generic atiixp ide_core
ata_generic ahci libahci pata_atiixp libata scsi_mod
[ 9.062682] Pid: 1588, comm: modprobe Not tainted 2.6.35-rc1-0.7-default #3
[ 9.076711] Call Trace:
[ 9.081731] [<ffffffff8116a162>] ? sysfs_add_one+0xb2/0xd0
[ 9.092988] [<ffffffff8104f6aa>] warn_slowpath_common+0x7a/0xd0
[ 9.105115] [<ffffffff8104f7a1>] warn_slowpath_fmt+0x41/0x50
[ 9.116718] [<ffffffff8116a162>] sysfs_add_one+0xb2/0xd0
[ 9.127628] [<ffffffff8116ace5>] create_dir+0x75/0xc0
[ 9.138015] [<ffffffff8116adc6>] sysfs_create_dir+0x96/0xc0
[ 9.149447] [<ffffffff811c5297>] kobject_add_internal+0xe7/0x250
[ 9.161744] [<ffffffff811c5518>] kobject_add_varg+0x38/0x60
[ 9.173173] [<ffffffff811c5593>] kobject_init_and_add+0x53/0x70
[ 9.185295] [<ffffffff811c5160>] ? kset_find_obj+0x40/0x90
[ 9.196555] [<ffffffff8107f193>] mod_sysfs_init+0x83/0xf0
[ 9.207639] [<ffffffff810807ec>] load_module+0xbec/0x1e10
[ 9.218723] [<ffffffff810d7467>] ? handle_mm_fault+0x227/0xaf0
[ 9.230672] [<ffffffff810da47c>] ? vma_link+0xac/0x110
[ 9.241236] [<ffffffff81385168>] ? do_page_fault+0x228/0x410
[ 9.252839] [<ffffffff810dcc19>] ? do_mmap_pgoff+0x359/0x380
[ 9.264442] [<ffffffff81081a6a>] sys_init_module+0x5a/0x220
[ 9.275873] [<ffffffff81002ec2>] system_call_fastpath+0x16/0x1b
[ 9.287993] ---[ end trace 2a73785cc061234f ]---
[ 9.297350] kobject_add_internal failed for libcrc32c with -EEXIST,
don't try to register things with the same name in the same directory.
To fix this we need to make sure that we only have one copy of a module
going through load_module at a time. Patch suggestion follows which
boots without errors. I am sure there is a better way to do what it does
;)
Full dmesg:
http://ifup.org/~philips/rusty-module-branch.dmesg
Cheers,
Brandon
>From 35c6bb7c9a48b3ce53bc4b51871e6bed6e09c80a Mon Sep 17 00:00:00 2001
From: Brandon Philips <[email protected]>
Date: Wed, 2 Jun 2010 01:13:54 +0200
Subject: [PATCH] module: track module names currently loading
After "make locking more fine-grained" multiple instances of the same module
can reach mod_sysfs_init() and spew -EEXIST from kobject_add_internal().
Track the modules that are currently in load_module() but are not in the
modules list yet so we can kick out duplicate modules early.
NOTE: I didn't use mod->list to track the loading modules because I
couldn't figure out how to unwind from an error in a nice way.
Signed-off-by: Brandon Philips <[email protected]>
---
kernel/module.c | 45 +++++++++++++++++++++++++++++++++++++++++----
1 file changed, 41 insertions(+), 4 deletions(-)
Index: linux-2.6/kernel/module.c
===================================================================
--- linux-2.6.orig/kernel/module.c
+++ linux-2.6/kernel/module.c
@@ -77,6 +77,7 @@
DEFINE_MUTEX(module_mutex);
EXPORT_SYMBOL_GPL(module_mutex);
static LIST_HEAD(modules);
+static LIST_HEAD(modules_loading);
#ifdef CONFIG_KGDB_KDB
struct list_head *kdb_modules = &modules; /* kdb needs the list of modules */
#endif /* CONFIG_KGDB_KDB */
@@ -2055,6 +2056,23 @@ static inline void kmemleak_load_module(
}
#endif
+/* Track module names running through load_module but not in modules list */
+struct module_name {
+ struct list_head list;
+ char name[MODULE_NAME_LEN];
+};
+
+static struct module_name *find_module_loading(const char *name)
+{
+ struct module_name *mod_name;
+
+ list_for_each_entry(mod_name, &modules_loading, list) {
+ if (strcmp(mod_name->name, name) == 0)
+ return mod_name;
+ }
+ return NULL;
+}
+
/* Allocate and load the module: note that size of section 0 is always
zero, and we rely on this for optional sections. */
static noinline struct module *load_module(void __user *umod,
@@ -2074,6 +2092,7 @@ static noinline struct module *load_modu
void *ptr = NULL; /* Stops spurious gcc warning */
unsigned long symoffs, stroffs, *strmap;
void __percpu *percpu;
+ struct module_name *mod_name;
mm_segment_t old_fs;
@@ -2198,24 +2217,36 @@ static noinline struct module *load_modu
goto free_mod;
}
- if (find_module(mod->name)) {
- err = -EEXIST;
+ mod_name = kzalloc(sizeof(*mod_name), GFP_KERNEL);
+ if (!mod_name) {
+ err = -ENOMEM;
goto free_mod;
}
+ strcpy(mod_name->name, mod->name);
+ INIT_LIST_HEAD(&mod_name->list);
+
+ mutex_lock(&module_mutex);
+ if (find_module(mod->name) || find_module_loading(mod->name)) {
+ mutex_unlock(&module_mutex);
+ err = -EEXIST;
+ goto free_mod_name;
+ }
+ list_add(&mod_name->list, &modules_loading);
+ mutex_unlock(&module_mutex);
mod->state = MODULE_STATE_COMING;
/* Allow arches to frob section contents and sizes. */
err = module_frob_arch_sections(hdr, sechdrs, secstrings, mod);
if (err < 0)
- goto free_mod;
+ goto del_loading;
if (pcpuindex) {
/* We have a special allocation for this section. */
err = percpu_modalloc(mod, sechdrs[pcpuindex].sh_size,
sechdrs[pcpuindex].sh_addralign);
if (err)
- goto free_mod;
+ goto del_loading;
sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
}
/* Keep this around for failure path. */
@@ -2486,6 +2517,8 @@ static noinline struct module *load_modu
* The mutex protects against concurrent writers.
*/
mutex_lock(&module_mutex);
+ list_del(&mod_name->list);
+ kfree(mod_name);
list_add_rcu(&mod->list, &modules);
mutex_unlock(&module_mutex);
@@ -2530,6 +2563,10 @@ static noinline struct module *load_modu
/* mod will be freed with core. Don't access it beyond this line! */
free_percpu:
free_percpu(percpu);
+ del_loading:
+ list_del(&mod_name->list);
+ free_mod_name:
+ kfree(mod_name);
free_mod:
kfree(args);
kfree(strmap);
On Tue, 1 Jun 2010, Brandon Philips wrote:
>
> When I tested a Kernel with Rusty's modules branch pulled onto
> 2.6.35-rc1 I got duplicate sysfs path errors:
Hmm. Yeah, the module_mutex used to be held across the whole "find -> add"
state, but isn't any more.
> To fix this we need to make sure that we only have one copy of a module
> going through load_module at a time. Patch suggestion follows which
> boots without errors. I am sure there is a better way to do what it does
> ;)
I think Rusty may have made the lock a bit _too_ finegrained there, and
didn't add it to some places that needed it. It looks, for example, like
PATCH 1/2 actually drops the lock in places where it's needed
("find_module()" is documented to need it, but now load_module() didn't
hold it at all when it did the find_module()).
Rather than adding a new "module_loading" list, I think we should be able
to just use the existing "modules" list, and just fix up the locking a
bit.
In fact, maybe we could just move the "look up existing module" a bit
later - optimistically assuming that the module doesn't exist, and then
just undoing the work if it turns out that we were wrong, just before
adding ourselves to the list.
A patch something like the appended (TOTALLY UNTESTED!)
Linus
---
kernel/module.c | 13 ++++++++-----
1 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index a1f46a5..21f7ffa 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2198,11 +2198,6 @@ static noinline struct module *load_module(void __user *umod,
goto free_mod;
}
- if (find_module(mod->name)) {
- err = -EEXIST;
- goto free_mod;
- }
-
mod->state = MODULE_STATE_COMING;
/* Allow arches to frob section contents and sizes. */
@@ -2486,6 +2481,13 @@ static noinline struct module *load_module(void __user *umod,
* The mutex protects against concurrent writers.
*/
mutex_lock(&module_mutex);
+
+ if (find_module(mod->name)) {
+ err = -EEXIST;
+ /* This will also unlock the mutex */
+ goto already_exists;
+ }
+
list_add_rcu(&mod->list, &modules);
mutex_unlock(&module_mutex);
@@ -2511,6 +2513,7 @@ static noinline struct module *load_module(void __user *umod,
mutex_lock(&module_mutex);
/* Unlink carefully: kallsyms could be walking list. */
list_del_rcu(&mod->list);
+ already_exists:
mutex_unlock(&module_mutex);
synchronize_sched();
module_arch_cleanup(mod);
On 16:51 Tue 01 Jun 2010, Linus Torvalds wrote:
> On Tue, 1 Jun 2010, Brandon Philips wrote:
> > When I tested a Kernel with Rusty's modules branch pulled onto
> > 2.6.35-rc1 I got duplicate sysfs path errors:
> Hmm. Yeah, the module_mutex used to be held across the whole "find -> add"
> state, but isn't any more.
Right.
> > To fix this we need to make sure that we only have one copy of a
> > module going through load_module at a time. Patch suggestion
> > follows which boots without errors. I am sure there is a better
> > way to do what it does ;)
>
> I think Rusty may have made the lock a bit _too_ finegrained there,
> and didn't add it to some places that needed it. It looks, for
> example, like PATCH 1/2 actually drops the lock in places where it's
> needed ("find_module()" is documented to need it, but now
> load_module() didn't hold it at all when it did the find_module()).
Right, I noticed that too and held the lock in the patch I sent.
> Rather than adding a new "module_loading" list, I think we should be
> able to just use the existing "modules" list, and just fix up the
> locking a bit.
>
> In fact, maybe we could just move the "look up existing module" a
> bit later - optimistically assuming that the module doesn't exist,
> and then just undoing the work if it turns out that we were wrong,
> just before adding ourselves to the list.
>
> A patch something like the appended (TOTALLY UNTESTED!)
FWIW, I tried this same idea initially and it breaks because the
kobject EEXIST is coming from mod_sysfs_init() which happens further
up in load_module() before the list_add_rcu().
I also tried the obvious variation of moving the list_add_rcu() up
to where the find_module is but got:
[ 5.495549] sd 0:0:1:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[ 5.496931] ehci_hcd: Unknown symbol usb_hcd_resume_root_hub (err 0)
[ 5.497002] ehci_hcd: Unknown symbol usb_hcd_pci_probe (err 0)
[ 5.497070] ehci_hcd: Unknown symbol usb_hcd_unlink_urb_from_ep (err 0)
Feeling a bit like GoldiLocks I gave up and sent the modules_loading
patch to illustrate the issue. :)
I will keep working out all the interdependencies to see if I can get
something to boot without something like the modules_loading list.
Cheers,
Brandon
> diff --git a/kernel/module.c b/kernel/module.c
> index a1f46a5..21f7ffa 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2198,11 +2198,6 @@ static noinline struct module *load_module(void __user *umod,
> goto free_mod;
> }
>
> - if (find_module(mod->name)) {
> - err = -EEXIST;
> - goto free_mod;
> - }
> -
> mod->state = MODULE_STATE_COMING;
>
> /* Allow arches to frob section contents and sizes. */
> @@ -2486,6 +2481,13 @@ static noinline struct module *load_module(void __user *umod,
> * The mutex protects against concurrent writers.
> */
> mutex_lock(&module_mutex);
> +
> + if (find_module(mod->name)) {
> + err = -EEXIST;
> + /* This will also unlock the mutex */
> + goto already_exists;
> + }
> +
> list_add_rcu(&mod->list, &modules);
> mutex_unlock(&module_mutex);
>
> @@ -2511,6 +2513,7 @@ static noinline struct module *load_module(void __user *umod,
> mutex_lock(&module_mutex);
> /* Unlink carefully: kallsyms could be walking list. */
> list_del_rcu(&mod->list);
> + already_exists:
> mutex_unlock(&module_mutex);
> synchronize_sched();
> module_arch_cleanup(mod);
On Wed, 2 Jun 2010 11:40:29 am Brandon Philips wrote:
> On 16:51 Tue 01 Jun 2010, Linus Torvalds wrote:
> > On Tue, 1 Jun 2010, Brandon Philips wrote:
> > > When I tested a Kernel with Rusty's modules branch pulled onto
> > > 2.6.35-rc1 I got duplicate sysfs path errors:
> > Hmm. Yeah, the module_mutex used to be held across the whole "find -> add"
> > state, but isn't any more.
>
> Right.
>
> > > To fix this we need to make sure that we only have one copy of a
> > > module going through load_module at a time. Patch suggestion
> > > follows which boots without errors. I am sure there is a better
> > > way to do what it does ;)
> >
> > I think Rusty may have made the lock a bit _too_ finegrained there,
> > and didn't add it to some places that needed it. It looks, for
> > example, like PATCH 1/2 actually drops the lock in places where it's
> > needed ("find_module()" is documented to need it, but now
> > load_module() didn't hold it at all when it did the find_module()).
>
> Right, I noticed that too and held the lock in the patch I sent.
>
> > Rather than adding a new "module_loading" list, I think we should be
> > able to just use the existing "modules" list, and just fix up the
> > locking a bit.
> >
> > In fact, maybe we could just move the "look up existing module" a
> > bit later - optimistically assuming that the module doesn't exist,
> > and then just undoing the work if it turns out that we were wrong,
> > just before adding ourselves to the list.
> >
> > A patch something like the appended (TOTALLY UNTESTED!)
>
> FWIW, I tried this same idea initially and it breaks because the
> kobject EEXIST is coming from mod_sysfs_init() which happens further
> up in load_module() before the list_add_rcu().
I'll take a look. Linus was right that my lock reduction was overzealous.
The kobject error should be harmless.
I have a simplified version of your problem (see testing patch); I'll
see what I can fix...
Thanks!
Rusty.
On Wed, 2 Jun 2010 12:28:31 am Linus Torvalds wrote:
>
> On Tue, 1 Jun 2010, Rusty Russell wrote:
> >
> > > So explain why I should be more polite, or more terse?
> >
> > How about this:
> >
> > "I hate this patch. It makes already-ugly locking in module.c awful, and I
> > can't see that it's correct. Why not just reduce the locking to cover
> > the minimum needed?"
>
> Umm. Did you read the first few emails I sent out on this thread
> originally?
OK, this might be worthwhile. Take the very first mail you sent:
Hmm. That does seem to be buggy. We can't just drop and re-take the lock:
that may make sense internally as far as resolve_symbol() itself is
concerned, but the caller will its own local variables, and some of those
will no longer be valid if the lock was dropped.
The implication here is that that I don't know locking, and that this was
done without thought or care. In fact, I did worry about it and decided it
was less risky than a locking reduction given my limited cycles (indeed,
it has been slightly).
That commit also changes the return value semantics of "use_module()",
which is an exported interface where the only users seem to be
out-of-kernel (the only in-kernel use is in kernel/module.c itself). That
seems like a really really bad idea too.
Again with the implication that this was done without consideration. I keep
that exported as a courtesy to the ksplice team, but I never intended to let
that shackle internal changes even slightly. And I think it's wrong to even
start down that path (though I changed the name for you in the recent patches)
So I think reverting it is definitely the right thing to do. The commit
seems fundamentally broken. And having modules do request_module() in
their init functions has always been invalid anyway, so that excuse
doesn't really seem to be a reason to do anything crazy like this either.
The code is "fundamentally broken" and "crazy". The purpose of the patch
was "invalid anyway".
You managed to accuse me four times of being an incompetent and downright
crazy maintainer in your first three paragraphs of the first mail!
> They were actually _politer_ than your suggestion above (no
> crap mentioned), and did exactly what you ask for.
My example quote entirely lacked accusations of incompetence; the only
implications are:
(1) Hey, I might be wrong, maybe I just didn't spend enough time on it.
(2) Otherwise, here's what I'd do, you're smart enough to run with it if
it makes sense or explain why I'm barking up the wrong tree.
See Andrew Morton's correspondence for how his "maybe I'm dumb but" tone
is used to brutal effect...
> I would like to note that your original "fix things by dropping the lock"
> patch that I hated so much had this exact bug too. Making this a good
> example of _why_ it's basically always wrong to drop locks in the middle -
> even if you claimed you knew and understood the locking.
And I would like to note that it didn't :) It grabbed references only on
completed modules.
It still had that find_module race, but the kset error actually it. The
lock reduction patch needs that fixed.
> So I do hope we can agree to call the module locking "total and utter
> crap", ok?
OK. And I agree that I was too cautious here; I should have taken the time
to fix locking once and for all. That's much easier to do when there's a
Linus yelling at you about it.
> And yes, my code was crap too. I wrote it for the MODULE_UNLOAD case, and
> only added a comment saying the !unload case was broken, but didn't look
> at just _how_ broken it was. My bad.
Oh no, your code was broken for UNLOAD too.
And sloppy; you would have got a checklist reply email if you had mailed it
anonymously :)
Thanks for the consideration,
Rusty.
On Wed, 2 Jun 2010, Rusty Russell wrote:
>
> > I would like to note that your original "fix things by dropping the lock"
> > patch that I hated so much had this exact bug too. Making this a good
> > example of _why_ it's basically always wrong to drop locks in the middle -
> > even if you claimed you knew and understood the locking.
>
> And I would like to note that it didn't :) It grabbed references only on
> completed modules.
Right you are, I misread the patch. You re-did the whole module lookup,
something that the original code didn't do. And yes, that should be safe.
Linus
On Tue, 1 Jun 2010, Brandon Philips wrote:
>
> FWIW, I tried this same idea initially and it breaks because the
> kobject EEXIST is coming from mod_sysfs_init() which happens further
> up in load_module() before the list_add_rcu().
Ahh, right. So we'd need to mvoe that down too. As Rusty says, the kobject
EEXIST warning should be fairly harmless - albeit annoying. Do things
actually _work_ with that thing apart from the kobject warning?
> I also tried the obvious variation of moving the list_add_rcu() up
> to where the find_module is but got:
Yeah, I don't think moving it up will work, because then the module symbol
resolver will see itself before having set up the symbols. So I think we
need to expose it on the modules list only after having done the
"simplify_symbols()" etc.
I dunno.
Does this work?
(Caveat emptor - I tried to make sure the error cases nest correctly, and
it all compiles, but it's untested. As usual. Rusty mentioned a "see
testing patch", but I didn't see it. Maybe he did the same thing)
Linus
---
kernel/module.c | 35 +++++++++++++++++++----------------
1 files changed, 19 insertions(+), 16 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index a1f46a5..247c0aa 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2198,11 +2198,6 @@ static noinline struct module *load_module(void __user *umod,
goto free_mod;
}
- if (find_module(mod->name)) {
- err = -EEXIST;
- goto free_mod;
- }
-
mod->state = MODULE_STATE_COMING;
/* Allow arches to frob section contents and sizes. */
@@ -2293,11 +2288,6 @@ static noinline struct module *load_module(void __user *umod,
/* Now we've moved module, initialize linked lists, etc. */
module_unload_init(mod);
- /* add kobject, so we can reference it. */
- err = mod_sysfs_init(mod);
- if (err)
- goto free_unload;
-
/* Set up license info based on the info section */
set_license(mod, get_modinfo(sechdrs, infoindex, "license"));
@@ -2486,16 +2476,28 @@ static noinline struct module *load_module(void __user *umod,
* The mutex protects against concurrent writers.
*/
mutex_lock(&module_mutex);
+
+ if (find_module(mod->name)) {
+ err = -EEXIST;
+ /* This will also unlock the mutex */
+ goto already_exists;
+ }
+
list_add_rcu(&mod->list, &modules);
mutex_unlock(&module_mutex);
+ /* add kobject, so we can reference it. */
+ err = mod_sysfs_init(mod);
+ if (err)
+ goto unlink;
+
err = parse_args(mod->name, mod->args, mod->kp, mod->num_kp, NULL);
if (err < 0)
- goto unlink;
+ goto sysfs_uninit;
err = mod_sysfs_setup(mod, mod->kp, mod->num_kp);
if (err < 0)
- goto unlink;
+ goto sysfs_uninit;
add_sect_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
add_notes_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
@@ -2507,18 +2509,19 @@ static noinline struct module *load_module(void __user *umod,
/* Done! */
return mod;
+ sysfs_uninit:
+ free_modinfo(mod);
+ kobject_del(&mod->mkobj.kobj);
+ kobject_put(&mod->mkobj.kobj);
unlink:
mutex_lock(&module_mutex);
/* Unlink carefully: kallsyms could be walking list. */
list_del_rcu(&mod->list);
+ already_exists:
mutex_unlock(&module_mutex);
synchronize_sched();
module_arch_cleanup(mod);
cleanup:
- free_modinfo(mod);
- kobject_del(&mod->mkobj.kobj);
- kobject_put(&mod->mkobj.kobj);
- free_unload:
module_unload_free(mod);
#if defined(CONFIG_MODULE_UNLOAD)
free_percpu(mod->refptr);
On Tue, 1 Jun 2010, Linus Torvalds wrote:
>
> (Caveat emptor - I tried to make sure the error cases nest correctly, and
> it all compiles, but it's untested. As usual.
In fact, I didn't nest it right.
The "free_modinfo()" pairs with the "setup_modinfo()" call, and should go
into the "cleanup" error case, not the "sysfs_uninit" error case. IOW, I
moved one too many error case cleanup lines.
So in that patch, the "free_modinfo()" call should move back to the
cleanup case. Like the appended (still untested - I just stared at the
code some more, rather than do anything as mundane as _test_ it) patch.
It may still not be right, of course. But it might be closer.
(That function _really_ should be peeled like an onion, and split into
many smaller functions, so that we don't have ten error cases needing
unwinding. I like "goto error", but at some point you can't see the
unwinding any more, and that function has passed that point a long time
ago, I think)
Linus
---
kernel/module.c | 33 ++++++++++++++++++---------------
1 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index a1f46a5..135577c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2198,11 +2198,6 @@ static noinline struct module *load_module(void __user *umod,
goto free_mod;
}
- if (find_module(mod->name)) {
- err = -EEXIST;
- goto free_mod;
- }
-
mod->state = MODULE_STATE_COMING;
/* Allow arches to frob section contents and sizes. */
@@ -2293,11 +2288,6 @@ static noinline struct module *load_module(void __user *umod,
/* Now we've moved module, initialize linked lists, etc. */
module_unload_init(mod);
- /* add kobject, so we can reference it. */
- err = mod_sysfs_init(mod);
- if (err)
- goto free_unload;
-
/* Set up license info based on the info section */
set_license(mod, get_modinfo(sechdrs, infoindex, "license"));
@@ -2486,16 +2476,28 @@ static noinline struct module *load_module(void __user *umod,
* The mutex protects against concurrent writers.
*/
mutex_lock(&module_mutex);
+
+ if (find_module(mod->name)) {
+ err = -EEXIST;
+ /* This will also unlock the mutex */
+ goto already_exists;
+ }
+
list_add_rcu(&mod->list, &modules);
mutex_unlock(&module_mutex);
+ /* add kobject, so we can reference it. */
+ err = mod_sysfs_init(mod);
+ if (err)
+ goto unlink;
+
err = parse_args(mod->name, mod->args, mod->kp, mod->num_kp, NULL);
if (err < 0)
- goto unlink;
+ goto sysfs_uninit;
err = mod_sysfs_setup(mod, mod->kp, mod->num_kp);
if (err < 0)
- goto unlink;
+ goto sysfs_uninit;
add_sect_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
add_notes_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
@@ -2507,18 +2509,19 @@ static noinline struct module *load_module(void __user *umod,
/* Done! */
return mod;
+ sysfs_uninit:
+ kobject_del(&mod->mkobj.kobj);
+ kobject_put(&mod->mkobj.kobj);
unlink:
mutex_lock(&module_mutex);
/* Unlink carefully: kallsyms could be walking list. */
list_del_rcu(&mod->list);
+ already_exists:
mutex_unlock(&module_mutex);
synchronize_sched();
module_arch_cleanup(mod);
cleanup:
free_modinfo(mod);
- kobject_del(&mod->mkobj.kobj);
- kobject_put(&mod->mkobj.kobj);
- free_unload:
module_unload_free(mod);
#if defined(CONFIG_MODULE_UNLOAD)
free_percpu(mod->refptr);
On Wed, 2 Jun 2010, Rusty Russell wrote:
>
> OK, this might be worthwhile. Take the very first mail you sent:
>
> Hmm. That does seem to be buggy. We can't just drop and re-take the lock:
> that may make sense internally as far as resolve_symbol() itself is
> concerned, but the caller will its own local variables, and some of those
> will no longer be valid if the lock was dropped.
>
> The implication here is that that I don't know locking, and that this was
> done without thought or care. In fact, I did worry about it and decided it
> was less risky than a locking reduction given my limited cycles (indeed,
> it has been slightly).
Well, part of the context here is that the commit had been bisected as
being buggy. It turns out the bug was a different issue, but at the same
time, it very much looked like the locking was simply known a-priori to be
buggy. No?
And I do agree with the notion that it might have been a "simpler hack for
a quick fix", and potentially less risky (due to being more targeted to
the particular problem), when it then causes oopses, the default
explanation is that the dubious locking trick really was broken. No?
> That commit also changes the return value semantics of "use_module()",
> which is an exported interface where the only users seem to be
> out-of-kernel (the only in-kernel use is in kernel/module.c itself). That
> seems like a really really bad idea too.
>
> Again with the implication that this was done without consideration.
No. The implication was that it's wrong to do and a bad idea. Anything
else is just you reading things into it.
It's an exported interface, and you changed it with no real reason.
Whether you then talked to users or not is immaterial.
You could easily have just changed the _internal_ thing, and exported the
unchanged interface.
Even your second version is just very confused. It renames it, but then
exports the renamed version. What does that help? It still means that the
external module - for no good reason - needs to have basically a
source-level version number check.
Why?
It's still unclear to me. No reason for that exported interface change
seems to exist.
Linus
On Wed, 2 Jun 2010 02:26:52 pm Linus Torvalds wrote:
>
> On Wed, 2 Jun 2010, Rusty Russell wrote:
> >
> > OK, this might be worthwhile. Take the very first mail you sent:
> >
> > Hmm. That does seem to be buggy. We can't just drop and re-take the lock:
> > that may make sense internally as far as resolve_symbol() itself is
> > concerned, but the caller will its own local variables, and some of those
> > will no longer be valid if the lock was dropped.
> >
> > The implication here is that that I don't know locking, and that this was
> > done without thought or care. In fact, I did worry about it and decided it
> > was less risky than a locking reduction given my limited cycles (indeed,
> > it has been slightly).
>
> Well, part of the context here is that the commit had been bisected as
> being buggy. It turns out the bug was a different issue, but at the same
> time, it very much looked like the locking was simply known a-priori to be
> buggy. No?
Hmm, you're still missing it. Let me try again.
You weren't the best person to make the call. That didn't occur to you, did
it?
Subsystem maintainers aren't just patch monkeys, we collect all the metadata
which informs these decisions. Without that, you wandered in, came to the
obvious conclusion, and were wrong.
Really grating is the arrogant lack of respect for that knowledge combined
with your multiple mistakes since then due to your lack of it. See below.
> > That commit also changes the return value semantics of "use_module()",
> > which is an exported interface where the only users seem to be
> > out-of-kernel (the only in-kernel use is in kernel/module.c itself). That
> > seems like a really really bad idea too.
> >
> > Again with the implication that this was done without consideration.
>
> No. The implication was that it's wrong to do and a bad idea. Anything
> else is just you reading things into it.
>
> It's an exported interface, and you changed it with no real reason.
> Whether you then talked to users or not is immaterial.
>
> You could easily have just changed the _internal_ thing, and exported the
> unchanged interface.
I *genuinely* thought our policy with out-of-tree code was to smother
out-of-tree code with absolute apathy so they'll get sick of riding the
churn.
Still, easy to avoid in this case.
> Even your second version is just very confused. It renames it, but then
> exports the renamed version. What does that help?
This *almost* looks like an honest "hey, I don't understand this".
But no, "very confused" already assumes you're right. You do ask if it
helps...
> It still means that the external module - for no good reason - needs to
> have basically a source-level version number check.
...and go on to assume it doesn't. I must be really confused, right?
Or maybe, I know something you don't about the code I maintain.
We have infrastructure to look up symbols dynamically: symbol_get().
Sure, they have to adapt their code, but now they can maintain their own
damn wrapper, *and* they'll discover the change as soon as they try to
build their module against the new kernel.
I'd go so far as to say it's clever...
Rusty.
On Wed, 2 Jun 2010 02:05:10 pm Linus Torvalds wrote:
>
> On Tue, 1 Jun 2010, Brandon Philips wrote:
> >
> > FWIW, I tried this same idea initially and it breaks because the
> > kobject EEXIST is coming from mod_sysfs_init() which happens further
> > up in load_module() before the list_add_rcu().
>
> Ahh, right. So we'd need to mvoe that down too. As Rusty says, the kobject
> EEXIST warning should be fairly harmless - albeit annoying. Do things
> actually _work_ with that thing apart from the kobject warning?
>
> > I also tried the obvious variation of moving the list_add_rcu() up
> > to where the find_module is but got:
>
> Yeah, I don't think moving it up will work, because then the module symbol
> resolver will see itself before having set up the symbols. So I think we
> need to expose it on the modules list only after having done the
> "simplify_symbols()" etc.
>
> I dunno.
>
> Does this work?
>
> (Caveat emptor - I tried to make sure the error cases nest correctly, and
> it all compiles, but it's untested. As usual. Rusty mentioned a "see
> testing patch", but I didn't see it. Maybe he did the same thing)
Yep, oops. I am testing a similar thing, but a bit more of a cleanup
on the way.
Moved all the sysfs-exposing stuff to the end just after we put in the
list (and thus to after the find check). In turn spawns a sysfs cleanup
(those funcs exposed for no good reason). Also, your two-way deps patch
really wins since we need to walk the list to create the links.
Code soon...
Rusty
On Wed, 2 Jun 2010 02:14:37 pm Linus Torvalds wrote:
> (That function _really_ should be peeled like an onion, and split into
> many smaller functions, so that we don't have ten error cases needing
> unwinding. I like "goto error", but at some point you can't see the
> unwinding any more, and that function has passed that point a long time
> ago, I think)
Agreed. Feel free to take a stab at it if you're bored. Last I tried,
there wasn't an obvious split point which actually reduced the size of the
function.
And here's the tree I'm testing now (it compiles, currently rebasing to
dodge some unrelated -rc1 fs/block weirdness):
The following changes since commit 67a3e12b05e055c0415c556a315a3d3eb637e29e:
Linus Torvalds (1):
Linux 2.6.35-rc1
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6 modules
Eric Dumazet (1):
module: module_unload_init() cleanup
Linus Torvalds (2):
module: Make the 'usage' lists be two-way
module: move find_module check to end
Rusty Russell (6):
module: fix reference to mod->percpu after freeing module.
module: fix kdb's illicit use of struct module_use.
module: move sysfs exposure to end of load_module
module: Make module sysfs functions private.
module: make locking more fine-grained.
module: fix bne2 "gave up waiting for init of module libcrc32c"
include/linux/module.h | 44 ++-----
kernel/debug/kdb/kdb_main.c | 12 +--
kernel/module.c | 325 ++++++++++++++++++++++++++++---------------
3 files changed, 226 insertions(+), 155 deletions(-)
On Wed, 2 Jun 2010, Rusty Russell wrote:
>
> Hmm, you're still missing it. Let me try again.
>
> You weren't the best person to make the call. That didn't occur to you, did
> it?
Sure, fair enough.
At the same time, I'm the one who has to make the call. And quite often,
that call is "this causes more problems than it fixes, we need to revert
it".
Maybe I was too eager to revert this time. Quite often it ends up the
other way, where we end up having broken kernels for too long because
people weren't eager _enough_ to revert commits that had been bisected to
be troublesome. It's hard to tell beforehand.
And quite often, subsystem maintainers are _way_ too eager to not revert
the commits they wrote. So I really do end up having to balance that
force.
Does it always work out? Nope.
Linus
On Wed, 2 Jun 2010, Rusty Russell wrote:
>
> Moved all the sysfs-exposing stuff to the end just after we put in the
> list (and thus to after the find check).
Yeah, makes more sense that way. No reason to expose anything to sysfs
early. And splitting it into two patches makes it easier to follow than
the patch I posted. Ack.
Linus
On Wed, 2 Jun 2010, Rusty Russell wrote:
>
> Agreed. Feel free to take a stab at it if you're bored. Last I tried,
> there wasn't an obvious split point which actually reduced the size of the
> function.
I'd start from the trivial stuff. There's a fair amount of straight-line
code that just makes the function hard to read just because you have to
page up and down so far. Some of it is trivial to just create a helper
function for.
IOW, things like this.. Pure code movement to peel off some stuff.
No, this patch on its own doesn't really make anything easier to read, but
a handful of these kinds of things might bring down what is currently an
almost 500-line function (yeah, really) to the point where it could
potentially be just fifty lines (most of which is just calling helper
functions, and checking an error case).
At that point, it should be possible to see it in one (biggish) window,
which would make it a _lot_ easier to follow the cleanup cases.
Does this make the function smaller in any _absolute_ sense? No. The patch
has 6 added lines, because the whole function declaration, braces, empty
lines, call site. And the code is obviously not going to be smaller. It
would just be a bit more easy to get an overview of.
Worth it? I dunno. But currently that 'load_module()' thing does end up
being the function from hell. Trying to figure out the nesting of the
error cases was a matter of paging up-and-down and trying to remember what
particular 'goto' target I was looking for. It _should_ be possible to do
better.
Linus
---
kernel/module.c | 124 +++++++++++++++++++++++++++++--------------------------
1 files changed, 65 insertions(+), 59 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index d4a55f1..165d97e 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2106,6 +2106,70 @@ static inline void kmemleak_load_module(struct module *mod, Elf_Ehdr *hdr,
}
#endif
+static void find_module_sections(struct module *mod, Elf_Ehdr *hdr, Elf_Shdr *sechdrs, char *secstrings)
+{
+ mod->kp = section_objs(hdr, sechdrs, secstrings, "__param",
+ sizeof(*mod->kp), &mod->num_kp);
+ mod->syms = section_objs(hdr, sechdrs, secstrings, "__ksymtab",
+ sizeof(*mod->syms), &mod->num_syms);
+ mod->crcs = section_addr(hdr, sechdrs, secstrings, "__kcrctab");
+ mod->gpl_syms = section_objs(hdr, sechdrs, secstrings, "__ksymtab_gpl",
+ sizeof(*mod->gpl_syms),
+ &mod->num_gpl_syms);
+ mod->gpl_crcs = section_addr(hdr, sechdrs, secstrings, "__kcrctab_gpl");
+ mod->gpl_future_syms = section_objs(hdr, sechdrs, secstrings,
+ "__ksymtab_gpl_future",
+ sizeof(*mod->gpl_future_syms),
+ &mod->num_gpl_future_syms);
+ mod->gpl_future_crcs = section_addr(hdr, sechdrs, secstrings,
+ "__kcrctab_gpl_future");
+
+#ifdef CONFIG_UNUSED_SYMBOLS
+ mod->unused_syms = section_objs(hdr, sechdrs, secstrings,
+ "__ksymtab_unused",
+ sizeof(*mod->unused_syms),
+ &mod->num_unused_syms);
+ mod->unused_crcs = section_addr(hdr, sechdrs, secstrings,
+ "__kcrctab_unused");
+ mod->unused_gpl_syms = section_objs(hdr, sechdrs, secstrings,
+ "__ksymtab_unused_gpl",
+ sizeof(*mod->unused_gpl_syms),
+ &mod->num_unused_gpl_syms);
+ mod->unused_gpl_crcs = section_addr(hdr, sechdrs, secstrings,
+ "__kcrctab_unused_gpl");
+#endif
+#ifdef CONFIG_CONSTRUCTORS
+ mod->ctors = section_objs(hdr, sechdrs, secstrings, ".ctors",
+ sizeof(*mod->ctors), &mod->num_ctors);
+#endif
+
+#ifdef CONFIG_TRACEPOINTS
+ mod->tracepoints = section_objs(hdr, sechdrs, secstrings,
+ "__tracepoints",
+ sizeof(*mod->tracepoints),
+ &mod->num_tracepoints);
+#endif
+#ifdef CONFIG_EVENT_TRACING
+ mod->trace_events = section_objs(hdr, sechdrs, secstrings,
+ "_ftrace_events",
+ sizeof(*mod->trace_events),
+ &mod->num_trace_events);
+ /*
+ * This section contains pointers to allocated objects in the trace
+ * code and not scanning it leads to false positives.
+ */
+ kmemleak_scan_area(mod->trace_events, sizeof(*mod->trace_events) *
+ mod->num_trace_events, GFP_KERNEL);
+#endif
+#ifdef CONFIG_FTRACE_MCOUNT_RECORD
+ /* sechdrs[0].sh_size is always zero */
+ mod->ftrace_callsites = section_objs(hdr, sechdrs, secstrings,
+ "__mcount_loc",
+ sizeof(*mod->ftrace_callsites),
+ &mod->num_ftrace_callsites);
+#endif
+}
+
/* Allocate and load the module: note that size of section 0 is always
zero, and we rely on this for optional sections. */
static noinline struct module *load_module(void __user *umod,
@@ -2365,66 +2429,8 @@ static noinline struct module *load_module(void __user *umod,
/* Now we've got everything in the final locations, we can
* find optional sections. */
- mod->kp = section_objs(hdr, sechdrs, secstrings, "__param",
- sizeof(*mod->kp), &mod->num_kp);
- mod->syms = section_objs(hdr, sechdrs, secstrings, "__ksymtab",
- sizeof(*mod->syms), &mod->num_syms);
- mod->crcs = section_addr(hdr, sechdrs, secstrings, "__kcrctab");
- mod->gpl_syms = section_objs(hdr, sechdrs, secstrings, "__ksymtab_gpl",
- sizeof(*mod->gpl_syms),
- &mod->num_gpl_syms);
- mod->gpl_crcs = section_addr(hdr, sechdrs, secstrings, "__kcrctab_gpl");
- mod->gpl_future_syms = section_objs(hdr, sechdrs, secstrings,
- "__ksymtab_gpl_future",
- sizeof(*mod->gpl_future_syms),
- &mod->num_gpl_future_syms);
- mod->gpl_future_crcs = section_addr(hdr, sechdrs, secstrings,
- "__kcrctab_gpl_future");
+ find_module_sections(mod, hdr, sechdrs, secstrings);
-#ifdef CONFIG_UNUSED_SYMBOLS
- mod->unused_syms = section_objs(hdr, sechdrs, secstrings,
- "__ksymtab_unused",
- sizeof(*mod->unused_syms),
- &mod->num_unused_syms);
- mod->unused_crcs = section_addr(hdr, sechdrs, secstrings,
- "__kcrctab_unused");
- mod->unused_gpl_syms = section_objs(hdr, sechdrs, secstrings,
- "__ksymtab_unused_gpl",
- sizeof(*mod->unused_gpl_syms),
- &mod->num_unused_gpl_syms);
- mod->unused_gpl_crcs = section_addr(hdr, sechdrs, secstrings,
- "__kcrctab_unused_gpl");
-#endif
-#ifdef CONFIG_CONSTRUCTORS
- mod->ctors = section_objs(hdr, sechdrs, secstrings, ".ctors",
- sizeof(*mod->ctors), &mod->num_ctors);
-#endif
-
-#ifdef CONFIG_TRACEPOINTS
- mod->tracepoints = section_objs(hdr, sechdrs, secstrings,
- "__tracepoints",
- sizeof(*mod->tracepoints),
- &mod->num_tracepoints);
-#endif
-#ifdef CONFIG_EVENT_TRACING
- mod->trace_events = section_objs(hdr, sechdrs, secstrings,
- "_ftrace_events",
- sizeof(*mod->trace_events),
- &mod->num_trace_events);
- /*
- * This section contains pointers to allocated objects in the trace
- * code and not scanning it leads to false positives.
- */
- kmemleak_scan_area(mod->trace_events, sizeof(*mod->trace_events) *
- mod->num_trace_events, GFP_KERNEL);
-#endif
-#ifdef CONFIG_FTRACE_MCOUNT_RECORD
- /* sechdrs[0].sh_size is always zero */
- mod->ftrace_callsites = section_objs(hdr, sechdrs, secstrings,
- "__mcount_loc",
- sizeof(*mod->ftrace_callsites),
- &mod->num_ftrace_callsites);
-#endif
#ifdef CONFIG_MODVERSIONS
if ((mod->num_syms && !mod->crcs)
|| (mod->num_gpl_syms && !mod->gpl_crcs)
On Wed, 2 Jun 2010, Linus Torvalds wrote:
>
> IOW, things like this.. Pure code movement to peel off some stuff.
Here's a second one. It's slightly less trivial - since we now have error
cases - and equally untested so it may well be totally broken. But it also
cleans up a bit more, and avoids one of the goto targets, because the
"move_module()" helper now does both allocations or none.
But now I'm bored and tired, so I think this will be all for tonight. The
load_module function has gone from ~490 lines to ~370 lines, but that's
still so many that I don't honestly think this makes any readability
improvement yet.
But add a few more helper functions (say, one for doing relocations, one
for doing some of the verification, one for doing the license and
tainting, one to trivially push the icache flush into its own helper etc),
and it probably will never fit in 50 lines, but maybe it could be 150.
Linus
---
kernel/module.c | 119 ++++++++++++++++++++++++++++++-------------------------
1 files changed, 65 insertions(+), 54 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index 165d97e..72567e6 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2170,6 +2170,67 @@ static void find_module_sections(struct module *mod, Elf_Ehdr *hdr, Elf_Shdr *se
#endif
}
+static struct module *move_module(struct module *mod, Elf_Ehdr *hdr, Elf_Shdr *sechdrs, char *secstrings, unsigned modindex)
+{
+ int i;
+ void *ptr;
+
+ /* Do the allocs. */
+ ptr = module_alloc_update_bounds(mod->core_size);
+ /*
+ * The pointer to this block is stored in the module structure
+ * which is inside the block. Just mark it as not being a
+ * leak.
+ */
+ kmemleak_not_leak(ptr);
+ if (!ptr)
+ return ERR_PTR(-ENOMEM);
+
+ memset(ptr, 0, mod->core_size);
+ mod->module_core = ptr;
+
+ ptr = module_alloc_update_bounds(mod->init_size);
+ /*
+ * The pointer to this block is stored in the module structure
+ * which is inside the block. This block doesn't need to be
+ * scanned as it contains data and code that will be freed
+ * after the module is initialized.
+ */
+ kmemleak_ignore(ptr);
+ if (!ptr && mod->init_size) {
+ module_free(mod, mod->module_core);
+ return ERR_PTR(-ENOMEM);
+ }
+ memset(ptr, 0, mod->init_size);
+ mod->module_init = ptr;
+
+ /* Transfer each section which specifies SHF_ALLOC */
+ DEBUGP("final section addresses:\n");
+ for (i = 0; i < hdr->e_shnum; i++) {
+ void *dest;
+
+ if (!(sechdrs[i].sh_flags & SHF_ALLOC))
+ continue;
+
+ if (sechdrs[i].sh_entsize & INIT_OFFSET_MASK)
+ dest = mod->module_init
+ + (sechdrs[i].sh_entsize & ~INIT_OFFSET_MASK);
+ else
+ dest = mod->module_core + sechdrs[i].sh_entsize;
+
+ if (sechdrs[i].sh_type != SHT_NOBITS)
+ memcpy(dest, (void *)sechdrs[i].sh_addr,
+ sechdrs[i].sh_size);
+ /* Update sh_addr to point to copy in image. */
+ sechdrs[i].sh_addr = (unsigned long)dest;
+ DEBUGP("\t0x%lx %s\n", sechdrs[i].sh_addr, secstrings + sechdrs[i].sh_name);
+ }
+ /* Module has been moved. */
+ mod = (void *)sechdrs[modindex].sh_addr;
+ kmemleak_load_module(mod, hdr, sechdrs, secstrings);
+ return mod;
+}
+
/* Allocate and load the module: note that size of section 0 is always
zero, and we rely on this for optional sections. */
static noinline struct module *load_module(void __user *umod,
@@ -2186,7 +2247,6 @@ static noinline struct module *load_module(void __user *umod,
unsigned int modindex, versindex, infoindex, pcpuindex;
struct module *mod;
long err = 0;
- void *ptr = NULL; /* Stops spurious gcc warning */
unsigned long symoffs, stroffs, *strmap;
void __percpu *percpu;
@@ -2338,60 +2398,12 @@ static noinline struct module *load_module(void __user *umod,
symoffs = layout_symtab(mod, sechdrs, symindex, strindex, hdr,
secstrings, &stroffs, strmap);
- /* Do the allocs. */
- ptr = module_alloc_update_bounds(mod->core_size);
- /*
- * The pointer to this block is stored in the module structure
- * which is inside the block. Just mark it as not being a
- * leak.
- */
- kmemleak_not_leak(ptr);
- if (!ptr) {
- err = -ENOMEM;
+ /* Allocate and move to the final place */
+ mod = move_module(mod, hdr, sechdrs, secstrings, modindex);
+ if (IS_ERR(mod)) {
+ err = PTR_ERR(mod);
goto free_percpu;
}
- memset(ptr, 0, mod->core_size);
- mod->module_core = ptr;
-
- ptr = module_alloc_update_bounds(mod->init_size);
- /*
- * The pointer to this block is stored in the module structure
- * which is inside the block. This block doesn't need to be
- * scanned as it contains data and code that will be freed
- * after the module is initialized.
- */
- kmemleak_ignore(ptr);
- if (!ptr && mod->init_size) {
- err = -ENOMEM;
- goto free_core;
- }
- memset(ptr, 0, mod->init_size);
- mod->module_init = ptr;
-
- /* Transfer each section which specifies SHF_ALLOC */
- DEBUGP("final section addresses:\n");
- for (i = 0; i < hdr->e_shnum; i++) {
- void *dest;
-
- if (!(sechdrs[i].sh_flags & SHF_ALLOC))
- continue;
-
- if (sechdrs[i].sh_entsize & INIT_OFFSET_MASK)
- dest = mod->module_init
- + (sechdrs[i].sh_entsize & ~INIT_OFFSET_MASK);
- else
- dest = mod->module_core + sechdrs[i].sh_entsize;
-
- if (sechdrs[i].sh_type != SHT_NOBITS)
- memcpy(dest, (void *)sechdrs[i].sh_addr,
- sechdrs[i].sh_size);
- /* Update sh_addr to point to copy in image. */
- sechdrs[i].sh_addr = (unsigned long)dest;
- DEBUGP("\t0x%lx %s\n", sechdrs[i].sh_addr, secstrings + sechdrs[i].sh_name);
- }
- /* Module has been moved. */
- mod = (void *)sechdrs[modindex].sh_addr;
- kmemleak_load_module(mod, hdr, sechdrs, secstrings);
#if defined(CONFIG_MODULE_UNLOAD)
mod->refptr = alloc_percpu(struct module_ref);
@@ -2577,7 +2589,6 @@ static noinline struct module *load_module(void __user *umod,
free_init:
#endif
module_free(mod, mod->module_init);
- free_core:
module_free(mod, mod->module_core);
/* mod will be freed with core. Don't access it beyond this line! */
free_percpu:
On Wed, 2 Jun 2010 05:42:32 pm Linus Torvalds wrote:
>
> On Wed, 2 Jun 2010, Linus Torvalds wrote:
> >
> > IOW, things like this.. Pure code movement to peel off some stuff.
>
> Here's a second one. It's slightly less trivial - since we now have error
> cases - and equally untested so it may well be totally broken. But it also
> cleans up a bit more, and avoids one of the goto targets, because the
> "move_module()" helper now does both allocations or none.
>
> But now I'm bored and tired, so I think this will be all for tonight. The
> load_module function has gone from ~490 lines to ~370 lines, but that's
> still so many that I don't honestly think this makes any readability
> improvement yet.
Both applied and I'll play a little more later...
Thanks!
Rusty.
On Wed, 2 Jun 2010 04:51:46 pm Linus Torvalds wrote:
>
> On Wed, 2 Jun 2010, Rusty Russell wrote:
> >
> > Moved all the sysfs-exposing stuff to the end just after we put in the
> > list (and thus to after the find check).
>
> Yeah, makes more sense that way. No reason to expose anything to sysfs
> early. And splitting it into two patches makes it easier to follow than
> the patch I posted. Ack.
Found another locking issue: the code which verifies we don't export over
an existing symbol needs to be atomic wrt. adding us to the list.
This will be in tomorrow's linux-next.
And load_module is down to 259 lines. The label chain at the end is no
shorter tho :( I'll leave those cleanups until next merge window.
The following changes since commit aef4b9aaae1decc775778903922bd0075cce7a88:
Linus Torvalds (1):
Merge branch 'next' of git://git.kernel.org/.../benh/powerpc
are available in the git repository at:
ssh://master.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6 modules
Eric Dumazet (1):
module: module_unload_init() cleanup
Linus Torvalds (4):
module: Make the 'usage' lists be two-way
module: move find_module check to end
module: refactor load_module
module: refactor load_module part 2
Rusty Russell (9):
module: fix kdb's illicit use of struct module_use.
module: move sysfs exposure to end of load_module
module: Make module sysfs functions private.
module: make locking more fine-grained.
module: verify_export_symbols under the lock
module: fix bne2 "gave up waiting for init of module libcrc32c"
module: refactor load_module part 3
module: refactor load_module part 4
module: refactor load_module part 5
include/linux/module.h | 44 +--
kernel/debug/kdb/kdb_main.c | 12 +-
kernel/module.c | 899 +++++++++++++++++++++++++------------------
3 files changed, 547 insertions(+), 408 deletions(-)
On Wed, 2 Jun 2010, Rusty Russell wrote:
>
> Found another locking issue: the code which verifies we don't export over
> an existing symbol needs to be atomic wrt. adding us to the list.
Yup.
And now that I'm looking at that call-chain (to see if it would make sense
to use some other more specific lock - doesn't look like it: all the
readers are using RCU and this is the only writer), I also give you this
trivial one-liner. It changes each_symbol() to not put that constant array
on the stack, resulting in changing
movq $C.388.31095, %rsi #, tmp85
subq $376, %rsp #,
movq %rdi, %rbx # fn, fn
leaq -208(%rbp), %rdi #, tmp84
movq %rbx, %rdx # fn,
rep movsl
xorl %esi, %esi #
leaq -208(%rbp), %rdi #, tmp87
movq %r12, %rcx # data,
call each_symbol_in_section.clone.0 #
into
xorl %esi, %esi #
subq $216, %rsp #,
movq %rdi, %rbx # fn, fn
movq $arr.31078, %rdi #,
call each_symbol_in_section.clone.0 #
which is not so much about being obviously shorter and simpler because we
don't unnecessarily copy that constant array around onto the stack, but
also about having a much smaller stack footprint (376 vs 216 bytes - see
the update of 'rsp').
Signed-off-by: Linus Torvalds <[email protected]>
---
Btw, feel free to consider all those peeling patches signed-off-on, since
they seem to work.
> And load_module is down to 259 lines. The label chain at the end is no
> shorter tho :( I'll leave those cleanups until next merge window.
I don't think it's necessarily bad to have multiple exit targets (although
9 is pushing it), if the thing is short enough that you can fairly easily
see how they work. Not that it's quite there yet.
Linus
---
kernel/module.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index 01e8874..2602264 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -227,7 +227,7 @@ bool each_symbol(bool (*fn)(const struct symsearch *arr, struct module *owner,
unsigned int symnum, void *data), void *data)
{
struct module *mod;
- const struct symsearch arr[] = {
+ static const struct symsearch arr[] = {
{ __start___ksymtab, __stop___ksymtab, __start___kcrctab,
NOT_GPL_ONLY, false },
{ __start___ksymtab_gpl, __stop___ksymtab_gpl,
On 23:36 Wed 02 Jun 2010, Rusty Russell wrote:
> On Wed, 2 Jun 2010 04:51:46 pm Linus Torvalds wrote:
> > On Wed, 2 Jun 2010, Rusty Russell wrote:
> > >
> > > Moved all the sysfs-exposing stuff to the end just after we put in the
> > > list (and thus to after the find check).
> >
> > Yeah, makes more sense that way. No reason to expose anything to sysfs
> > early. And splitting it into two patches makes it easier to follow than
> > the patch I posted. Ack.
>
> Found another locking issue: the code which verifies we don't export over
> an existing symbol needs to be atomic wrt. adding us to the list.
>
...
> are available in the git repository at:
>
> ssh://master.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6 modules
I tested this on the machine with the bnx2/libcrc32c issue and it
works. Feel free to add Tested-by: Brandon Philips <[email protected]>
Thanks,
Brandon
On Wed, 2 Jun 2010, Rusty Russell wrote:
>
> And load_module is down to 259 lines. The label chain at the end is no
> shorter tho :( I'll leave those cleanups until next merge window.
Btw, here's a patch that _looks_ large, but it really pretty trivial, and
sets things up so that it would be way easier to split off pieces of the
module loading.
The reason it looks large is that it creates a "module_info" structure
that contains all the module state that we're building up while loading,
instead of having individual variables for all the indices etc.
So the patch ends up being large, because every "symindex" access instead
becomes "info.index.sym" etc. That may be a few characters longer, but it
then means that we can just pass a pointer to that "info" structure
around. and let all the pieces fill it in very naturally.
As an example of that, the patch also moves the initialization of all
those convenience variables into a "setup_module_info()" function. And at
this point it really does become very natural to start to peel off some of
the error labels and move them into the helper functions - now the
"truncated" case is gone, and is handled inside that setup function
instead.
So maybe you don't like this approach, and it does make the variable
accesses a bit longer, but I don't think unreadably so. And the patch
really does look big and scary, but there really should be absolutely no
semantic changes - most of it was a trivial and mindless rename.
In fact, it was so mindless that I on purpose kept the existing helper
functions looking like this:
- err = check_modinfo(mod, sechdrs, infoindex, versindex);
+ err = check_modinfo(mod, info.sechdrs, info.index.info, info.index.vers);
rather than changing them to just take the "info" pointer. IOW, a second
phase (if you think the approach is ok) would change that calling
convention to just do
err = check_modinfo(mod, &info);
(and same for "layout_sections()", "layout_symtabs()" etc.) Similarly,
while right now it makes things _look_ bigger, with things like this:
versindex = find_sec(hdr, sechdrs, secstrings, "__versions");
becoming
info->index.vers = find_sec(info->hdr, info->sechdrs, info->secstrings, "__versions");
in the new "setup_module_info()" function, that's again just a result of
it being a search-and-replace patch. By using the 'info' pointer, we could
just change the 'find_sec()' interface so that it ends up being
info->index.vers = find_sec(info, "__versions");
instead, and then we'd actually have a shorter and more readable line. So
for a lot of those mindless variable name expansions there's would be room
for separate cleanups.
I didn't move quite everything in there - if we do this to layout_symtabs,
for example, we'd want to move the percpu, symoffs, stroffs, *strmap
variables to be fields in that module_info structure too. But that's a
much smaller patch, I moved just the really core stuff that is currently
being set up and used in various parts.
But even in this rough form, it removes close to 70 lines from that
function (but adds 22 lines overall, of course - the structure definition,
the helper function declarations and call-sites etc etc).
Linus
---
kernel/module.c | 228 ++++++++++++++++++++++++++++++-------------------------
1 files changed, 125 insertions(+), 103 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index 01e8874..5cfe82a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2139,8 +2139,17 @@ static inline void kmemleak_load_module(struct module *mod, Elf_Ehdr *hdr,
}
#endif
-static int copy_and_check(Elf_Ehdr **hdrp,
- const void __user *umod, unsigned long len)
+struct module_info {
+ Elf_Ehdr *hdr;
+ unsigned long len;
+ Elf_Shdr *sechdrs;
+ char *secstrings, *args, *strtab;
+ struct {
+ unsigned int sym, str, mod, vers, info, pcpu;
+ } index;
+};
+
+static int copy_and_check(struct module_info *info, const void __user *umod, unsigned long len)
{
int err;
Elf_Ehdr *hdr;
@@ -2150,7 +2159,7 @@ static int copy_and_check(Elf_Ehdr **hdrp,
/* Suck in entire file: we'll want most of it. */
/* vmalloc barfs on "unusual" numbers. Check here */
- if (len > 64 * 1024 * 1024 || (hdr = *hdrp = vmalloc(len)) == NULL)
+ if (len > 64 * 1024 * 1024 || (hdr = vmalloc(len)) == NULL)
return -ENOMEM;
if (copy_from_user(hdr, umod, len) != 0) {
@@ -2172,6 +2181,8 @@ static int copy_and_check(Elf_Ehdr **hdrp,
err = -ENOEXEC;
goto free_hdr;
}
+ info->hdr = hdr;
+ info->len = len;
return 0;
free_hdr:
@@ -2179,6 +2190,80 @@ free_hdr:
return err;
}
+/*
+ * Set up our basic convenience variables (pointers to section headers,
+ * search for module section index etc), and do some basic section
+ * verification.
+ *
+ * Return the temporary module pointer (we'll replace it with the final
+ * one when we move the module sections around).
+ */
+static struct module *setup_module_info(struct module_info *info)
+{
+ unsigned int i;
+ struct module *mod;
+
+ /* Set up the convenience variables */
+ info->sechdrs = (void *)info->hdr + info->hdr->e_shoff;
+ info->secstrings = (void *)info->hdr + info->sechdrs[info->hdr->e_shstrndx].sh_offset;
+ info->sechdrs[0].sh_addr = 0;
+
+ for (i = 1; i < info->hdr->e_shnum; i++) {
+ if (info->sechdrs[i].sh_type != SHT_NOBITS
+ && info->len < info->sechdrs[i].sh_offset + info->sechdrs[i].sh_size)
+ goto truncated;
+
+ /* Mark all sections sh_addr with their address in the
+ temporary image. */
+ info->sechdrs[i].sh_addr = (size_t)info->hdr + info->sechdrs[i].sh_offset;
+
+ /* Internal symbols and strings. */
+ if (info->sechdrs[i].sh_type == SHT_SYMTAB) {
+ info->index.sym = i;
+ info->index.str = info->sechdrs[i].sh_link;
+ info->strtab = (char *)info->hdr + info->sechdrs[info->index.str].sh_offset;
+ }
+#ifndef CONFIG_MODULE_UNLOAD
+ /* Don't load .exit sections */
+ if (strstarts(info->secstrings+info->sechdrs[i].sh_name, ".exit"))
+ info->sechdrs[i].sh_flags &= ~(unsigned long)SHF_ALLOC;
+#endif
+ }
+
+ info->index.mod = find_sec(info->hdr, info->sechdrs, info->secstrings,
+ ".gnu.linkonce.this_module");
+ if (!info->index.mod) {
+ printk(KERN_WARNING "No module found in object\n");
+ return ERR_PTR(-ENOEXEC);
+ }
+ /* This is temporary: point mod into copy of data. */
+ mod = (void *)info->sechdrs[info->index.mod].sh_addr;
+
+ if (info->index.sym == 0) {
+ printk(KERN_WARNING "%s: module has no symbols (stripped?)\n",
+ mod->name);
+ return ERR_PTR(-ENOEXEC);
+ }
+
+ info->index.vers = find_sec(info->hdr, info->sechdrs, info->secstrings, "__versions");
+ info->index.info = find_sec(info->hdr, info->sechdrs, info->secstrings, ".modinfo");
+ info->index.pcpu = find_pcpusec(info->hdr, info->sechdrs, info->secstrings);
+
+ /* Don't keep modinfo and version sections. */
+ info->sechdrs[info->index.info].sh_flags &= ~(unsigned long)SHF_ALLOC;
+ info->sechdrs[info->index.vers].sh_flags &= ~(unsigned long)SHF_ALLOC;
+
+ /* Check module struct version now, before we try to use module. */
+ if (!check_modstruct_version(info->sechdrs, info->index.vers, mod))
+ return ERR_PTR(-ENOEXEC);
+
+ return mod;
+
+ truncated:
+ printk(KERN_ERR "Module len %lu truncated\n", info->len);
+ return ERR_PTR(-ENOEXEC);
+}
+
static int check_modinfo(struct module *mod,
const Elf_Shdr *sechdrs,
unsigned int infoindex, unsigned int versindex)
@@ -2404,13 +2489,7 @@ static noinline struct module *load_module(void __user *umod,
unsigned long len,
const char __user *uargs)
{
- Elf_Ehdr *hdr;
- Elf_Shdr *sechdrs;
- char *secstrings, *args, *strtab = NULL;
- unsigned int i;
- unsigned int symindex = 0;
- unsigned int strindex = 0;
- unsigned int modindex, versindex, infoindex, pcpuindex;
+ struct module_info info = { NULL, };
struct module *mod;
long err;
unsigned long symoffs, stroffs, *strmap;
@@ -2419,80 +2498,28 @@ static noinline struct module *load_module(void __user *umod,
DEBUGP("load_module: umod=%p, len=%lu, uargs=%p\n",
umod, len, uargs);
- err = copy_and_check(&hdr, umod, len);
+ err = copy_and_check(&info, umod, len);
if (err)
return ERR_PTR(err);
- /* Convenience variables */
- sechdrs = (void *)hdr + hdr->e_shoff;
- secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
- sechdrs[0].sh_addr = 0;
-
- for (i = 1; i < hdr->e_shnum; i++) {
- if (sechdrs[i].sh_type != SHT_NOBITS
- && len < sechdrs[i].sh_offset + sechdrs[i].sh_size)
- goto truncated;
-
- /* Mark all sections sh_addr with their address in the
- temporary image. */
- sechdrs[i].sh_addr = (size_t)hdr + sechdrs[i].sh_offset;
-
- /* Internal symbols and strings. */
- if (sechdrs[i].sh_type == SHT_SYMTAB) {
- symindex = i;
- strindex = sechdrs[i].sh_link;
- strtab = (char *)hdr + sechdrs[strindex].sh_offset;
- }
-#ifndef CONFIG_MODULE_UNLOAD
- /* Don't load .exit sections */
- if (strstarts(secstrings+sechdrs[i].sh_name, ".exit"))
- sechdrs[i].sh_flags &= ~(unsigned long)SHF_ALLOC;
-#endif
- }
-
- modindex = find_sec(hdr, sechdrs, secstrings,
- ".gnu.linkonce.this_module");
- if (!modindex) {
- printk(KERN_WARNING "No module found in object\n");
- err = -ENOEXEC;
- goto free_hdr;
- }
- /* This is temporary: point mod into copy of data. */
- mod = (void *)sechdrs[modindex].sh_addr;
-
- if (symindex == 0) {
- printk(KERN_WARNING "%s: module has no symbols (stripped?)\n",
- mod->name);
- err = -ENOEXEC;
- goto free_hdr;
- }
-
- versindex = find_sec(hdr, sechdrs, secstrings, "__versions");
- infoindex = find_sec(hdr, sechdrs, secstrings, ".modinfo");
- pcpuindex = find_pcpusec(hdr, sechdrs, secstrings);
-
- /* Don't keep modinfo and version sections. */
- sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
- sechdrs[versindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
-
- /* Check module struct version now, before we try to use module. */
- if (!check_modstruct_version(sechdrs, versindex, mod)) {
- err = -ENOEXEC;
+ mod = setup_module_info(&info);
+ if (IS_ERR(mod)) {
+ err = PTR_ERR(mod);
goto free_hdr;
}
- err = check_modinfo(mod, sechdrs, infoindex, versindex);
+ err = check_modinfo(mod, info.sechdrs, info.index.info, info.index.vers);
if (err)
goto free_hdr;
/* Now copy in args */
- args = strndup_user(uargs, ~0UL >> 1);
- if (IS_ERR(args)) {
- err = PTR_ERR(args);
+ info.args = strndup_user(uargs, ~0UL >> 1);
+ if (IS_ERR(info.args)) {
+ err = PTR_ERR(info.args);
goto free_hdr;
}
- strmap = kzalloc(BITS_TO_LONGS(sechdrs[strindex].sh_size)
+ strmap = kzalloc(BITS_TO_LONGS(info.sechdrs[info.index.str].sh_size)
* sizeof(long), GFP_KERNEL);
if (!strmap) {
err = -ENOMEM;
@@ -2502,17 +2529,17 @@ static noinline struct module *load_module(void __user *umod,
mod->state = MODULE_STATE_COMING;
/* Allow arches to frob section contents and sizes. */
- err = module_frob_arch_sections(hdr, sechdrs, secstrings, mod);
+ err = module_frob_arch_sections(info.hdr, info.sechdrs, info.secstrings, mod);
if (err < 0)
goto free_mod;
- if (pcpuindex) {
+ if (info.index.pcpu) {
/* We have a special allocation for this section. */
- err = percpu_modalloc(mod, sechdrs[pcpuindex].sh_size,
- sechdrs[pcpuindex].sh_addralign);
+ err = percpu_modalloc(mod, info.sechdrs[info.index.pcpu].sh_size,
+ info.sechdrs[info.index.pcpu].sh_addralign);
if (err)
goto free_mod;
- sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
+ info.sechdrs[info.index.pcpu].sh_flags &= ~(unsigned long)SHF_ALLOC;
}
/* Keep this around for failure path. */
percpu = mod_percpu(mod);
@@ -2520,12 +2547,12 @@ static noinline struct module *load_module(void __user *umod,
/* Determine total sizes, and put offsets in sh_entsize. For now
this is done generically; there doesn't appear to be any
special cases for the architectures. */
- layout_sections(mod, hdr, sechdrs, secstrings);
- symoffs = layout_symtab(mod, sechdrs, symindex, strindex, hdr,
- secstrings, &stroffs, strmap);
+ layout_sections(mod, info.hdr, info.sechdrs, info.secstrings);
+ symoffs = layout_symtab(mod, info.sechdrs, info.index.sym, info.index.str, info.hdr,
+ info.secstrings, &stroffs, strmap);
/* Allocate and move to the final place */
- mod = move_module(mod, hdr, sechdrs, secstrings, modindex);
+ mod = move_module(mod, info.hdr, info.sechdrs, info.secstrings, info.index.mod);
if (IS_ERR(mod)) {
err = PTR_ERR(mod);
goto free_percpu;
@@ -2538,36 +2565,36 @@ static noinline struct module *load_module(void __user *umod,
/* Now we've got everything in the final locations, we can
* find optional sections. */
- find_module_sections(mod, hdr, sechdrs, secstrings);
+ find_module_sections(mod, info.hdr, info.sechdrs, info.secstrings);
- err = check_module_license_and_versions(mod, sechdrs);
+ err = check_module_license_and_versions(mod, info.sechdrs);
if (err)
goto free_unload;
/* Set up MODINFO_ATTR fields */
- setup_modinfo(mod, sechdrs, infoindex);
+ setup_modinfo(mod, info.sechdrs, info.index.info);
/* Fix up syms, so that st_value is a pointer to location. */
- err = simplify_symbols(sechdrs, symindex, strtab, versindex, pcpuindex,
+ err = simplify_symbols(info.sechdrs, info.index.sym, info.strtab, info.index.vers, info.index.pcpu,
mod);
if (err < 0)
goto cleanup;
- err = apply_relocations(mod, hdr, sechdrs, symindex, strindex);
+ err = apply_relocations(mod, info.hdr, info.sechdrs, info.index.sym, info.index.str);
if (err < 0)
goto cleanup;
/* Set up and sort exception table */
- mod->extable = section_objs(hdr, sechdrs, secstrings, "__ex_table",
+ mod->extable = section_objs(info.hdr, info.sechdrs, info.secstrings, "__ex_table",
sizeof(*mod->extable), &mod->num_exentries);
sort_extable(mod->extable, mod->extable + mod->num_exentries);
/* Finally, copy percpu area over. */
- percpu_modcopy(mod, (void *)sechdrs[pcpuindex].sh_addr,
- sechdrs[pcpuindex].sh_size);
+ percpu_modcopy(mod, (void *)info.sechdrs[info.index.pcpu].sh_addr,
+ info.sechdrs[info.index.pcpu].sh_size);
- add_kallsyms(mod, sechdrs, hdr->e_shnum, symindex, strindex,
- symoffs, stroffs, secstrings, strmap);
+ add_kallsyms(mod, info.sechdrs, info.hdr->e_shnum, info.index.sym, info.index.str,
+ symoffs, stroffs, info.secstrings, strmap);
kfree(strmap);
strmap = NULL;
@@ -2575,19 +2602,19 @@ static noinline struct module *load_module(void __user *umod,
struct _ddebug *debug;
unsigned int num_debug;
- debug = section_objs(hdr, sechdrs, secstrings, "__verbose",
+ debug = section_objs(info.hdr, info.sechdrs, info.secstrings, "__verbose",
sizeof(*debug), &num_debug);
if (debug)
dynamic_debug_setup(debug, num_debug);
}
- err = module_finalize(hdr, sechdrs, mod);
+ err = module_finalize(info.hdr, info.sechdrs, mod);
if (err < 0)
goto cleanup;
flush_module_icache(mod);
- mod->args = args;
+ mod->args = info.args;
/* Now sew it into the lists so we can get lockdep and oops
* info during argument parsing. Noone should access us, since
@@ -2618,11 +2645,11 @@ static noinline struct module *load_module(void __user *umod,
if (err < 0)
goto unlink;
- add_sect_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
- add_notes_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
+ add_sect_attrs(mod, info.hdr->e_shnum, info.secstrings, info.sechdrs);
+ add_notes_attrs(mod, info.hdr->e_shnum, info.secstrings, info.sechdrs);
/* Get rid of temporary copy */
- vfree(hdr);
+ vfree(info.hdr);
trace_module_load(mod);
@@ -2648,16 +2675,11 @@ static noinline struct module *load_module(void __user *umod,
free_percpu:
free_percpu(percpu);
free_mod:
- kfree(args);
+ kfree(info.args);
kfree(strmap);
free_hdr:
- vfree(hdr);
+ vfree(info.hdr);
return ERR_PTR(err);
-
- truncated:
- printk(KERN_ERR "Module len %lu truncated\n", len);
- err = -ENOEXEC;
- goto free_hdr;
}
/* Call module constructors. */
On Thu, 3 Jun 2010 03:31:06 am Linus Torvalds wrote:
>
> On Wed, 2 Jun 2010, Rusty Russell wrote:
> >
> > And load_module is down to 259 lines. The label chain at the end is no
> > shorter tho :( I'll leave those cleanups until next merge window.
>
> Btw, here's a patch that _looks_ large, but it really pretty trivial, and
> sets things up so that it would be way easier to split off pieces of the
> module loading.
>
> The reason it looks large is that it creates a "module_info" structure
> that contains all the module state that we're building up while loading,
> instead of having individual variables for all the indices etc.
>
> So the patch ends up being large, because every "symindex" access instead
> becomes "info.index.sym" etc. That may be a few characters longer, but it
> then means that we can just pass a pointer to that "info" structure
> around. and let all the pieces fill it in very naturally.
>
> As an example of that, the patch also moves the initialization of all
> those convenience variables into a "setup_module_info()" function. And at
> this point it really does become very natural to start to peel off some of
> the error labels and move them into the helper functions - now the
> "truncated" case is gone, and is handled inside that setup function
> instead.
>
> So maybe you don't like this approach, and it does make the variable
> accesses a bit longer, but I don't think unreadably so. And the patch
> really does look big and scary, but there really should be absolutely no
> semantic changes - most of it was a trivial and mindless rename.
>
> In fact, it was so mindless that I on purpose kept the existing helper
> functions looking like this:
>
> - err = check_modinfo(mod, sechdrs, infoindex, versindex);
> + err = check_modinfo(mod, info.sechdrs, info.index.info, info.index.vers);
>
> rather than changing them to just take the "info" pointer. IOW, a second
> phase (if you think the approach is ok) would change that calling
> convention to just do
>
> err = check_modinfo(mod, &info);
>
> (and same for "layout_sections()", "layout_symtabs()" etc.) Similarly,
> while right now it makes things _look_ bigger, with things like this:
>
> versindex = find_sec(hdr, sechdrs, secstrings, "__versions");
>
> becoming
>
> info->index.vers = find_sec(info->hdr, info->sechdrs, info->secstrings, "__versions");
>
> in the new "setup_module_info()" function, that's again just a result of
> it being a search-and-replace patch. By using the 'info' pointer, we could
> just change the 'find_sec()' interface so that it ends up being
>
> info->index.vers = find_sec(info, "__versions");
>
> instead, and then we'd actually have a shorter and more readable line. So
> for a lot of those mindless variable name expansions there's would be room
> for separate cleanups.
>
> I didn't move quite everything in there - if we do this to layout_symtabs,
> for example, we'd want to move the percpu, symoffs, stroffs, *strmap
> variables to be fields in that module_info structure too. But that's a
> much smaller patch, I moved just the really core stuff that is currently
> being set up and used in various parts.
>
> But even in this rough form, it removes close to 70 lines from that
> function (but adds 22 lines overall, of course - the structure definition,
> the helper function declarations and call-sites etc etc).
Applied. I thought about the same thing but had the same doubts as you.
However, you're right that it has potential. I'll rename module_info to
load_info if you don't mind tho: contains more semantic punch IMHO.
On top of this, I'm right now closing on another ideal of mine: encapsulate
all the "before we move module" into one function. That before vs. after
always made me nervous...
Thanks!
Rusty.
On Thu, 3 Jun 2010 12:20:48 am Linus Torvalds wrote:
>
> On Wed, 2 Jun 2010, Rusty Russell wrote:
> >
> > Found another locking issue: the code which verifies we don't export over
> > an existing symbol needs to be atomic wrt. adding us to the list.
>
> Yup.
>
> And now that I'm looking at that call-chain (to see if it would make sense
> to use some other more specific lock - doesn't look like it: all the
> readers are using RCU and this is the only writer), I also give you this
> trivial one-liner. It changes each_symbol() to not put that constant array
> on the stack, resulting in changing
>
> movq $C.388.31095, %rsi #, tmp85
> subq $376, %rsp #,
> movq %rdi, %rbx # fn, fn
> leaq -208(%rbp), %rdi #, tmp84
> movq %rbx, %rdx # fn,
> rep movsl
> xorl %esi, %esi #
> leaq -208(%rbp), %rdi #, tmp87
> movq %r12, %rcx # data,
> call each_symbol_in_section.clone.0 #
>
> into
>
> xorl %esi, %esi #
> subq $216, %rsp #,
> movq %rdi, %rbx # fn, fn
> movq $arr.31078, %rdi #,
> call each_symbol_in_section.clone.0 #
>
> which is not so much about being obviously shorter and simpler because we
> don't unnecessarily copy that constant array around onto the stack, but
> also about having a much smaller stack footprint (376 vs 216 bytes - see
> the update of 'rsp').
>
> Signed-off-by: Linus Torvalds <[email protected]>
BTW, applied, thanks!
I've finished the cleanup now, and removed the noinline on load_module;
we're down to 124 bytes of stack for sys_init_module here (32 bit).
The following changes since commit aef4b9aaae1decc775778903922bd0075cce7a88:
Linus Torvalds (1):
Merge branch 'next' of git://git.kernel.org/.../benh/powerpc
are available in the git repository at:
ssh://master.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6 modules
Eric Dumazet (1):
module: module_unload_init() cleanup
Linus Torvalds (6):
module: Make the 'usage' lists be two-way
module: move find_module check to end
module: refactor load_module
module: refactor load_module part 2
module: reduce stack usage for each_symbol()
module: fix bne2 "gave up waiting for init of module libcrc32c"
Rusty Russell (18):
module: fix kdb's illicit use of struct module_use.
module: move sysfs exposure to end of load_module
module: Make module sysfs functions private.
module: make locking more fine-grained.
module: verify_export_symbols under the lock
module: fix bne2 "gave up waiting for init of module libcrc32c"
module: refactor load_module part 3
module: refactor load_module part 4
module: refactor load_module part 5
module: refactor out section header rewriting
module: kallsyms functions take struct load_info
module: layout_and_allocate
module: sysfs cleanup
module: pass load_info into other functions
module: move module args strndup_user to just before use
module: simplify per-cpu handling a little
module: group post-relocation functions into post_relocation()
module: cleanup comments, remove noinline
include/linux/module.h | 44 +--
kernel/debug/kdb/kdb_main.c | 12 +-
kernel/module.c | 1361 ++++++++++++++++++++++++-------------------
On Thu, 3 Jun 2010, Rusty Russell wrote:
>
> However, you're right that it has potential. I'll rename module_info to
> load_info if you don't mind tho: contains more semantic punch IMHO.
Umm. One problem is that you will almost certainly eventually want to
expose that to the architecture "fixup" routines (ie things like
module_frob_arch_sections(), arch_mod_section_prepend()), and at that
point "load_info" is a horribly bad structure name, since it would show
up in <linux/module.h> and thus be exported all over.
At least call it "struct module_load_info". But yes, I do agree that the
"load" part is important.
> On top of this, I'm right now closing on another ideal of mine: encapsulate
> all the "before we move module" into one function. That before vs. after
> always made me nervous...
Yeah, that should be trivial, and I agree that it would be good to not
have "mod" mean two things in the same function. Especially with all the
"goto failure-case", and some of the failure cases using "mod", it is a
bit scary for it to point into the (before movement) 'hdr+len' structure,
and then (after movement) into the relocated module allocations.
I looked at that particularly when doing that whole
mod = setup_module_info(&info);
if (IS_ERR(mod)) {
err = PTR_ERR(mod);
goto free_hdr;
}
thing, because that made "mod" have _three_ totally different values
(error, before, after) when jumping out to the failure paths.
Linus
On Fri, 4 Jun 2010 01:54:19 am Linus Torvalds wrote:
>
> On Thu, 3 Jun 2010, Rusty Russell wrote:
> >
> > However, you're right that it has potential. I'll rename module_info to
> > load_info if you don't mind tho: contains more semantic punch IMHO.
>
> Umm. One problem is that you will almost certainly eventually want to
> expose that to the architecture "fixup" routines (ie things like
> module_frob_arch_sections(), arch_mod_section_prepend()), and at that
> point "load_info" is a horribly bad structure name, since it would show
> up in <linux/module.h> and thus be exported all over.
>
> At least call it "struct module_load_info". But yes, I do agree that the
> "load" part is important.
Looking at the arch code, it has the advantage that it's self-contained.
They've been pleasantly undemanding from the core over the years; I think
archs doing tricky things with elf prefer to parse the object themselves
anyway. And I'm not sure they want to revisit it, either.
So I don't think we'd win much from changing them. I'm wrong later, I'll
prepend "module_" to the struct name as an internal change then hit them
all.
> I looked at that particularly when doing that whole
>
> mod = setup_module_info(&info);
> if (IS_ERR(mod)) {
> err = PTR_ERR(mod);
> goto free_hdr;
> }
>
> thing, because that made "mod" have _three_ totally different values
> (error, before, after) when jumping out to the failure paths.
Yep, it now is back to sanity. Let's see if today's linux-next is
happy.
If so, do you want just the fixes or the whole refactoring too, while
it's nice and fresh?
Thanks!
Rusty.
On Fri, 4 Jun 2010, Rusty Russell wrote:
> >
> > At least call it "struct module_load_info". But yes, I do agree that the
> > "load" part is important.
>
> Looking at the arch code, it has the advantage that it's self-contained.
> They've been pleasantly undemanding from the core over the years; I think
> archs doing tricky things with elf prefer to parse the object themselves
> anyway. And I'm not sure they want to revisit it, either.
>
> So I don't think we'd win much from changing them. I'm wrong later, I'll
> prepend "module_" to the struct name as an internal change then hit them
> all.
Ok. So if we don't expect to ever pass the full load_info struct down to
the arch code, and we can keep it entirely internal to module.c, then
"struct load_info" is fine by me.
> If so, do you want just the fixes or the whole refactoring too, while
> it's nice and fresh?
Gaah. "Just the fixes" is definitely the prudent thing to do. At the same
time, I've now so deeply bought into the whole cleanup thing too, that I
want to argue that the cleanup might make it easier to handle any locking
problems if we find them.
But I suspect that is just myself trying to fool/argue my smarter half
into taking it all.
So you can probably push me either way.
Linus
On Fri, 4 Jun 2010 11:25:00 am Linus Torvalds wrote:
> But I suspect that is just myself trying to fool/argue my smarter half
> into taking it all.
Similar motivation for even asking.
They can stew in linux-next for another cycle: just found a trivial
!SMP compile issue, and with all the other config options in there
it could use some baking...
Thanks,
Rusty.
On Fri, 4 Jun 2010, Rusty Russell wrote:
> On Fri, 4 Jun 2010 11:25:00 am Linus Torvalds wrote:
> > But I suspect that is just myself trying to fool/argue my smarter half
> > into taking it all.
>
> Similar motivation for even asking.
>
> They can stew in linux-next for another cycle: just found a trivial
> !SMP compile issue, and with all the other config options in there
> it could use some baking...
Yeah, considering the ia64 report, I can't lie to myself and say that the
cleanups are going to be totally safe.
Can you send me a pointer to the safe/bugfix part, and I'll pull it for
-rc2? I'd like to do that Saturday.
Linus
On Sat, 5 Jun 2010 08:18:13 am Linus Torvalds wrote:
> Can you send me a pointer to the safe/bugfix part, and I'll pull it for
> -rc2? I'd like to do that Saturday.
Yep, here it is:
The following changes since commit ad8456361fa19068cf49b50a4f98e41b73c08e76:
Linus Torvalds (1):
Merge branch 'upstream-linus' of git://git.kernel.org/.../jgarzik/libata-dev
are available in the git repository at:
ssh://master.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus.git master
Linus Torvalds (2):
module: Make the 'usage' lists be two-way
module: move find_module check to end
Rusty Russell (6):
module: fix kdb's illicit use of struct module_use.
module: move sysfs exposure to end of load_module
module: Make module sysfs functions private.
module: make locking more fine-grained.
module: verify_export_symbols under the lock
module: fix bne2 "gave up waiting for init of module libcrc32c"
include/linux/module.h | 44 ++-----
kernel/debug/kdb/kdb_main.c | 12 +--
kernel/module.c | 320 ++++++++++++++++++++++++++++---------------
3 files changed, 221 insertions(+), 155 deletions(-)