Hi,
Below is an (unfinished) patch, which makes a small but important change
to the module format. The module structure is no longer generated by
insmod and is instead under control of the kernel now. This means we have
now the freedom to properly cleanup the module code without needing to
tell insmod about it. The dependency between insmod and the kernel is
greatly reduced by this.
To further reduce the dependency we need some changes to kbuild to
automatically include start/stop symbols for sections, currently I still
let insmod do that. With this change we can freely change the module
layout, without having to change insmod everytime.
The next steps could be:
1. Properly fixing module races: I'm playing with a init/start/stop/exit
model, this has the advantage that we can stop anyone from reusing a
module and we only have to wait for remaining users to go away until we
can safely unload the module.
For old style modules all the magic would be hidden in module_init(), but
this also means these modules can be loaded, but never unloaded (IMO a
safe default). New style modules could look like this:
DEFINE_MODULE
init: foo_init,
start: foo_start,
stop: foo_stop,
exit: foo_exit,
END_MODULE
BTW this way we could easily generate a listing of built-in modules and so
further reduce the difference between compiled in and separate modules.
2. Since we can extend the module structure now, something like this would
be possible:
DEFINE_FS_MODULE
fs_type: &foo_fs_type,
init: foo_init,
start: foo_start,
stop: foo_stop,
exit: foo_exit,
END_FS_MODULE
This would automatically call register_filesystem(&foo_fs_type) and it
would also take care of correctly unloading the module.
3. Something I was thinking about in this context: Is there a really good
reason, why we have to store the symbol information in the kernel? The
only real user is insmod, which is retrieving and storing that information
in kernel memory. Insmod could as well store this information somewhere
else. This would require a writable fs to load modules, but otherwise I
don't see any real problem. The big advantage would be less data is stored
in the kernel and the module code becomes smaller.
Any comments?
bye, Roman
PS: These patches are in early stage, so I know that they need a cleanup. :)
diff -ur modutils-2.4.15.org/insmod/insmod.c modutils-2.4.15/insmod/insmod.c
--- modutils-2.4.15.org/insmod/insmod.c Fri Mar 1 01:39:06 2002
+++ modutils-2.4.15/insmod/insmod.c Tue Jul 16 00:53:43 2002
@@ -478,8 +478,42 @@
static int create_this_module(struct obj_file *f, const char *m_name)
{
- struct obj_section *sec;
+ struct obj_section *sec, *mod_sec;
+
+ mod_sec = obj_find_section(f, ".module");
+ if (mod_sec) {
+ ElfW(Addr) start, end;
+
+ sec = obj_find_section(f, "__ex_table");
+ if (sec) {
+ start = sec->header.sh_addr;
+ end = sec->header.sh_addr + sec->header.sh_size;
+ } else {
+ sec = mod_sec;
+ start = end = 0;
+ }
+ obj_add_symbol(f, "__ex_table_start", -1, ELFW(ST_INFO) (STB_LOCAL, STT_OBJECT),
+ sec->idx, start, 0);
+ obj_add_symbol(f, "__ex_table_end", -1, ELFW(ST_INFO) (STB_LOCAL, STT_OBJECT),
+ sec->idx, end, 0);
+
+ sec = obj_find_section(f, "__ksymtab");
+ if (!sec)
+ sec = obj_create_alloced_section(f, "__ksymtab", tgt_sizeof_void_p, 0, 0);
+ obj_add_symbol(f, "__syms_start", -1, ELFW(ST_INFO) (STB_LOCAL, STT_NOTYPE),
+ sec->idx, 0, 0);
+ obj_add_symbol(f, "__syms_end", -1, ELFW(ST_INFO) (STB_LOCAL, STT_NOTYPE),
+ sec->idx, 0, 0);
+ sec = obj_find_section(f, ".kmodtab");
+ if (!sec)
+ sec = obj_create_alloced_section(f, ".kmodtab", tgt_sizeof_void_p, 0, 0);
+ obj_add_symbol(f, "__deps_start", -1, ELFW(ST_INFO) (STB_LOCAL, STT_NOTYPE),
+ sec->idx, 0, 0);
+ obj_add_symbol(f, "__deps_end", -1, ELFW(ST_INFO) (STB_LOCAL, STT_NOTYPE),
+ sec->idx, 0, 0);
+ return 1;
+ }
sec = obj_create_alloced_section_first(f, ".this", tgt_sizeof_long,
sizeof(struct module));
memset(sec->contents, 0, sizeof(struct module));
@@ -1121,6 +1155,7 @@
tgt_long m_addr;
sec = obj_find_section(f, ".this");
+ if (sec) {
module = (struct module *) sec->contents;
m_addr = sec->header.sh_addr;
@@ -1169,6 +1204,7 @@
}
if (!arch_init_module(f, module))
return 0;
+ }
/*
* Whew! All of the initialization is complete.
@@ -1790,6 +1826,15 @@
goto out;
#endif
+{
+ struct obj_section *sec = obj_find_section(f, "__ksymtab");
+
+ if (sec && !(sec->header.sh_flags & SHF_ALLOC)) {
+ *((char *)(sec->name)) = 'x'; /* override const */
+ obj_create_alloced_section(f, "__ksymtab", tgt_sizeof_void_p, 0, 0);
+ sec->header.sh_flags |= SHF_ALLOC;
+ }
+}
/* Allocate common symbols, symbol tables, and string tables.
*
* The calls marked DEPMOD indicate the bits of code that depmod
@@ -1962,6 +2007,25 @@
/* kallsyms based on relocatable addresses */
if (add_kallsyms(f, &kallsyms, force_kallsyms))
goto out;
+
+{
+ struct obj_section *sec;
+ struct obj_symbol *sym;
+
+ sec = obj_find_section(f, "__ksymtab");
+ sym = obj_find_symbol(f, "__syms_start");
+ sym->value = sec->header.sh_addr;
+ sym->secidx = sec->idx;
+ sym = obj_find_symbol(f, "__syms_end");
+ sym->value = sec->header.sh_addr + sec->header.sh_size;
+ sym->secidx = sec->idx;
+
+ sec = obj_find_section(f, ".kmodtab");
+ sym = obj_find_symbol(f, "__deps_start");
+ sym->value = sec->header.sh_addr;
+ sym = obj_find_symbol(f, "__deps_end");
+ sym->value = sec->header.sh_addr + sec->header.sh_size;
+}
/**** No symbols or sections to be changed after kallsyms above ***/
if (errors)
diff -ur modutils-2.4.15.org/obj/obj_common.c modutils-2.4.15/obj/obj_common.c
--- modutils-2.4.15.org/obj/obj_common.c Fri Mar 1 01:39:06 2002
+++ modutils-2.4.15/obj/obj_common.c Tue Jul 16 00:53:12 2002
@@ -244,6 +244,8 @@
af = a->header.sh_flags;
ac = 0;
+ if (!strcmp(a->name, ".module"))
+ ac |= 128;
if (a->name[0] != '.'
|| strlen(a->name) != 10
|| strcmp(a->name + 5, ".init"))
Index: fs/affs/super.c
===================================================================
RCS file: /usr/src/cvsroot/linux-2.4/fs/affs/super.c,v
retrieving revision 1.1.1.8
diff -u -p -r1.1.1.8 super.c
--- fs/affs/super.c 2 Mar 2002 12:19:54 -0000 1.1.1.8
+++ fs/affs/super.c 15 Jul 2002 21:25:55 -0000
@@ -509,3 +509,19 @@ MODULE_LICENSE("GPL");
module_init(init_affs_fs)
module_exit(exit_affs_fs)
+
+extern struct module_symbol __syms_start, __syms_end;
+extern struct module_ref __deps_start, __deps_end;
+extern struct exception_table_entry __ex_table_start, __ex_table_end;
+
+static struct module __this_module __attribute__ ((section (".module"), unused)) = {
+ magic: MODULE_MAGIC,
+ init: init_module,
+ cleanup: cleanup_module,
+ syms: &__syms_start,
+ syms_end: &__syms_end,
+ deps: &__deps_start,
+ deps_end: &__deps_end,
+ ex_table_start: &__ex_table_start,
+ ex_table_end: &__ex_table_end,
+};
Index: include/linux/module.h
===================================================================
RCS file: /usr/src/cvsroot/linux-2.4/include/linux/module.h,v
retrieving revision 1.1.1.11
diff -u -p -r1.1.1.11 module.h
--- include/linux/module.h 23 Nov 2001 19:24:57 -0000 1.1.1.11
+++ include/linux/module.h 15 Jul 2002 21:22:22 -0000
@@ -50,11 +50,13 @@ struct module_ref
/* TBD */
struct module_persist;
+#define MODULE_MAGIC 0x4d4f4453
+
struct module
{
- unsigned long size_of_struct; /* == sizeof(module) */
+ unsigned int magic;
struct module *next;
- const char *name;
+ char *name;
unsigned long size;
union
@@ -65,11 +67,8 @@ struct module
unsigned long flags; /* AUTOCLEAN et al */
- unsigned nsyms;
- unsigned ndeps;
-
- struct module_symbol *syms;
- struct module_ref *deps;
+ struct module_symbol *syms, *syms_end;
+ struct module_ref *deps, *deps_end;
struct module_ref *refs;
int (*init)(void);
void (*cleanup)(void);
@@ -126,9 +125,7 @@ struct module_info
is present in the header received from insmod before we can use it.
This function returns true if the member is present. */
-#define mod_member_present(mod,member) \
- ((unsigned long)(&((struct module *)0L)->member + 1) \
- <= (mod)->size_of_struct)
+#define mod_member_present(mod,member) 1
/*
* Ditto for archdata. Assumes mod->archdata_start and mod->archdata_end
@@ -141,7 +138,7 @@ struct module_info
/* Check if an address p with number of entries n is within the body of module m */
-#define mod_bound(p, n, m) ((unsigned long)(p) >= ((unsigned long)(m) + ((m)->size_of_struct)) && \
+#define mod_bound(p, n, m) ((unsigned long)(p) >= ((unsigned long)(m) + (sizeof(struct module))) && \
(unsigned long)((p)+(n)) <= (unsigned long)(m) + (m)->size)
/* Backwards compatibility definition. */
@@ -287,7 +284,7 @@ static const char __module_license[] __a
"license=" license
/* Define the module variable, and usage macros. */
-extern struct module __this_module;
+static struct module __this_module;
#define THIS_MODULE (&__this_module)
#define MOD_INC_USE_COUNT __MOD_INC_USE_COUNT(THIS_MODULE)
Index: kernel/module.c
===================================================================
RCS file: /usr/src/cvsroot/linux-2.4/kernel/module.c,v
retrieving revision 1.1.1.11
diff -u -p -r1.1.1.11 module.c
--- kernel/module.c 23 Nov 2001 19:24:48 -0000 1.1.1.11
+++ kernel/module.c 15 Jul 2002 22:41:51 -0000
@@ -42,11 +42,12 @@ extern const char __stop___kallsyms[] __
struct module kernel_module =
{
- size_of_struct: sizeof(struct module),
+ magic: MODULE_MAGIC,
name: "",
uc: {ATOMIC_INIT(1)},
flags: MOD_RUNNING,
syms: __start___ksymtab,
+ syms_end: __stop___ksymtab,
ex_table_start: __start___ex_table,
ex_table_end: __stop___ex_table,
kallsyms_start: __start___kallsyms,
@@ -233,8 +234,6 @@ void inter_module_put(const char *im_nam
#if defined(CONFIG_MODULES) /* The rest of the source */
-static long get_mod_name(const char *user_name, char **buf);
-static void put_mod_name(char *buf);
struct module *find_module(const char *name);
void free_module(struct module *, int tag_freed);
@@ -245,46 +244,10 @@ void free_module(struct module *, int ta
void __init init_modules(void)
{
- kernel_module.nsyms = __stop___ksymtab - __start___ksymtab;
-
arch_init_modules(&kernel_module);
}
/*
- * Copy the name of a module from user space.
- */
-
-static inline long
-get_mod_name(const char *user_name, char **buf)
-{
- unsigned long page;
- long retval;
-
- page = __get_free_page(GFP_KERNEL);
- if (!page)
- return -ENOMEM;
-
- retval = strncpy_from_user((char *)page, user_name, PAGE_SIZE);
- if (retval > 0) {
- if (retval < PAGE_SIZE) {
- *buf = (char *)page;
- return retval;
- }
- retval = -ENAMETOOLONG;
- } else if (!retval)
- retval = -EINVAL;
-
- free_page(page);
- return retval;
-}
-
-static inline void
-put_mod_name(char *buf)
-{
- free_page((unsigned long)buf);
-}
-
-/*
* Allocate space for a module.
*/
@@ -294,45 +257,41 @@ sys_create_module(const char *name_user,
char *name;
long namelen, error;
struct module *mod;
- unsigned long flags;
if (!capable(CAP_SYS_MODULE))
return -EPERM;
lock_kernel();
- if ((namelen = get_mod_name(name_user, &name)) < 0) {
- error = namelen;
+ name = getname(name_user);
+ if (IS_ERR(name)) {
+ error = PTR_ERR(name);
goto err0;
}
- if (size < sizeof(struct module)+namelen) {
- error = -EINVAL;
- goto err1;
- }
+ namelen = strlen(name);
if (find_module(name) != NULL) {
error = -EEXIST;
goto err1;
}
- if ((mod = (struct module *)module_map(size)) == NULL) {
+ if ((mod = (struct module *)module_map(size + namelen + 1)) == NULL) {
error = -ENOMEM;
goto err1;
}
memset(mod, 0, sizeof(*mod));
- mod->size_of_struct = sizeof(*mod);
- mod->name = (char *)(mod + 1);
- mod->size = size;
- memcpy((char*)(mod+1), name, namelen+1);
+ mod->name = (char *)mod + size;
+ mod->size = size + namelen + 1;
+ strcpy(mod->name, name);
- put_mod_name(name);
+ putname(name);
- spin_lock_irqsave(&modlist_lock, flags);
+ spin_lock(&modlist_lock);
mod->next = module_list;
module_list = mod; /* link it in */
- spin_unlock_irqrestore(&modlist_lock, flags);
+ spin_unlock(&modlist_lock);
error = (long) mod;
goto err0;
err1:
- put_mod_name(name);
+ putname(name);
err0:
unlock_kernel();
return error;
@@ -346,154 +305,47 @@ asmlinkage long
sys_init_module(const char *name_user, struct module *mod_user)
{
struct module mod_tmp, *mod;
- char *name, *n_name, *name_tmp = NULL;
- long namelen, n_namelen, i, error;
- unsigned long mod_user_size;
+ char *name;
+ long namelen, i, error;
struct module_ref *dep;
+ int ndeps;
if (!capable(CAP_SYS_MODULE))
return -EPERM;
lock_kernel();
- if ((namelen = get_mod_name(name_user, &name)) < 0) {
- error = namelen;
+ name = getname(name_user);
+ if (IS_ERR(name)) {
+ error = PTR_ERR(name);
goto err0;
}
+ namelen = strlen(name);
if ((mod = find_module(name)) == NULL) {
error = -ENOENT;
goto err1;
}
- /* Check module header size. We allow a bit of slop over the
- size we are familiar with to cope with a version of insmod
- for a newer kernel. But don't over do it. */
- if ((error = get_user(mod_user_size, &mod_user->size_of_struct)) != 0)
- goto err1;
- if (mod_user_size < (unsigned long)&((struct module *)0L)->persist_start
- || mod_user_size > sizeof(struct module) + 16*sizeof(void*)) {
- printk(KERN_ERR "init_module: Invalid module header size.\n"
- KERN_ERR "A new version of the modutils is likely "
- "needed.\n");
- error = -EINVAL;
- goto err1;
- }
-
/* Hold the current contents while we play with the user's idea
of righteousness. */
- mod_tmp = *mod;
- name_tmp = kmalloc(strlen(mod->name) + 1, GFP_KERNEL); /* Where's kstrdup()? */
- if (name_tmp == NULL) {
- error = -ENOMEM;
- goto err1;
- }
- strcpy(name_tmp, mod->name);
-
- error = copy_from_user(mod, mod_user, mod_user_size);
+ error = copy_from_user(&mod_tmp, mod_user, sizeof(mod_tmp));
if (error) {
error = -EFAULT;
goto err2;
}
- /* Sanity check the size of the module. */
- error = -EINVAL;
-
- if (mod->size > mod_tmp.size) {
- printk(KERN_ERR "init_module: Size of initialized module "
- "exceeds size of created module.\n");
- goto err2;
- }
-
- /* Make sure all interesting pointers are sane. */
-
- if (!mod_bound(mod->name, namelen, mod)) {
- printk(KERN_ERR "init_module: mod->name out of bounds.\n");
- goto err2;
- }
- if (mod->nsyms && !mod_bound(mod->syms, mod->nsyms, mod)) {
- printk(KERN_ERR "init_module: mod->syms out of bounds.\n");
- goto err2;
- }
- if (mod->ndeps && !mod_bound(mod->deps, mod->ndeps, mod)) {
- printk(KERN_ERR "init_module: mod->deps out of bounds.\n");
- goto err2;
- }
- if (mod->init && !mod_bound(mod->init, 0, mod)) {
- printk(KERN_ERR "init_module: mod->init out of bounds.\n");
- goto err2;
- }
- if (mod->cleanup && !mod_bound(mod->cleanup, 0, mod)) {
- printk(KERN_ERR "init_module: mod->cleanup out of bounds.\n");
- goto err2;
- }
- if (mod->ex_table_start > mod->ex_table_end
- || (mod->ex_table_start &&
- !((unsigned long)mod->ex_table_start >= ((unsigned long)mod + mod->size_of_struct)
- && ((unsigned long)mod->ex_table_end
- < (unsigned long)mod + mod->size)))
- || (((unsigned long)mod->ex_table_start
- - (unsigned long)mod->ex_table_end)
- % sizeof(struct exception_table_entry))) {
- printk(KERN_ERR "init_module: mod->ex_table_* invalid.\n");
- goto err2;
- }
- if (mod->flags & ~MOD_AUTOCLEAN) {
- printk(KERN_ERR "init_module: mod->flags invalid.\n");
- goto err2;
- }
- if (mod_member_present(mod, can_unload)
- && mod->can_unload && !mod_bound(mod->can_unload, 0, mod)) {
- printk(KERN_ERR "init_module: mod->can_unload out of bounds.\n");
- goto err2;
- }
- if (mod_member_present(mod, kallsyms_end)) {
- if (mod->kallsyms_end &&
- (!mod_bound(mod->kallsyms_start, 0, mod) ||
- !mod_bound(mod->kallsyms_end, 0, mod))) {
- printk(KERN_ERR "init_module: mod->kallsyms out of bounds.\n");
- goto err2;
- }
- if (mod->kallsyms_start > mod->kallsyms_end) {
- printk(KERN_ERR "init_module: mod->kallsyms invalid.\n");
- goto err2;
- }
- }
- if (mod_member_present(mod, archdata_end)) {
- if (mod->archdata_end &&
- (!mod_bound(mod->archdata_start, 0, mod) ||
- !mod_bound(mod->archdata_end, 0, mod))) {
- printk(KERN_ERR "init_module: mod->archdata out of bounds.\n");
- goto err2;
- }
- if (mod->archdata_start > mod->archdata_end) {
- printk(KERN_ERR "init_module: mod->archdata invalid.\n");
- goto err2;
- }
- }
- if (mod_member_present(mod, kernel_data) && mod->kernel_data) {
- printk(KERN_ERR "init_module: mod->kernel_data must be zero.\n");
- goto err2;
- }
-
- /* Check that the user isn't doing something silly with the name. */
-
- if ((n_namelen = get_mod_name(mod->name - (unsigned long)mod
- + (unsigned long)mod_user,
- &n_name)) < 0) {
- printk(KERN_ERR "init_module: get_mod_name failure.\n");
- error = n_namelen;
+ if (mod_tmp.magic != MODULE_MAGIC) {
+ printk("update modutils?\n");
goto err2;
}
- if (namelen != n_namelen || strcmp(n_name, mod_tmp.name) != 0) {
- printk(KERN_ERR "init_module: changed module name to "
- "`%s' from `%s'\n",
- n_name, mod_tmp.name);
- goto err3;
- }
- /* Ok, that's about all the sanity we can stomach; copy the rest. */
+ mod_tmp.next = mod->next;
+ mod_tmp.name = mod->name;
+ mod_tmp.flags = mod->flags;
+ mod_tmp.size = mod->size;
+ *mod = mod_tmp;
- if (copy_from_user((char *)mod+mod_user_size,
- (char *)mod_user+mod_user_size,
- mod->size-mod_user_size)) {
+ if (copy_from_user((char *)mod + sizeof(*mod),
+ (char *)mod_user + sizeof(*mod),
+ mod->name - ((char *)mod + sizeof(*mod)))) {
error = -EFAULT;
goto err3;
}
@@ -505,11 +357,9 @@ sys_init_module(const char *name_user, s
to make the I and D caches consistent. */
flush_icache_range((unsigned long)mod, (unsigned long)mod + mod->size);
- mod->next = mod_tmp.next;
- mod->refs = NULL;
-
/* Sanity check the module's dependents */
- for (i = 0, dep = mod->deps; i < mod->ndeps; ++i, ++dep) {
+ ndeps = mod->deps_end - mod->deps;
+ for (i = 0, dep = mod->deps; i < ndeps; ++i, ++dep) {
struct module *o, *d = dep->dep;
/* Make sure the indicated dependencies are really modules. */
@@ -531,7 +381,7 @@ sys_init_module(const char *name_user, s
}
/* Update module references. */
- for (i = 0, dep = mod->deps; i < mod->ndeps; ++i, ++dep) {
+ for (i = 0, dep = mod->deps; i < ndeps; ++i, ++dep) {
struct module *d = dep->dep;
dep->ref = mod;
@@ -543,8 +393,7 @@ sys_init_module(const char *name_user, s
}
/* Free our temporary memory. */
- put_mod_name(n_name);
- put_mod_name(name);
+ putname(name);
/* Initialize the module. */
atomic_set(&mod->uc.usecount,1);
@@ -564,15 +413,11 @@ sys_init_module(const char *name_user, s
goto err0;
err3:
- put_mod_name(n_name);
err2:
- *mod = mod_tmp;
- strcpy((char *)mod->name, name_tmp); /* We know there is room for this */
err1:
- put_mod_name(name);
+ putname(name);
err0:
unlock_kernel();
- kfree(name_tmp);
return error;
}
@@ -604,14 +449,17 @@ sys_delete_module(const char *name_user)
lock_kernel();
if (name_user) {
- if ((error = get_mod_name(name_user, &name)) < 0)
+ name = getname(name_user);
+ if (IS_ERR(name)) {
+ error = PTR_ERR(name);
goto out;
+ }
error = -ENOENT;
if ((mod = find_module(name)) == NULL) {
- put_mod_name(name);
+ putname(name);
goto out;
}
- put_mod_name(name);
+ putname(name);
error = -EBUSY;
if (mod->refs != NULL)
goto out;
@@ -708,7 +556,7 @@ calc_space_needed:
static int
qm_deps(struct module *mod, char *buf, size_t bufsize, size_t *ret)
{
- size_t i, space, len;
+ size_t i, space, len, ndeps;
if (mod == &kernel_module)
return -EINVAL;
@@ -719,7 +567,8 @@ qm_deps(struct module *mod, char *buf, s
return 0;
space = 0;
- for (i = 0; i < mod->ndeps; ++i) {
+ ndeps = mod->deps_end - mod->deps;
+ for (i = 0; i < ndeps; ++i) {
const char *dep_name = mod->deps[i].dep->name;
len = strlen(dep_name)+1;
@@ -739,7 +588,7 @@ qm_deps(struct module *mod, char *buf, s
calc_space_needed:
space += len;
- while (++i < mod->ndeps)
+ while (++i < ndeps)
space += strlen(mod->deps[i].dep->name)+1;
if (put_user(space, ret))
@@ -799,6 +648,7 @@ qm_symbols(struct module *mod, char *buf
struct module_symbol *s;
char *strings;
unsigned long *vals;
+ int nsyms;
if (!MOD_CAN_QUERY(mod))
if (put_user(0, ret))
@@ -806,7 +656,8 @@ qm_symbols(struct module *mod, char *buf
else
return 0;
- space = mod->nsyms * 2*sizeof(void *);
+ nsyms = mod->syms_end - mod->syms;
+ space = nsyms * 2*sizeof(void *);
i = len = 0;
s = mod->syms;
@@ -821,7 +672,7 @@ qm_symbols(struct module *mod, char *buf
vals = (unsigned long *)buf;
strings = buf+space;
- for (; i < mod->nsyms ; ++i, ++s, vals += 2) {
+ for (; i < nsyms ; ++i, ++s, vals += 2) {
len = strlen(s->name)+1;
if (len > bufsize)
goto calc_space_needed;
@@ -841,7 +692,7 @@ qm_symbols(struct module *mod, char *buf
return 0;
calc_space_needed:
- for (; i < mod->nsyms; ++i, ++s)
+ for (; i < nsyms; ++i, ++s)
space += strlen(s->name)+1;
if (put_user(space, ret))
@@ -891,19 +742,19 @@ sys_query_module(const char *name_user,
if (name_user == NULL)
mod = &kernel_module;
else {
- long namelen;
char *name;
- if ((namelen = get_mod_name(name_user, &name)) < 0) {
- err = namelen;
+ name = getname(name_user);
+ if (IS_ERR(name)) {
+ err = PTR_ERR(name);
goto out;
}
err = -ENOENT;
if ((mod = find_module(name)) == NULL) {
- put_mod_name(name);
+ putname(name);
goto out;
}
- put_mod_name(name);
+ putname(name);
}
/* __MOD_ touches the flags. We must avoid that */
@@ -955,11 +806,12 @@ sys_get_kernel_syms(struct kernel_sym *t
struct module *mod;
int i;
struct kernel_sym ksym;
+ int nsyms;
lock_kernel();
for (mod = module_list, i = 0; mod; mod = mod->next) {
/* include the count for the module name! */
- i += mod->nsyms + 1;
+ i += mod->syms_end - mod->syms + 1;
}
if (table == NULL)
@@ -985,10 +837,11 @@ sys_get_kernel_syms(struct kernel_sym *t
goto out;
++i, ++table;
- if (mod->nsyms == 0)
+ nsyms = mod->syms_end - mod->syms;
+ if (nsyms == 0)
continue;
- for (j = 0, msym = mod->syms; j < mod->nsyms; ++j, ++msym) {
+ for (j = 0, msym = mod->syms; j < nsyms; ++j, ++msym) {
ksym.value = msym->value;
strncpy(ksym.name, msym->name, sizeof(ksym.name));
ksym.name[sizeof(ksym.name)-1] = '\0';
@@ -1030,8 +883,7 @@ void
free_module(struct module *mod, int tag_freed)
{
struct module_ref *dep;
- unsigned i;
- unsigned long flags;
+ unsigned i, ndeps;
/* Let the module clean up. */
@@ -1043,8 +895,8 @@ free_module(struct module *mod, int tag_
}
/* Remove the module from the dependency lists. */
-
- for (i = 0, dep = mod->deps; i < mod->ndeps; ++i, ++dep) {
+ ndeps = mod->deps_end - mod->deps;
+ for (i = 0, dep = mod->deps; i < ndeps; ++i, ++dep) {
struct module_ref **pp;
for (pp = &dep->dep->refs; *pp != dep; pp = &(*pp)->next_ref)
continue;
@@ -1055,7 +907,7 @@ free_module(struct module *mod, int tag_
/* And from the main module list. */
- spin_lock_irqsave(&modlist_lock, flags);
+ spin_lock(&modlist_lock);
if (mod == module_list) {
module_list = mod->next;
} else {
@@ -1064,7 +916,7 @@ free_module(struct module *mod, int tag_
continue;
p->next = mod->next;
}
- spin_unlock_irqrestore(&modlist_lock, flags);
+ spin_unlock(&modlist_lock);
/* And free the memory. */
@@ -1169,12 +1021,14 @@ static void *s_start(struct seq_file *m,
struct mod_sym *p = kmalloc(sizeof(*p), GFP_KERNEL);
struct module *v;
loff_t n = *pos;
+ int nsyms;
if (!p)
return ERR_PTR(-ENOMEM);
lock_kernel();
- for (v = module_list, n = *pos; v; n -= v->nsyms, v = v->next) {
- if (n < v->nsyms) {
+ for (v = module_list, n = *pos; v; n -= nsyms, v = v->next) {
+ nsyms = v->syms_end - v->syms;
+ if (n < nsyms) {
p->mod = v;
p->index = n;
return p;
@@ -1188,8 +1042,9 @@ static void *s_start(struct seq_file *m,
static void *s_next(struct seq_file *m, void *p, loff_t *pos)
{
struct mod_sym *v = p;
+ int nsyms = v->mod->syms_end - v->mod->syms;
(*pos)++;
- if (++v->index >= v->mod->nsyms) {
+ if (++v->index >= nsyms) {
do {
v->mod = v->mod->next;
if (!v->mod) {
@@ -1197,7 +1052,8 @@ static void *s_next(struct seq_file *m,
kfree(p);
return NULL;
}
- } while (!v->mod->nsyms);
+ nsyms = v->mod->syms_end - v->mod->syms;
+ } while (!nsyms);
v->index = 0;
}
return p;
On Tuesday 16 July 2002 15:04, Roman Zippel wrote:
> 1. Properly fixing module races: I'm playing with a init/start/stop/exit
> model, this has the advantage that we can stop anyone from reusing a
> module and we only have to wait for remaining users to go away until we
> can safely unload the module.
I'm satisfied that, for filesystems at least, all the module races can be
solved without adding start/stop, and I will present code in due course.
However, Rusty tells me there are harder cases than filesystems. At this
point I'm waiting for a specific example.
For filesystems, we rely on the filesystem code itself to know when all users
have gone away. If somebody is still executing in a filesystem module after
all umounts are done, it's a horrible nasty bug. We might still want to play
games with checking execution addresses of processes to see if anybody is
still in a module, but that would just be for debug; sys_delete_module can
rely on the filesystem's opinion about whether a module is quiescent or not.
Somebody please give me an example of why this same strategy will not
work for all types of modular code.
--
Daniel
Hi,
On Wed, 17 Jul 2002, Daniel Phillips wrote:
> > 1. Properly fixing module races: I'm playing with a init/start/stop/exit
> > model, this has the advantage that we can stop anyone from reusing a
> > module and we only have to wait for remaining users to go away until we
> > can safely unload the module.
>
> I'm satisfied that, for filesystems at least, all the module races can be
> solved without adding start/stop, and I will present code in due course.
The start/stop methods are not needed to fix the races, they allow better
control of the unload process.
> However, Rusty tells me there are harder cases than filesystems. At this
> point I'm waiting for a specific example.
For filesystems it's only simpler because they only have a single entry
point, but the basic problem is always the same. We have to protect
against module load/unload and unregister. Without an interface change we
will have to add module owner pointers everywhere and we will see
contention on the unload_lock due to try_inc_mod_count.
bye, Roman
On Wednesday 17 July 2002 22:12, Roman Zippel wrote:
> Hi,
>
> On Wed, 17 Jul 2002, Daniel Phillips wrote:
>
> > > 1. Properly fixing module races: I'm playing with a init/start/stop/exit
> > > model, this has the advantage that we can stop anyone from reusing a
> > > module and we only have to wait for remaining users to go away until we
> > > can safely unload the module.
> >
> > I'm satisfied that, for filesystems at least, all the module races can be
> > solved without adding start/stop, and I will present code in due course.
>
> The start/stop methods are not needed to fix the races, they allow better
> control of the unload process.
I'm afraid it must show that I didn't read the previous threads closely
enough, but what is the specific benefit supposed to be, if not to address
the races?
> > However, Rusty tells me there are harder cases than filesystems. At this
> > point I'm waiting for a specific example.
>
> For filesystems it's only simpler because they only have a single entry
> point, but the basic problem is always the same.
What do you mean by single entry point? Mount? Register_filesystem?
Lowlevel activity on a filesystem is certainly not restricted to a single
entry point.
> We have to protect
> against module load/unload and unregister. Without an interface change we
> will have to add module owner pointers everywhere and we will see
> contention on the unload_lock due to try_inc_mod_count.
It makes perfect sense for mount to be able to know which module implements
its filesystem. I do not see why updating every mount-like thing in the
system is bad, if it's the best interface.
It's really hard to see why contention on a slow-path lock is anything to
worry about. Anyway, it's not hard to fix the locking model so the lock
only covers the transitions of state bits, instead of all of free_module.
So, I'm still hoping to hear a substantive reason why the filesystem model
can't be applied in general to all forms of modular code. To remind you
of the issue: the proposition is that the subsystem in the module is
always capable of knowing when the module is quiescent, because it does
whatever is necessary to keep track of the users and what they're doing.
--
Daniel
Hi,
On Wed, 17 Jul 2002, Daniel Phillips wrote:
> > The start/stop methods are not needed to fix the races, they allow better
> > control of the unload process.
>
> I'm afraid it must show that I didn't read the previous threads closely
> enough, but what is the specific benefit supposed to be, if not to address
> the races?
The basic idea is to allow modules to let the unload fail. The unload
process would basically look like this:
unregister all interfaces;
if (no users)
free all resources;
These two phases have to be done anyway, making it explicit in the module
interface gives more control to the user, e.g. you can disallow the
mounting of a new fs, while others are still mounted.
> > For filesystems it's only simpler because they only have a single entry
> > point, but the basic problem is always the same.
>
> What do you mean by single entry point? Mount? Register_filesystem?
I meant the first entry point into the module code that needs to be
protected (e.g. get_sb during mount).
> So, I'm still hoping to hear a substantive reason why the filesystem model
> can't be applied in general to all forms of modular code.
It's possible to use the filesystem model, but it's unnecessary complex
and inflexible. It should be possible to keep try_inc_mod_count() out of
the hot paths, but you have to call it e.g. at every single open().
Another problem is that the more interfaces a module has (e.g. proc), the
harder it becomes to unload a module (or the easier it becomes to prevent
the unloading of a module).
> To remind you
> of the issue: the proposition is that the subsystem in the module is
> always capable of knowing when the module is quiescent, because it does
> whatever is necessary to keep track of the users and what they're doing.
The problem is that the module can't do anything with that information, at
the time cleanup_module() is called it's already to late. That information
has currently to be managed completely outside of the module.
bye, Roman
On Thursday 18 July 2002 00:45, Roman Zippel wrote:
> Hi,
>
> On Wed, 17 Jul 2002, Daniel Phillips wrote:
>
> > > The start/stop methods are not needed to fix the races, they allow better
> > > control of the unload process.
> >
> > I'm afraid it must show that I didn't read the previous threads closely
> > enough, but what is the specific benefit supposed to be, if not to address
> > the races?
>
> The basic idea is to allow modules to let the unload fail. The unload
> process would basically look like this:
>
> unregister all interfaces;
> if (no users)
> free all resources;
>
> These two phases have to be done anyway, making it explicit in the module
> interface gives more control to the user, e.g. you can disallow the
> mounting of a new fs, while others are still mounted.
But I don't see why this needs to be a two step process:
module support calls mod->cleanup
module code checks number of users
if none, unregister all interfaces
otherwise return -EBUSY
if return wasn't -EBUSY, free all resources
> > > For filesystems it's only simpler because they only have a single entry
> > > point, but the basic problem is always the same.
> >
> > What do you mean by single entry point? Mount? Register_filesystem?
>
> I meant the first entry point into the module code that needs to be
> protected (e.g. get_sb during mount).
You haven't given a coherent argument re why it gets harder when there
are multiple ways of adding a user. Conceptually, each user incs
mod->count; we don't care how many they are, or what kind they are or
when they do it. All we care about is that they inc the count under
the appropriate lock.
> > So, I'm still hoping to hear a substantive reason why the filesystem model
> > can't be applied in general to all forms of modular code.
>
> It's possible to use the filesystem model, but it's unnecessary complex
> and inflexible.
What is the complex and inflexible part?
> It should be possible to keep try_inc_mod_count() out of
> the hot paths, but you have to call it e.g. at every single open().
That's not correct. It's currently called on every mount.
> Another problem is that the more interfaces a module has (e.g. proc), the
> harder it becomes to unload a module (or the easier it becomes to prevent
> the unloading of a module).
I don't see that this makes any difference at all.
> > To remind you
> > of the issue: the proposition is that the subsystem in the module is
> > always capable of knowing when the module is quiescent, because it does
> > whatever is necessary to keep track of the users and what they're doing.
>
> The problem is that the module can't do anything with that information, at
> the time cleanup_module() is called it's already to late. That information
> has currently to be managed completely outside of the module.
I'm proposing to add a return code to mod->cleanup (and pick a better
name). Yes, every module will have to be fixed to use this interface, but
why not? The current interface is broken, and besides, changing the module
interface every year or so helps nVidia remember why they should be good
and document their hardware, so we can maintain their driver for them.
Since we are changing the interface whichever way it goes, we should favor
the simplest interface.
Anyway, you're proposing to do it backwards. We need to first ensure
there are no users, then unregister the interfaces.
--
Daniel
Hi,
On Thu, 18 Jul 2002, Daniel Phillips wrote:
> But I don't see why this needs to be a two step process:
>
> module support calls mod->cleanup
> module code checks number of users
> if none, unregister all interfaces
> otherwise return -EBUSY
> if return wasn't -EBUSY, free all resources
So after you checked for users, someone starts using the module again,
before you were able to remove all interfaces -> OOPS.
This is exactly the reason why I prefer to make this explicit in the
module interface - the chances are higher to get this right.
> > It's possible to use the filesystem model, but it's unnecessary complex
> > and inflexible.
>
> What is the complex and inflexible part?
It has to work around the problem that cleanup_module() can't fail.
> > Another problem is that the more interfaces a module has (e.g. proc), the
> > harder it becomes to unload a module (or the easier it becomes to prevent
> > the unloading of a module).
>
> I don't see that this makes any difference at all.
It's currently impossible to force the unloading of a module, some user
can always keep a reference to the module. Even if you kill the process,
someone else can get a new reference, before you could unload the module.
> I'm proposing to add a return code to mod->cleanup (and pick a better
> name). Yes, every module will have to be fixed to use this interface, but
> why not?
I don't disagree, but if we break the module interface anyway, why don't
we clean it up properly?
(Patches that do that will follow shortly.)
> Anyway, you're proposing to do it backwards. We need to first ensure
> there are no users, then unregister the interfaces.
That's broken (see above).
bye, Roman
On Thursday 18 July 2002 12:01, Roman Zippel wrote:
> Hi,
>
> On Thu, 18 Jul 2002, Daniel Phillips wrote:
>
> > But I don't see why this needs to be a two step process:
> >
> > module support calls mod->cleanup
> > module code checks number of users
> > if none, unregister all interfaces
> > otherwise return -EBUSY
> > if return wasn't -EBUSY, free all resources
>
> So after you checked for users, someone starts using the module again,
> before you were able to remove all interfaces -> OOPS.
That's a different issue, and it is handled, though I omitted some details
for brevity:
module support calls mod->cleanup
spin_lock(&some_spinlock);
if (<mod_user_count>)
<clear_mod_active_bit>
spin_unlock(&some_spinlock);
if <no users>, unregister all interfaces, free resources
otherwise return -EBUSY
if return wasn't -EBUSY, free module itself
To add a new user, the active bit has to be set, as shown in this skeleton,
which is pretty much the existing try_inc_mod_count scheme:
spin_lock(&some_spinlock);
if (<mod_active_bit>)
<inc_mod_user_count>
spin_unlock(&some_spinlock);
if <users>, do the mount
In other words, the module has some state, the transitions of which are
protected by a spinlock. My improvement is to move that code out of the
module support code and into the module code itself, so that the module
support code doesn't have to be aware of the details of how this
particular serialization is done - allowing the possibility of other
ways of accomplishing it. Rusty had no trouble pointing out an example
where the filesystem mechanism per se doesn't work: LSM, where the
module hooks little bits of code in the vfs that cut across all the
file interfaces; it isn't practical to unmount all filesystems to remove
a security module. So there isn't going to be any 'one size fits all'
determination of what constitutes 'no users', or possibly there really
are modules that just can't be unloaded.
> This is exactly the reason why I prefer to make this explicit in the
> module interface - the chances are higher to get this right.
You see that the code form spin_lock to if <no users> can be a helper
function. This is truly a dumbed-down interface.
> > > It's possible to use the filesystem model, but it's unnecessary complex
> > > and inflexible.
> >
> > What is the complex and inflexible part?
>
> It has to work around the problem that cleanup_module() can't fail.
That's a circular argument. I said above that mod->clean aka cleanup_module
*can* fail, in a nice defined way. It fails early, before unregistering or
destroying any resources, if there are users on the module. This is totally
flexible, because it's under the control of the module code.
> > > Another problem is that the more interfaces a module has (e.g. proc), the
> > > harder it becomes to unload a module (or the easier it becomes to prevent
> > > the unloading of a module).
> >
> > I don't see that this makes any difference at all.
>
> It's currently impossible to force the unloading of a module, some user
> can always keep a reference to the module. Even if you kill the process,
> someone else can get a new reference, before you could unload the module.
The module's active bit prevents that, and forced uload (i.e., the
filesystem module tries to umount all users) is permitted by the
interface. To accomodate this, we just have to pass a 'force' flag to
mod->cleanup (which, I mentioned, would get a new name).
> > I'm proposing to add a return code to mod->cleanup (and pick a better
> > name). Yes, every module will have to be fixed to use this interface, but
> > why not?
>
> I don't disagree, but if we break the module interface anyway, why don't
> we clean it up properly?
> (Patches that do that will follow shortly.)
I am talking about cleaning it up properly, we just disagree on what the
definition of proper is. Perhaps you haven't explained yourself clearly,
but you have not yet proved a need to add two new function hooks to the
module interface.
I presume we do agree that, if there are two possible interfaces that
accomplish the same thing, we should choose the simpler of them.
> > Anyway, you're proposing to do it backwards. We need to first ensure
> > there are no users, then unregister the interfaces.
>
> That's broken (see above).
Err. I don't see that, no. Currently, filesystems currently check for
users before unregistering. I can see there's no way to do that for
LSM, and we may well want to remove all the interfaces first in that
case, then try some of the voodoo that is supposed to ensure no task
is executing in the module code, or give up and admit it's too messy.
In any event, either way of doing it is accomodated by the simple
interface I propose. It's entirely up to the module to decide which
it should use.
--
Daniel
Hi,
On Thu, 18 Jul 2002, Daniel Phillips wrote:
> To add a new user, the active bit has to be set, as shown in this skeleton,
> which is pretty much the existing try_inc_mod_count scheme:
>
> spin_lock(&some_spinlock);
> if (<mod_active_bit>)
> <inc_mod_user_count>
> spin_unlock(&some_spinlock);
>
> if <users>, do the mount
>
> In other words, the module has some state, the transitions of which are
> protected by a spinlock.
This means you still need another lock to protect the data structures and
you still have module pointers everywhere. I want to get rid of that
"same_spinlock" (aka unload_lock), because it's not needed.
I suggest we continue this discussion when I post the new patches in a few
days, then it should become more clear.
bye, Roman
On Thursday 18 July 2002 14:02, Roman Zippel wrote:
> Hi,
>
> On Thu, 18 Jul 2002, Daniel Phillips wrote:
>
> > To add a new user, the active bit has to be set, as shown in this skeleton,
> > which is pretty much the existing try_inc_mod_count scheme:
> >
> > spin_lock(&some_spinlock);
> > if (<mod_active_bit>)
> > <inc_mod_user_count>
> > spin_unlock(&some_spinlock);
> >
> > if <users>, do the mount
> >
> > In other words, the module has some state, the transitions of which are
> > protected by a spinlock.
>
> This means you still need another lock to protect the data structures and
> you still have module pointers everywhere.
A module pointer per filesystem does not count as 'everywhere'.
> I want to get rid of that
> "same_spinlock" (aka unload_lock), because it's not needed.
> I suggest we continue this discussion when I post the new patches in a few
> days, then it should become more clear.
Sure.
--
Daniel
On Tue, 16 Jul 2002 15:04:36 +0200 (CEST)
Roman Zippel <[email protected]> wrote:
> Hi,
>
> Below is an (unfinished) patch, which makes a small but important change
> to the module format. The module structure is no longer generated by
> insmod and is instead under control of the kernel now. This means we have
> now the freedom to properly cleanup the module code without needing to
> tell insmod about it. The dependency between insmod and the kernel is
> greatly reduced by this.
Um, Hi Roman,
I've started updating my in-kernel module loader patches for 2.5.26.
I'll send you a mail when it's done.
Thanks,
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy
Hi,
On Fri, 19 Jul 2002, Rusty Russell wrote:
> I've started updating my in-kernel module loader patches for 2.5.26.
> I'll send you a mail when it's done.
The more I think about it, the more I like the idea to go into the other
direction. Most of the module information (e.g. symbols, dependencies)
stored in the kernel can be as well managed in userspace.
bye, Roman
In message <Pine.LNX.4.44.0207191128030.28515-100000@serv> you write:
> Hi,
>
> On Fri, 19 Jul 2002, Rusty Russell wrote:
>
> > I've started updating my in-kernel module loader patches for 2.5.26.
> > I'll send you a mail when it's done.
>
> The more I think about it, the more I like the idea to go into the other
> direction. Most of the module information (e.g. symbols, dependencies)
> stored in the kernel can be as well managed in userspace.
You might be right. All kinds of things can be pushed into userspace:
code will tell.
Cheers!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.