2002-11-20 20:59:14

by Roman Zippel

[permalink] [raw]
Subject: [RFC] module fs or how to not break everything at once

Hi,

What is worse than getting flamed? Right, not getting flamed. I can
understand that Rusty is pissed at me, for hitting like this on his code
and I'm really sorry about that, but code quality is more important to me
right now. (The other possibility is that he's now laughing maniacally at
what a fool I am. ;-). ) OTOH it's pretty interesting to see that nobody
seems to care, as the kernel loader is now in everybody's kernel and if
everyone were happy, someone should be able tell me, why it should be kept
in the kernel.
Anyway, this is the last part from me in this series, with this I said
everything I wanted and if anyone is interested in discussion about module
design (instead of just throwing it into the kernel), I'd be happy,
otherwise I'll just shut up for a while.

Below is a basic module fs implementation, which demonstrates an
alternative way of managing modules. It's reduced to the most basic
functionality needed to get modules working: map an object into kernel
space and start/stop a module. Everything else can and should be done in
user space. On top of this functionality it's easy to emulate the old
system calls, so old insmod/rmmod work just fine. This allows now to write
new module tools and when they are ready (this means we're also happy with
the new module interface), we can drop all the old syscalls. The new
loader OTOH has not much to do either (see the mini loader I posted last
week), it only has to relocate the module and move it to the kernel. Most
important here is the loader doesn't need to know anything about any
kernel structures, any information needed by the kernel is added in the
final link stage. This gives us very much freedom with the module layout
and makes it very unlikely that the user space loader needs to change.
Once the module tools are ready and we don't need to care about
compatibility anymore, it's possible to change the driver module interface
however we want. This should be the last stage during the module code
cleanup, otherwise complete breakage is guaranteed.

Overall this means all module tasks become nicely separated. During build
we prepare the module with all the information the kernel needs, the
loader takes care of dependencies and just relocates the module, modfs
maps it into the kernel and starts the module. If the interfaces are
halfway flexible, changes in one part don't require a change somewhere
else.
This sounds simple, but it's lots of work and I had preferred, we had to
completely change the module interfaces/tools only once. I will further
develop the code below over the next months, whenever I'm bored (or if I
should need modules for some reason) and it should be ready for 2.7.

Some comments about the code: it can be basically dropped into any
kernel/module.c < 2.5.48 (just remove/comment out the old
sys_create_module/sys_init_module/sys_delete_module functions).
Loading/removing symbols into kernel through modfs still has to be done. A
module is simply created with mkdir, the module is a normal (write only)
file and it can be mapped/unmmapped/inited/exited through a control file,
Right now it's possible to map multiple files per module dir, e.g. for
separate core/init sections, but I consider to keep this to one object
file per directory and just truncate the init part. Anyway, this is only
supposed to give an idea how it could look like, it's far away from ready,
but it already works here. Comments are very welcome.

bye, Roman


#include <linux/file.h>
#include <linux/namei.h>
#include <linux/backing-dev.h>
#include <linux/pagemap.h>
#include <linux/ctype.h>

struct module_dir {
struct dentry *dir;
struct dentry *control;
struct dentry *module;
unsigned int usecount;
};

struct module_data {
struct dentry *dentry;
struct module *module;
};

static struct vfsmount *modfs_mount;

static int modfs_mkdir(struct inode *dir, struct dentry *dentry, int mode);
static int modfs_rmdir(struct inode *dir, struct dentry *dentry);
static int modfs_create(struct inode *dir, struct dentry *dentry, int mode);
static int modfs_unlink(struct inode *dir, struct dentry *dentry);
static int modfs_control_file_open(struct inode *inode, struct file *file);
static ssize_t modfs_control_file_write(struct file *file, const char *buffer,
size_t count, loff_t *ppos);
static int modfs_mod_file_open(struct inode *inode, struct file *file);
static int modfs_mod_file_setattr(struct dentry *dentry, struct iattr *attr);

static struct backing_dev_info modfs_backing_dev_info = {
.ra_pages = 0, /* No readahead */
.memory_backed = 1, /* Does not contribute to dirty memory */
};

static struct address_space_operations modfs_aops = {
.readpage = simple_readpage,
.writepage = fail_writepage,
.prepare_write = simple_prepare_write,
.commit_write = simple_commit_write,
};

struct inode_operations modfs_rootdir_inode_operations = {
.mkdir = modfs_mkdir,
.lookup = simple_lookup,
.rmdir = modfs_rmdir,
};

struct inode_operations modfs_moddir_inode_operations = {
.create = modfs_create,
.lookup = simple_lookup,
.unlink = modfs_unlink,
};

static struct file_operations modfs_control_file_operations = {
.open = modfs_control_file_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
.write = modfs_control_file_write,
};

static struct file_operations modfs_mod_file_operations = {
.open = modfs_mod_file_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
.write = generic_file_write,
.mmap = generic_file_mmap,
};

static struct inode_operations modfs_mod_inode_operations = {
.setattr = modfs_mod_file_setattr,
};

static struct super_operations modfs_ops = {
.statfs = simple_statfs,
.drop_inode = generic_delete_inode,
};

struct inode *modfs_get_inode(struct super_block *sb, int mode, int dev)
{
struct inode *inode = new_inode(sb);

if (inode) {
inode->i_mode = mode;
inode->i_uid = current->fsuid;
inode->i_gid = current->fsgid;
inode->i_blksize = PAGE_CACHE_SIZE;
inode->i_blocks = 0;
inode->i_rdev = NODEV;
inode->i_mapping->a_ops = &modfs_aops;
inode->i_mapping->backing_dev_info = &modfs_backing_dev_info;
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
switch (mode & S_IFMT) {
default:
init_special_inode(inode, mode, dev);
break;
case S_IFREG:
break;
case S_IFDIR:
inode->i_op = &simple_dir_inode_operations;
inode->i_fop = &simple_dir_operations;

/* directory inodes start off with i_nlink == 2 (for "." entry) */
inode->i_nlink++;
break;
case S_IFLNK:
inode->i_op = &page_symlink_inode_operations;
break;
}
}
return inode;
}

static int modfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{
struct module_dir *moddir;
struct inode *inode, *inode2;

moddir = kmalloc(sizeof(*moddir), GFP_KERNEL);
if (!moddir)
return -ENOMEM;
memset(moddir, 0, sizeof(*moddir));

inode = modfs_get_inode(dir->i_sb, mode | S_IFDIR, 0);
if (!inode)
/* FIXME */
return -ENOSPC;

moddir->dir = dentry;
inode->u.generic_ip = moddir;
inode->i_op = &modfs_moddir_inode_operations;
d_instantiate(dentry, inode);
dget(dentry);
dir->i_nlink++;

inode2 = modfs_get_inode(dir->i_sb, (mode & ~S_IXUGO) | S_IFREG, 0);
if (!inode)
/* FIXME */
return -ENOSPC;
down(&inode->i_sem);
moddir->control = lookup_one_len("control", moddir->dir, 7);
if (!IS_ERR(dentry)) {
inode2->i_fop = &modfs_control_file_operations;
d_instantiate(moddir->control, inode2);
} else
/* FIXME */;
up(&inode->i_sem);

return 0;
}

static int modfs_rmdir(struct inode *dir, struct dentry *dentry)
{
int res = simple_rmdir(dir, dentry);
if (!res)
kfree(dentry->d_inode->u.generic_ip);
return res;
}

static int modfs_create(struct inode *dir, struct dentry *dentry, int mode)
{
struct inode *inode;
struct module_dir *moddir = dir->u.generic_ip;
struct module_data *data;

inode = modfs_get_inode(dir->i_sb, mode | S_IFREG, 0);
if (!inode)
return -ENOSPC;

data = kmalloc(sizeof(*data), GFP_KERNEL);
if (!data)
/* FIXME */
return -ENOMEM;
memset(data, 0, sizeof(*data));
data->dentry = dentry;
inode->u.generic_ip = data;
inode->i_fop = &modfs_mod_file_operations;
inode->i_op = &modfs_mod_inode_operations;
d_instantiate(dentry, inode);
dget(dentry);
moddir->usecount++;
return 0;
}

static int modfs_unlink(struct inode *dir, struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
struct module_dir *moddir = dir->u.generic_ip;

if (dentry == moddir->control) {
if (moddir->usecount)
return -EBUSY;
} else {
struct module_data *mdata = inode->u.generic_ip;
if (mdata->module)
return -EBUSY;
moddir->usecount--;
}
simple_unlink(dir, dentry);
return 0;
}

static int modfs_control_file_show(struct seq_file *m, void *v)
{
seq_printf(m, "todo\n");
return 0;
}

static int modfs_control_file_open(struct inode *inode, struct file *file)
{
return single_open(file, modfs_control_file_show, NULL);
}

static struct dentry *modfs_control_get_modfile(struct dentry *parent, char *name)
{
struct dentry *dentry;
char *end;

while (isspace(*name))
name++;
end = name;
while (!isspace(*end))
end++;
dentry = lookup_one_len(name, parent, end - name);
if (IS_ERR(dentry))
return dentry;
if (!dentry->d_inode || dentry->d_inode->i_fop != &modfs_mod_file_operations) {
dput(dentry);
return ERR_PTR(-ENOENT);
}
return dentry;
}

static int modfs_mod_file_map(struct dentry *dentry)
{
struct module_data *mdata;
int i, pcnt = (dentry->d_inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
struct page **pagevec;

mdata = dentry->d_inode->u.generic_ip;
if (mdata->module)
return -EEXIST;
pagevec = kmalloc(pcnt * sizeof(*pagevec), GFP_KERNEL);
if (!pagevec)
/* FIXME */;
for (i = 0; i < pcnt; i++) {
pagevec[i] = grab_cache_page(dentry->d_inode->i_mapping, i);
if (!pagevec[i])
/* FIXME */;
/* init page... */;
unlock_page(pagevec[i]);
}
mdata->module = vmap(pagevec, pcnt);
return 0;
}

static int modfs_mod_file_unmap(struct dentry *dentry)
{
struct module_data *mdata = dentry->d_inode->u.generic_ip;
if (!mdata->module)
return -ENOENT;
vunmap(mdata->module);
/* release pages */;
mdata->module = NULL;
return 0;
}

static int modfs_mod_file_init(struct dentry *dentry)
{
struct module_data *mdata = dentry->d_inode->u.generic_ip;
struct module *mod = mdata->module;
struct module_ref *dep;
unsigned long flags;
int res, i;

if (!mod)
return -ENOENT;
if (module_arch_init(mod))
return -EINVAL;

spin_lock_irqsave(&modlist_lock, flags);
mod->next = module_list;
module_list = mod; /* link it in */
spin_unlock_irqrestore(&modlist_lock, flags);

/* On some machines it is necessary to do something here
to make the I and D caches consistent. */
flush_icache_range((unsigned long)mod, (unsigned long)mod + dentry->d_inode->i_size);

/* Update module references. */
for (i = 0, dep = mod->deps; i < mod->ndeps; ++i, ++dep) {
struct module *d = dep->dep;

dep->ref = mod;
dep->next_ref = d->refs;
d->refs = dep;
/* Being referenced by a dependent module counts as a
use as far as kmod is concerned. */
d->flags |= MOD_USED_ONCE;
}

/* Initialize the module. */
atomic_set(&mod->uc.usecount,1);
mod->flags |= MOD_INITIALIZING;
if (mod->init && (res = mod->init()) != 0) {
atomic_set(&mod->uc.usecount,0);
mod->flags &= ~MOD_INITIALIZING;
if (res > 0) /* Buggy module */
res = -EBUSY;
return res;
}
atomic_dec(&mod->uc.usecount);

/* And set it running. */
mod->flags = (mod->flags | MOD_RUNNING) & ~MOD_INITIALIZING;

return 0;
}

static int modfs_mod_file_exit(struct dentry *dentry)
{
struct module_data *mdata = dentry->d_inode->u.generic_ip;
struct module *mod = mdata->module;
struct module_ref *dep;
int res = 0, i;

if (!mod)
return -ENOENT;

if (mod->refs != NULL)
return -EBUSY;

spin_lock(&unload_lock);
if (!__MOD_IN_USE(mod))
mod->flags |= MOD_DELETED;
else
res = -EBUSY;
spin_unlock(&unload_lock);

if (!res) {
unsigned long flags;

/* Let the module clean up. */
if (mod->flags & MOD_RUNNING) {
if(mod->cleanup)
mod->cleanup();
mod->flags &= ~MOD_RUNNING;
}

/* Remove the module from the dependency lists. */
for (i = 0, dep = mod->deps; i < mod->ndeps; ++i, ++dep) {
struct module_ref **pp;
for (pp = &dep->dep->refs; *pp != dep; pp = &(*pp)->next_ref)
continue;
*pp = dep->next_ref;
}

/* And from the main module list. */
spin_lock_irqsave(&modlist_lock, flags);
if (mod == module_list) {
module_list = mod->next;
} else {
struct module *p;
for (p = module_list; p->next != mod; p = p->next)
continue;
p->next = mod->next;
}
spin_unlock_irqrestore(&modlist_lock, flags);
}

return 0;
}

static ssize_t modfs_control_file_write(struct file *file, const char *buffer,
size_t count, loff_t *ppos)
{
struct dentry *dentry;
char *data;

data = __getname();
if (!data)
return -ENOMEM;
count = min(count, (size_t)PATH_MAX - 1);
if (!copy_from_user(data, buffer, count)) {
data[count] = 0;
if (!strncmp(data, "map ", 4)) {
dentry = modfs_control_get_modfile(file->f_dentry->d_parent, data + 4);
if (!IS_ERR(dentry)) {
modfs_mod_file_map(dentry);
dput(dentry);
}
} else if (!strncmp(data, "unmap ", 6)) {
dentry = modfs_control_get_modfile(file->f_dentry->d_parent, data + 6);
if (!IS_ERR(dentry)) {
modfs_mod_file_unmap(dentry);
dput(dentry);
}
} else if (!strncmp(data, "init ", 5)) {
dentry = modfs_control_get_modfile(file->f_dentry->d_parent, data + 5);
if (!IS_ERR(dentry)) {
modfs_mod_file_init(dentry);
dput(dentry);
}
} else if (!strncmp(data, "exit ", 5)) {
dentry = modfs_control_get_modfile(file->f_dentry->d_parent, data + 5);
if (!IS_ERR(dentry)) {
modfs_mod_file_exit(dentry);
dput(dentry);
}
}
}
putname(data);
return count;
}

static int modfs_mod_file_show(struct seq_file *m, void *v)
{
struct module_data *data = m->private;
if (data->module) {
struct module *mod = data->module;
if (mod->flags & MOD_DELETED)
seq_puts(m, "deleted\n");
else if (mod->flags & MOD_RUNNING) {
if (mod->flags & MOD_AUTOCLEAN)
seq_puts(m, "autoclean\n");
if (!(mod->flags & MOD_USED_ONCE))
seq_puts(m, "unused\n");
} else if (mod->flags & MOD_INITIALIZING)
seq_puts(m, "initializing\n");
else
seq_puts(m, "uninitialized\n");
seq_printf(m, "%p\n", mod);
} else
seq_printf(m, "unmapped\n");
return 0;
}

static int modfs_mod_file_open(struct inode *inode, struct file *file)
{
return single_open(file, modfs_mod_file_show, inode->u.generic_ip);
}

static int modfs_mod_file_setattr(struct dentry *dentry, struct iattr *attr)
{
struct inode *inode = dentry->d_inode;

if (attr->ia_valid & ATTR_SIZE) {
struct module_data *data = inode->u.generic_ip;
if (data->module)
return -EPERM;
}
return inode_setattr(inode, attr);
}

static int modfs_fill_super(struct super_block * sb, void * data, int silent)
{
struct inode * inode;
struct dentry * root;

sb->s_blocksize = PAGE_CACHE_SIZE;
sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
sb->s_magic = 0x7a3dc6cd;//MODFS_MAGIC;
sb->s_op = &modfs_ops;
inode = modfs_get_inode(sb, S_IFDIR | 0755, 0);
if (!inode)
return -ENOMEM;
inode->i_op = &modfs_rootdir_inode_operations;

root = d_alloc_root(inode);
if (!root) {
iput(inode);
return -ENOMEM;
}
sb->s_root = root;
return 0;
}

static struct super_block *modfs_get_sb(struct file_system_type *fs_type,
int flags, char *dev_name, void *data)
{
return get_sb_single(fs_type, flags, data, modfs_fill_super);
}

static struct file_system_type module_fs_type = {
.name = "modfs",
.get_sb = modfs_get_sb,
.kill_sb = kill_litter_super,
};

int __init init_modfs(void)
{
if (!register_filesystem(&module_fs_type)) {
modfs_mount = kern_mount(&module_fs_type);
if (IS_ERR(modfs_mount)) {
printk(KERN_ERR "modfs: could not mount!\n");
modfs_mount = NULL;
}
}
return 0;
}

device_initcall(init_modfs);

/*
* Allocate space for a module.
*/

asmlinkage unsigned long
sys_create_module(const char *name_user, size_t size)
{
char *name;
struct dentry *dentry;
struct module_dir *moddir;
int res = 0;

if (!capable(CAP_SYS_MODULE))
return -EPERM;

name = getname(name_user);
if (IS_ERR(name))
return PTR_ERR(name);

down(&modfs_mount->mnt_sb->s_root->d_inode->i_sem);
dentry = lookup_one_len(name, modfs_mount->mnt_sb->s_root, strlen(name));
if (!IS_ERR(dentry)) {
if (!dentry->d_inode)
res = modfs_mkdir(modfs_mount->mnt_sb->s_root->d_inode, dentry, 0755);
else
res = -EEXIST;
} else
res = PTR_ERR(dentry);
up(&modfs_mount->mnt_sb->s_root->d_inode->i_sem);
if (res)
goto exit;

moddir = dentry->d_inode->u.generic_ip;
if (moddir->module) {
res = -EEXIST;
goto exit;
}

down(&dentry->d_inode->i_sem);
moddir->module = lookup_one_len("module", dentry, 6);
if (!IS_ERR(moddir->module)) {
if (!moddir->module->d_inode)
res = modfs_create(dentry->d_inode, moddir->module, 0600);
else
res = -EEXIST;
} else
res = PTR_ERR(moddir->module);
up(&dentry->d_inode->i_sem);

if (!res) {
res = do_truncate(moddir->module, size);
modfs_mod_file_map(moddir->module);
}

exit:
if (res)
/* clean up */;
else
res = (unsigned long)(((struct module_data *)moddir->module->d_inode->u.generic_ip)->module);
putname(name);
dput(dentry);

return res;
}

/*
* Initialize a module.
*/

asmlinkage long
sys_init_module(const char *name_user, struct module *mod_user)
{
char *name;
struct dentry *dentry;
struct module_dir *moddir;
struct file *file;
long res = 0;

if (!capable(CAP_SYS_MODULE))
return -EPERM;

name = getname(name_user);
if (IS_ERR(name))
return PTR_ERR(name);

down(&modfs_mount->mnt_sb->s_root->d_inode->i_sem);
dentry = lookup_one_len(name, modfs_mount->mnt_sb->s_root, strlen(name));
if (!IS_ERR(dentry)) {
if (!dentry->d_inode)
res = -ENOENT;
} else
res = PTR_ERR(dentry);
up(&modfs_mount->mnt_sb->s_root->d_inode->i_sem);

if (res)
goto exit;

moddir = dentry->d_inode->u.generic_ip;
if (!moddir->module) {
res = -ENOENT;
goto exit;
}

mntget(modfs_mount);
dget(moddir->module);
file = dentry_open(moddir->module, modfs_mount, O_RDWR);
generic_file_write(file, (void *)mod_user, moddir->module->d_inode->i_size, &file->f_pos);
fput(file);

modfs_mod_file_init(moddir->module);

exit:
putname(name);
dput(dentry);

return res;
}

asmlinkage long
sys_delete_module(const char *name_user)
{
struct dentry *dentry;
struct module_dir *moddir;
char *name;
long res = 0;

if (!capable(CAP_SYS_MODULE))
return -EPERM;

if (!capable(CAP_SYS_MODULE))
return -EPERM;

name = getname(name_user);
if (IS_ERR(name))
return PTR_ERR(name);

down(&modfs_mount->mnt_sb->s_root->d_inode->i_sem);
dentry = lookup_one_len(name, modfs_mount->mnt_sb->s_root, strlen(name));
if (!IS_ERR(dentry)) {
if (!dentry->d_inode)
res = -ENOENT;
} else
res = PTR_ERR(dentry);
up(&modfs_mount->mnt_sb->s_root->d_inode->i_sem);

if (res)
goto exit;

moddir = dentry->d_inode->u.generic_ip;
if (!moddir->module) {
res = -ENOENT;
goto exit;
}

res = modfs_mod_file_exit(moddir->module);
if (!res) {
modfs_mod_file_unmap(moddir->module);
down(&moddir->dir->d_inode->i_sem);
modfs_unlink(moddir->dir->d_inode, moddir->module);
d_invalidate(moddir->module);
dput(moddir->module);
moddir->module = NULL;
if (!modfs_unlink(moddir->dir->d_inode, moddir->control))
d_invalidate(moddir->control);
up(&moddir->dir->d_inode->i_sem);
down(&modfs_mount->mnt_sb->s_root->d_inode->i_sem);
if (!modfs_rmdir(modfs_mount->mnt_sb->s_root->d_inode, moddir->dir))
d_invalidate(moddir->dir);
up(&modfs_mount->mnt_sb->s_root->d_inode->i_sem);
}

exit:
putname(name);
dput(dentry);

return res;
}


2002-11-20 21:58:20

by Petr Vandrovec

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

On Wed, Nov 20, 2002 at 10:06:01PM +0100, Roman Zippel wrote:
>
> what a fool I am. ;-). ) OTOH it's pretty interesting to see that nobody
> seems to care, as the kernel loader is now in everybody's kernel and if
> everyone were happy, someone should be able tell me, why it should be kept
> in the kernel.

Nobody is answering for almost hour, so I'll do...

It was surprise to me that after patching about 100 lines in my
kernel I can live without modules ;-) I cannot redistribute it because
of vmmon/vmnet are linked directly to kernel, but it works. Maybe
I should be constructive, and update module-init-tools to fit my
needs (-p, -s; error messages from insmod going to screen and
not to the dmesg by default..., post-install/pre-install/post-remove/pre-remove
support in modprobe), but after comparing size of modutils and
module-init-tools I'll just leave this job to others: it looks like
fulltime job for couple of months.

> Below is a basic module fs implementation, which demonstrates an
> alternative way of managing modules. It's reduced to the most basic
> functionality needed to get modules working: map an object into kernel
> space and start/stop a module. Everything else can and should be done in
> user space. On top of this functionality it's easy to emulate the old
> system calls, so old insmod/rmmod work just fine. This allows now to write
> new module tools and when they are ready (this means we're also happy with
> the new module interface), we can drop all the old syscalls. The new

Nice, but...

> static ssize_t modfs_control_file_write(struct file *file, const char *buffer,
> size_t count, loff_t *ppos)
> {
> struct dentry *dentry;
> char *data;
>
> data = __getname();
> if (!data)
> return -ENOMEM;
> count = min(count, (size_t)PATH_MAX - 1);
> if (!copy_from_user(data, buffer, count)) {
> data[count] = 0;
> if (!strncmp(data, "map ", 4)) {

... now when we have setxattr() in kernel, what about using
setxattr() on module directory instead of separate control file?
I know, you cannot manage it with "echo" then, but it looks
strange that mkdir automatically creates control file while
rmdir does not remove it automatically... and without control
file, sys_delete_module() could then use simple setxattr/vfs_unlink/vfs_rmdir
instead of doing vfs's job by hand.

> down(&modfs_mount->mnt_sb->s_root->d_inode->i_sem);
> dentry = lookup_one_len(name, modfs_mount->mnt_sb->s_root, strlen(name));
> if (!IS_ERR(dentry)) {
> if (!dentry->d_inode)
> res = modfs_mkdir(modfs_mount->mnt_sb->s_root->d_inode, dentry, 0755);
> else
> res = -EEXIST;
> } else
> res = PTR_ERR(dentry);
> up(&modfs_mount->mnt_sb->s_root->d_inode->i_sem);

You are skipping security_ops hooks. Can you use vfs_mkdir() instead
of modfs_mkdir(), just to make sure that if someone adds some new
features into vfs_mkdir(), you'll not miss them ?

> asmlinkage long
> sys_delete_module(const char *name_user)
> {
> struct dentry *dentry;
> struct module_dir *moddir;
> char *name;
> long res = 0;
>
> if (!capable(CAP_SYS_MODULE))
> return -EPERM;
>
> if (!capable(CAP_SYS_MODULE))
> return -EPERM;

One pair of these lines is redundant...

And one minor comment: do you really need both module_dir->module
and module_data->module? Do you use it only to make sure that
sys_delete_module will not operate on modules not created by
sys_init_module?

It has unfortunate feature that sys_create_module();
sys_delete_module() (without suceeding sys_init_module between
them) will return -ENOENT, and you'll have to use rm/rmdir to get
rid of module :-(
Thanks,
Petr Vandrovec
[email protected]


2002-11-20 23:26:06

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

Hi,

On Wed, 20 Nov 2002, Petr Vandrovec wrote:

> ... now when we have setxattr() in kernel, what about using
> setxattr() on module directory instead of separate control file?
> I know, you cannot manage it with "echo" then, but it looks

A normal file is easier for testing.

> strange that mkdir automatically creates control file while
> rmdir does not remove it automatically... and without control

Indeed, rmdir should probably also remove the control file.

> You are skipping security_ops hooks. Can you use vfs_mkdir() instead
> of modfs_mkdir(), just to make sure that if someone adds some new
> features into vfs_mkdir(), you'll not miss them ?

The system calls are only a temporary solution, so skipping security hooks
is no real loss. :)

> And one minor comment: do you really need both module_dir->module
> and module_data->module? Do you use it only to make sure that
> sys_delete_module will not operate on modules not created by
> sys_init_module?

module_dir->module is an optimization to get quickly to the module file
created by sys_create_module(). module_data->module is the pointer to the
real module data.

> It has unfortunate feature that sys_create_module();
> sys_delete_module() (without suceeding sys_init_module between
> them) will return -ENOENT, and you'll have to use rm/rmdir to get
> rid of module :-(

As the system calls are only temporary, they don't have to be perfect, but
why should it return -ENOINT, AFAICS it should fail for other reason.
Anyway, the module state management needs a overhaul, this is just a
copy&paste from the old code.

bye, Roman

2002-11-21 01:52:57

by Petr Vandrovec

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

On Thu, Nov 21, 2002 at 12:32:40AM +0100, Roman Zippel wrote:
> Hi,
>
> On Wed, 20 Nov 2002, Petr Vandrovec wrote:
>
> > ... now when we have setxattr() in kernel, what about using
> > setxattr() on module directory instead of separate control file?
> > I know, you cannot manage it with "echo" then, but it looks
>
> A normal file is easier for testing.

read/write on directory? OK, if you can make behavior of "control"
file more consistent.

> > strange that mkdir automatically creates control file while
> > rmdir does not remove it automatically... and without control
>
> Indeed, rmdir should probably also remove the control file.

Actually I was thinking in opposite way: requiring user to create
"control", instead of creating it automatically.

And more I think about it, why there is one "control" for each
directory? Is not one "control" for whole modfs sufficient?
fs is then just mounted with existing control, and unmount
gets rid of it.

And after implementing releasing of .init as truncate() on
module file, you can remove directories from modfs completely,
even more simplifying it.

> > And one minor comment: do you really need both module_dir->module
> > and module_data->module? Do you use it only to make sure that
> > sys_delete_module will not operate on modules not created by
> > sys_init_module?
>
> module_dir->module is an optimization to get quickly to the module file
> created by sys_create_module().

Ok, I'll try something else. If you'll create module through
sys_create_module, and remove using echo/rm/rmdir, will not it leak
dentry?

> > It has unfortunate feature that sys_create_module();
> > sys_delete_module() (without suceeding sys_init_module between
> > them) will return -ENOENT, and you'll have to use rm/rmdir to get
> > rid of module :-(
>
> As the system calls are only temporary, they don't have to be perfect, but
> why should it return -ENOINT, AFAICS it should fail for other reason.

Your current implementation will return -ENOENT because of moddir->module
will be NULL: but it is not critical error, it just means that
sys_create_module() was called, but sys_init_module() was not, and
you should not fail cleanup for such case.

I think that you should not be too clever in insmod/rmmod. Just code
them like userspace clients will do: open "control", write "exit module"
and "unmap module" into it, and then unlink "module" and remove
that directory...

And do we need separate map/init and exit/unmap? I do not think that
it should be supported to do map/init/exit/init/exit/unmap...
And because of you do not allow unlink() on mapped module,
what about doing implicit exit/unmap on unlink, removing exit/unmap
operations from control completely?
Best regards,
Petr Vandrovec
[email protected]

2002-11-21 19:39:57

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

Hi,

On Thu, 21 Nov 2002, Petr Vandrovec wrote:

> > Indeed, rmdir should probably also remove the control file.
>
> Actually I was thinking in opposite way: requiring user to create
> "control", instead of creating it automatically.
>
> And more I think about it, why there is one "control" for each
> directory? Is not one "control" for whole modfs sufficient?
> fs is then just mounted with existing control, and unmount
> gets rid of it.

I'd rather create a central control file (which might be really the
cleaner solution), than let the user create it.

> And after implementing releasing of .init as truncate() on
> module file, you can remove directories from modfs completely,
> even more simplifying it.

Directories are likely required, e.g. symbols have to be added and I'm
still thinking about how to represent the module arguments.

> > module_dir->module is an optimization to get quickly to the module file
> > created by sys_create_module().
>
> Ok, I'll try something else. If you'll create module through
> sys_create_module, and remove using echo/rm/rmdir, will not it leak
> dentry?

I noticed that too, but it's easy to fix and not the only leak, as you
probably noticed. :)

> > > It has unfortunate feature that sys_create_module();
> > > sys_delete_module() (without suceeding sys_init_module between
> > > them) will return -ENOENT, and you'll have to use rm/rmdir to get
> > > rid of module :-(
> >
> > As the system calls are only temporary, they don't have to be perfect, but
> > why should it return -ENOINT, AFAICS it should fail for other reason.
>
> Your current implementation will return -ENOENT because of moddir->module
> will be NULL: but it is not critical error, it just means that
> sys_create_module() was called, but sys_init_module() was not, and
> you should not fail cleanup for such case.

moddir->module is initialized in sys_create_module, so this shouldn't be a
problem.

> I think that you should not be too clever in insmod/rmmod. Just code
> them like userspace clients will do: open "control", write "exit module"
> and "unmap module" into it, and then unlink "module" and remove
> that directory...

Well, first I have to see if I have to emulate these system calls at all.
I still have to see how much work it will be to reimplement the magic
do-it-all module load syscall on top of modfs, it probably will be easier
to keep them separate.

> And do we need separate map/init and exit/unmap? I do not think that
> it should be supported to do map/init/exit/init/exit/unmap...

You first have to map the module into the kernel (or at least reserve the
space), so you know the address the module has to be relocated to.

> And because of you do not allow unlink() on mapped module,
> what about doing implicit exit/unmap on unlink, removing exit/unmap
> operations from control completely?

For now I'd rather keep a fine grained interface, it can still be
simplified later.

bye, Roman

2002-11-21 23:14:35

by Adam J. Richter

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

Hi Roman,

Can you tell me what problem in the existing userland module
code (i.e., before 2.5.48) modulefs solves? Does it actually make
the kernel smaller? Can you give me an example?

I agree with you that module linking should be done in user
space. I have asked repeatedly for someone to show real benefits to
in-kernel module linking, and, so far, nobody has. To my knowledge,
your proposal hasn't passed this test either, but I'll make a few
suggestions on it anyway.

1. Please take Petr's advice and just make module removal
occur when you rmdir the directory. Making the removal happen in
two stages introduces additioal states that have to be defined, such
as the state where the "module unmap" command has been received
but the module's directory has been removed. What if somebody else
wants to load the same module again at this time? Either you will
get flakiness in facilities that rely on automatic kernel module
loading or modprobe has to be modified to deal with that
possibility and perhaps try to do the remove itself.

2. I would really like insmod to be able to flock modules (or
do something similar). This would increment the reference count on a
module until insmod would exit (or would do an execve, I suppose).
This would eliminate a race when insmod is loading a module that
references symbols in other modules. I don't think flock is actually
propagated down the vfs layer, so perhaps some other primitive could be
used. Perhaps just holding open an open file descriptor on each of
the modules in question would be a better interface.

3. It's not a modulefs thing, but you might want to consider
moving all symbol management to user space, including that for the
core kernel symbols. The symbol tables can be maintained in user
files by the insmod and rmmod programs, they could even be inferred
from the module's start address and the module's .o in the file
system. If people really want to store this data to be stored in
kernel memory, they can allocate extra space where the module is
actually loaded and their favorite symbol table format there.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-11-21 23:52:53

by Adam J. Richter

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

Here are some more comments on modfs or modules in general:

4. Regarding what I said about leaving symbol tracking to
user space, I'd like to add a note about reliability. It is not
necessary for rmmod to reliably remove symbols from these tables.
insmod can check for symbols that map to places that are no longer
allocated to modules and remove stale symbols before attempting to
load a new module.

5. You should not need to do anything special in modfs
to support init sections. That can be done by loading two separate
modules and then removing the one corresponding to an init section.

6. For module_init functions, consider taking a pointer to
an array of init functions and a count. That would be compatible
with the system for built-in kernel object files, so it would
eliminate a difference in linux/include/init.h. You could also do
the same for module_exit functions. This change would also
allow linking .o's together more or less arbitrarily into single
modules, which would allow loading of modules that have reference
loops among them.

7. Instead of module parameters, consider just supporting
__setup() declarations used in kernel objects. These would then
process a string that you'd pass. insmod might want to prepend the
contents of /proc/cmdline to that string so that people could pass
command lines at the boot prompt even if the driver they wanted to
talk to were a module. This change would eliminate another difference
between module and core kernel objects. Ultimately, I would like
eliminate any compilation difference between kernel and module
objects. This would allow deferring the decision about what should be
in modules and what should be in the kernel until link time. Not only
would this slightly simplify the build process and modules' source
code, but it would also allow binary distributions for a wider
variety of uses.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-11-22 10:43:20

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

Hi,

On Thu, 21 Nov 2002, Adam J. Richter wrote:

> Can you tell me what problem in the existing userland module
> code (i.e., before 2.5.48) modulefs solves? Does it actually make
> the kernel smaller? Can you give me an example?

Previous insmod knows too much about kernel internals, it has to fill in
kernel structures, which makes changing this structure a real pain. Part
of the relocation work can be moved to kbuild, what makes insmod simpler
and gives us better control over the module layout. See the mini module
loader I posted last week.

> 1. Please take Petr's advice and just make module removal
> occur when you rmdir the directory. Making the removal happen in
> two stages introduces additioal states that have to be defined, such
> as the state where the "module unmap" command has been received
> but the module's directory has been removed. What if somebody else
> wants to load the same module again at this time? Either you will
> get flakiness in facilities that rely on automatic kernel module
> loading or modprobe has to be modified to deal with that
> possibility and perhaps try to do the remove itself.

Synchronizing multiple insmod is needed in user space anyway, this doesn't
make the kernel side more complex.

> 2. I would really like insmod to be able to flock modules (or
> do something similar). This would increment the reference count on a
> module until insmod would exit (or would do an execve, I suppose).
> This would eliminate a race when insmod is loading a module that
> references symbols in other modules. I don't think flock is actually
> propagated down the vfs layer, so perhaps some other primitive could be
> used. Perhaps just holding open an open file descriptor on each of
> the modules in question would be a better interface.

I don't know what you're trying to do here, but it sounds like you're
mixing user space and kernel space problems here.

> 3. It's not a modulefs thing, but you might want to consider
> moving all symbol management to user space, including that for the
> core kernel symbols. The symbol tables can be maintained in user
> files by the insmod and rmmod programs, they could even be inferred
> from the module's start address and the module's .o in the file
> system. If people really want to store this data to be stored in
> kernel memory, they can allocate extra space where the module is
> actually loaded and their favorite symbol table format there.

This is planned, symbols are only needed by kksymoops, otherwise they
don't have to be kept in memory all the time.

> 4. Regarding what I said about leaving symbol tracking to
> user space, I'd like to add a note about reliability. It is not
> necessary for rmmod to reliably remove symbols from these tables.
> insmod can check for symbols that map to places that are no longer
> allocated to modules and remove stale symbols before attempting to
> load a new module.

That's a mostly user space problem and I'd rather do this right, why
should I keep invalid symbols?

> 5. You should not need to do anything special in modfs
> to support init sections. That can be done by loading two separate
> modules and then removing the one corresponding to an init section.

I doubt that's a good idea, mostly also because of the following point.

> 6. For module_init functions, consider taking a pointer to
> an array of init functions and a count. That would be compatible
> with the system for built-in kernel object files, so it would
> eliminate a difference in linux/include/init.h. You could also do
> the same for module_exit functions. This change would also
> allow linking .o's together more or less arbitrarily into single
> modules, which would allow loading of modules that have reference
> loops among them.

Reference loops are never a good idea, if object files are that closely
related, create a single module. Multiple init/exit functions are no good
idea either, in which order should they be executed? If it's really one
module than also keep it one module.

> 7. Instead of module parameters, consider just supporting
> __setup() declarations used in kernel objects. These would then
> process a string that you'd pass. insmod might want to prepend the
> contents of /proc/cmdline to that string so that people could pass
> command lines at the boot prompt even if the driver they wanted to
> talk to were a module. This change would eliminate another difference
> between module and core kernel objects. Ultimately, I would like
> eliminate any compilation difference between kernel and module
> objects. This would allow deferring the decision about what should be
> in modules and what should be in the kernel until link time. Not only
> would this slightly simplify the build process and modules' source
> code,

Kernel parameter unification might actually be one of the few good parts
from Rusty's loader, but I have to look at, when it's working again.

> but it would also allow binary distributions for a wider
> variety of uses.

What does this mean???

bye, Roman

2002-11-22 18:03:55

by Adam J. Richter

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

On Fri, 22 Nov 2002, Roman Zippel wrote:
>On Thu, 21 Nov 2002, Adam J. Richter wrote:

>> Can you tell me what problem in the existing userland module
>> code (i.e., before 2.5.48) modulefs solves? Does it actually make
>> the kernel smaller? Can you give me an example?

>Previous insmod knows too much about kernel internals, it has to fill in
>kernel structures, which makes changing this structure a real pain. Part
>of the relocation work can be moved to kbuild, what makes insmod simpler
>and gives us better control over the module layout.

I meant "what real advantage does a _filesystem_ interface
give you?"

>See the mini module loader I posted last week.

Thanks for the pointer. Looking at your mini-loader, it seems
to me that you could eliminate your modules.lds script if you would
split module-init.c into module-init.c and module-finish.c, which would
simplify the ability to link multiple modules together or have
a module with multiple module_init() functions if we used the
definition of module_init() that is used for kernel objects (more
on this below).

>> 1. Please take Petr's advice and just make module removal
>> occur when you rmdir the directory. [...]

>Synchronizing multiple insmod is needed in user space anyway, this doesn't
>make the kernel side more complex.

Then a module_init function that may cause another module to
be loaded can deadlock. I'll have to think about whether that is
a problem.

I still don't see how having an "almost unloaded" state
visible to user space has any benefit, but I suppose its an irrelevant
question until you show benefits of having a filesystem model.


>> 2. I would really like insmod to be able to flock modules (or
>> do something similar). This would increment the reference count on a
>> module until insmod would exit (or would do an execve, I suppose).
>> This would eliminate a race when insmod is loading a module that
>> references symbols in other modules. I don't think flock is actually
>> propagated down the vfs layer, so perhaps some other primitive could be
>> used. Perhaps just holding open an open file descriptor on each of
>> the modules in question would be a better interface.

>I don't know what you're trying to do here, but it sounds like you're
>mixing user space and kernel space problems here.

If you cannot do complete serialization of rmmod and insmod
because it really does cause a deadlock, then the problem occurs when
a module that insmod depends on gets unloaded after insmod has checked
for the module and read its symbols but before insmod has loaded the
new module.

If it is OK to serialize all rmmod and insmod calls, then you
already have an implicit method for preventing modules that insmod
depends from being unloaded while insmod is running.

By the way, if you do serialize all rmmod and insmod calls,
you still have the question of how to convey dependency information to
the kernel if you do not want the loader to know about kernel data
formats like struct module and struct module_ref. Conceivably you
could represent these dependencies by doing something like "ln
/proc/modfs/libmodule/control /proc/modfs/mymodule/deps/libmodule".
So, it may be possible to have a scheme where you do not need complete
serialization of depmod and insmod, although I don't know if that will
prove to be important.


>> 3. It's not a modulefs thing, but you might want to consider
>> moving all symbol management to user space [...]

>This is planned, symbols are only needed by kksymoops, otherwise they
>don't have to be kept in memory all the time.

Great.

>> 4. Regarding what I said about leaving symbol tracking to
>> user space, I'd like to add a note about reliability. It is not
>> necessary for rmmod to reliably remove symbols from these tables.
>> insmod can check for symbols that map to places that are no longer
>> allocated to modules and remove stale symbols before attempting to
>> load a new module.

>That's a mostly user space problem and I'd rather do this right, why
>should I keep invalid symbols?

My point was just about being resiliant against, say, someone
accidentally hitting a control-C or a kill signal during one of these
programs, such as during an aborted shutdown. However, now that you
mention it, it would be useful for developers to have an rmmod option
to retain symbols for debugging a bad memory reference that happens
after a module has been unloaded.


>> 5. You should not need to do anything special in modfs
>> to support init sections. That can be done by loading two separate
>> modules and then removing the one corresponding to an init section.

>I doubt that's a good idea, mostly also because of the following point.

Even if you don't want to support circular dependencies
(loading multiple .o's together) or multiple module_{init,exit}
functions, having insmod load separate modules for the init and
non-init parts will shrink the kernel code. It would be nice
for insmod to be able to do ELF section manipulations like this
also to implement things like having .exit{,func} sections
that insmod could drop if it were given a flag to load the
module "permanently."


>> 6. For module_init functions, consider taking a pointer to
>> an array of init functions and a count. That would be compatible
>> with the system for built-in kernel object files, so it would
>> eliminate a difference in linux/include/init.h. You could also do
>> the same for module_exit functions. This change would also
>> allow linking .o's together more or less arbitrarily into single
>> modules, which would allow loading of modules that have reference
>> loops among them.

>Reference loops are never a good idea,

An advantage of allowing circular loops is that everything
would be modularized or nearly modularized with only a few changes.
For example, I want to modularize CONFIG_NET (not just CONFIG_INET).
However, there is a circular reference:

net/core ---> drivers/net/loopback.c ----> net/ethernet --> net/core

I might try to remove the net/core --> drivers/net/loopback.c
dependency to break this loop. However, there may be some positive
trade-offs made in the networking code about being able to asume that
there is always at least one device present and that the first device
on the list is the loopback device, an assumption mentioned in some
code comments. If I move drivers/net/loopback.c and net/ethernet,
I'll lose cvs tracking, plus I really think they belong where they are
from a software maintenance standpoint. Probably I will have to make
a special link rule and litter a bunch of Makefiles and .c files with
ifdefs. In the meantime CONFIG_NET=y seems to work just fine with
this circular dependency, and there is no reason why execution should
fail if you load the same functionality as a module.

>if object files are that closely
>related, create a single module. Multiple init/exit functions are no good
>idea either, in which order should they be executed? If it's really one
>module than also keep it one module.

Module_init calls can be used to initialize facilities local
to each file, which allows for fewer global symbols and may simplify
inclusion or exclusion of various facilities just by the CONFIG_xxx
options in the Makefile. Their order of execution would controlled by
the order in which are linked, just like kernel object files. I
thought I came across a couple of examples of this, mostly where a
/proc interface for functionality was set up in a separate file,
but I no longer remember where. Perhaps all of those cases have
now been eliminated.

The circular dependency and multiple module_init are not as
important to me as just being able to defer the decision about what is
a module and what is compiled-in until link time.


>> 7. Instead of module parameters, consider just supporting
>> __setup() declarations used in kernel objects. [...]

>Kernel parameter unification might actually be one of the few good parts
>from Rusty's loader, but I have to look at, when it's working again.

Great.


>> but it would also allow binary distributions for a wider
>> variety of uses.

>What does this mean???

A vendor could build a kernel with "make allmodconfig", and
then a user could run a script to choose which modules should be
included into the kernel for relatively customized purposes. The
default configuration might be link in binfmt_elf, rd, romfs to boot
an initial ramdisk, but some customers might want decide they want to
boot directly to their IDE or SCSI disk and link in the modules that
they already use.

Also, perhaps the initial program of mounting a root file
system and exec'ing /sbin/init could be separated out (with some
rearrangement). So, it might be easier to build, say, a router kernel
with no built-in filesystems, no PCI support, etc. for less
engineering cost.

I think there are many companies that want to build gadgets
but are more concerned with time to market and perceived engineering
risk than last 25% of memory footprint, and they go with Windows CE or
NT Embedded in part because they do not want to be running a custom
source build. I think it would simplify Linux usage if one could
decide what modules to link in without needing to recompile sources.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-11-22 20:21:26

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

Hi,

On Fri, 22 Nov 2002, Adam J. Richter wrote:

> I meant "what real advantage does a _filesystem_ interface
> give you?"

- more flexibility, you get a lot for free from libfs.c and other parts of
the kernel. Check how small the modfs example is, much more isn't needed.
- no extra syscalls needed, which might have to be emulated by 64bit
system. Module objects are normal files, why waste a syscall if you can
just write() them to the kernel?

> Thanks for the pointer. Looking at your mini-loader, it seems
> to me that you could eliminate your modules.lds script if you would
> split module-init.c into module-init.c and module-finish.c, which would
> simplify the ability to link multiple modules together or have
> a module with multiple module_init() functions if we used the
> definition of module_init() that is used for kernel objects (more
> on this below).

The linker script will not go away, it makes the loader a _lot_ easier and
I want to keep it this way. Multiple module_init() calls won't happen,
don't even try it.

> >Synchronizing multiple insmod is needed in user space anyway, this doesn't
> >make the kernel side more complex.
>
> Then a module_init function that may cause another module to
> be loaded can deadlock. I'll have to think about whether that is
> a problem.

The user space locking should of course be a bit more intelligent than one
big lock.

> By the way, if you do serialize all rmmod and insmod calls,
> you still have the question of how to convey dependency information to
> the kernel if you do not want the loader to know about kernel data
> formats like struct module and struct module_ref. Conceivably you
> could represent these dependencies by doing something like "ln
> /proc/modfs/libmodule/control /proc/modfs/mymodule/deps/libmodule".
> So, it may be possible to have a scheme where you do not need complete
> serialization of depmod and insmod, although I don't know if that will
> prove to be important.

The kernel doesn't need the dependency information, they can stay as well
in user space. The checks currently done don't have to be done in the
kernel.

> >That's a mostly user space problem and I'd rather do this right, why
> >should I keep invalid symbols?
>
> My point was just about being resiliant against, say, someone
> accidentally hitting a control-C or a kill signal during one of these
> programs, such as during an aborted shutdown. However, now that you
> mention it, it would be useful for developers to have an rmmod option
> to retain symbols for debugging a bad memory reference that happens
> after a module has been unloaded.

I want to keep the kernel implementation simple, this would only bloat it.
Cleaning up after a signal shouldn't bother the kernel (at least as far
as modfs is concerned).

> Even if you don't want to support circular dependencies
> (loading multiple .o's together) or multiple module_{init,exit}
> functions, having insmod load separate modules for the init and
> non-init parts will shrink the kernel code. It would be nice
> for insmod to be able to do ELF section manipulations like this
> also to implement things like having .exit{,func} sections
> that insmod could drop if it were given a flag to load the
> module "permanently."

The point of the linker script is to avoid that the loader has to play
around with the sections. If you want to get rid of init data, put it at
the end and truncate() it when you're done. This is trivial to do with a
module fs.

> >Reference loops are never a good idea,
>
> An advantage of allowing circular loops is that everything
> would be modularized or nearly modularized with only a few changes.
> For example, I want to modularize CONFIG_NET (not just CONFIG_INET).
> However, there is a circular reference:

You're trying to move your problems to someone else - very bad idea.
If you can't get modules references right, you have very likely more
problems like module initialization races. Link everything together and
make the initialization explicit and don't rely on that the linker/loader
gets it magically right.

bye, Roman

2002-11-22 22:04:57

by Adam J. Richter

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

On Fri, 22 Nov 2002, Roman Zippel wrote:
>On Fri, 22 Nov 2002, Adam J. Richter wrote:

>> I meant "what real advantage does a _filesystem_ interface
>> give you?"

>- more flexibility, you get a lot for free from libfs.c and other parts of
>the kernel. Check how small the modfs example is, much more isn't needed.
>- no extra syscalls needed, which might have to be emulated by 64bit
>system. Module objects are normal files, why waste a syscall if you can
>just write() them to the kernel?

>> Thanks for the pointer. Looking at your mini-loader, it seems
>> to me that you could eliminate your modules.lds script if you would
>> split module-init.c into module-init.c and module-finish.c, which would
>> simplify the ability to link multiple modules together or have
>> a module with multiple module_init() functions if we used the
>> definition of module_init() that is used for kernel objects (more
>> on this below).

>The linker script will not go away, it makes the loader a _lot_ easier and
>I want to keep it this way. Multiple module_init() calls won't happen,
>don't even try it.

How does one script and one .o make the loader "a _lot_ easier"
than with two .o's?

You cannot determine technical merit ex cathedra. Obviously
you can do whatever you want with your own tree, but if you can't
convince someone is basically a proponent of moving as much of the
module support into user level as possible (and a moderate proponent
of filesystem interfaces), then how likely do you think it is that you
will convince others?


>> >Synchronizing multiple insmod is needed in user space anyway, this doesn't
>> >make the kernel side more complex.
>>
>> Then a module_init function that may cause another module to
>> be loaded can deadlock. I'll have to think about whether that is
>> a problem.

>The user space locking should of course be a bit more intelligent than one
>big lock.

You would have provided more details you had thought this
through. Maybe you never need to do anything from a module_init
function that loads another module, but, if you do, then there may
not be an answer in the form of a user space locking policy, and it
may necessitate moving many elements back into the kernel, such as
this one:

[...]
>The kernel doesn't need the dependency information, they can stay as well
>in user space. The checks currently done don't have to be done in the
>kernel.

That is intriguing, although it depends on whether on that user
space locking scheme working.


>> >That's a mostly user space problem and I'd rather do this right, why
>> >should I keep invalid symbols?
>>
>> My point was just about being resiliant against, say, someone
>> accidentally hitting a control-C or a kill signal during one of these
>> programs, such as during an aborted shutdown. However, now that you
>> mention it, it would be useful for developers to have an rmmod option
>> to retain symbols for debugging a bad memory reference that happens
>> after a module has been unloaded.

>I want to keep the kernel implementation simple, this would only bloat it.
>Cleaning up after a signal shouldn't bother the kernel (at least as far
>as modfs is concerned).

User level programs cannot catch SIGKILL. I was not talking about
adding anything to the kernel.


>> Even if you don't want to support circular dependencies
>> (loading multiple .o's together) or multiple module_{init,exit}
>> functions, having insmod load separate modules for the init and
>> non-init parts will shrink the kernel code. It would be nice
>> for insmod to be able to do ELF section manipulations like this
>> also to implement things like having .exit{,func} sections
>> that insmod could drop if it were given a flag to load the
>> module "permanently."

>The point of the linker script is to avoid that the loader has to play
>around with the sections. If you want to get rid of init data, put it at
>the end and truncate() it when you're done. This is trivial to do with a
>module fs.

If insmod knows what size to truncate the module to, then it
has at least a little code dealing with certain sections. Theoretically,
it should also determine that the extable contains no pointers into .init
(although in practice I believe that that is always true right now).

Small modules can potentially be allocated with kmalloc rather
than vmalloc for better memory efficiency and, I believe, so that they
can live in the kernel's 4MB huge page on architectures that support
it, eliminating their TLB utilization. If you allocate the init and
non-init modules separately, the non-init module will more often be
small enough to be allocated this way (also true of the init module, but
we probably don't care much about that).

In addition to missing opportunities to use kmalloc,
implementing a truncation to a non-zero size in your modfs requires
additional code, including a vmalloc_shrink (I think Keith Owens
posted one). I was not aware that you even were going to provide
files for writing the module's contents. That seems unnecessary.
Even if you don't want a system call interface for copying the
module's contents in, why can't insmod just lseek and write /dev/kmem?

[...]
>> An advantage of allowing circular loops is that everything
>> would be modularized or nearly modularized with only a few changes.
>> For example, I want to modularize CONFIG_NET (not just CONFIG_INET).
>> However, there is a circular reference:

>You're trying to move your problems to someone else - very bad idea.
>If you can't get modules references right, you have very likely more
>problems like module initialization races. Link everything together and
>make the initialization explicit and don't rely on that the linker/loader
>gets it magically right.

First of all, you can have circular references and still have
only one module_init function. That is in fact that case with the
CONFIG_NET=m example that I described and you deleted from your
response--not very persuasive.

Secondly, you have not shown that these really are necessarily
always problems with cicular dependencies. They are, after all, used
in the built-in kernel objects.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-11-22 23:17:14

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

Hi,

On Fri, 22 Nov 2002, Adam J. Richter wrote:

> >The linker script will not go away, it makes the loader a _lot_ easier and
> >I want to keep it this way. Multiple module_init() calls won't happen,
> >don't even try it.
>
> How does one script and one .o make the loader "a _lot_ easier"
> than with two .o's?

Check the previous discussions about kloader support for other archs, e.g.
some have special requirements for section ordering, all this work is
already done by the linker, why should I duplicate this in the loader?

> >The user space locking should of course be a bit more intelligent than one
> >big lock.
>
> You would have provided more details you had thought this
> through. Maybe you never need to do anything from a module_init
> function that loads another module, but, if you do, then there may
> not be an answer in the form of a user space locking policy, and it
> may necessitate moving many elements back into the kernel,

I'm planning to do this with a per module locking, for every module in the
kernel there is a file with symbol/dependency/... information on which you
can easily use flock.

> >I want to keep the kernel implementation simple, this would only bloat it.
> >Cleaning up after a signal shouldn't bother the kernel (at least as far
> >as modfs is concerned).
>
> User level programs cannot catch SIGKILL. I was not talking about
> adding anything to the kernel.

Why would you kill insmod with SIGKILL? If you do this you likely have to
cleanup manually anyway (what modfs might actually make possible).

> If insmod knows what size to truncate the module to, then it
> has at least a little code dealing with certain sections. Theoretically,
> it should also determine that the extable contains no pointers into .init
> (although in practice I believe that that is always true right now).

You're trying here to assign tasks to insmod it shouldn't know about. The
more insmod knows about the module layout the harder is it to change it
from the kernel side and the more you loose flexibilty.
All this can be easily done by the kernel.

> Small modules can potentially be allocated with kmalloc rather
> than vmalloc for better memory efficiency and, I believe, so that they
> can live in the kernel's 4MB huge page on architectures that support
> it, eliminating their TLB utilization. If you allocate the init and
> non-init modules separately, the non-init module will more often be
> small enough to be allocated this way (also true of the init module, but
> we probably don't care much about that).
>
> In addition to missing opportunities to use kmalloc,
> implementing a truncation to a non-zero size in your modfs requires
> additional code, including a vmalloc_shrink (I think Keith Owens
> posted one). I was not aware that you even were going to provide
> files for writing the module's contents. That seems unnecessary.
> Even if you don't want a system call interface for copying the
> module's contents in, why can't insmod just lseek and write /dev/kmem?

Why should all these internals be exposed to insmod? Why isn't a simple
'map this file into the kernel' interface not enough? Every change in this
area requires changes to insmod, now add all backward compability code and
soon you get the current insmod monster.

> >You're trying to move your problems to someone else - very bad idea.
> >If you can't get modules references right, you have very likely more
> >problems like module initialization races. Link everything together and
> >make the initialization explicit and don't rely on that the linker/loader
> >gets it magically right.
>
> First of all, you can have circular references and still have
> only one module_init function. That is in fact that case with the
> CONFIG_NET=m example that I described and you deleted from your
> response--not very persuasive.
>
> Secondly, you have not shown that these really are necessarily
> always problems with cicular dependencies. They are, after all, used
> in the built-in kernel objects.

Why do you want to move this problem to the module tools? As I said make
this dependency explicit (either by resolving them or explicit
initialization) and be done with it.
You cannot easily compare it boot initialization, e.g. net/core uses
subsys_initcall, so it's initialized earlier. The network was never
written as module and you are trying the easy way out by pushing the
complexity to someone else. Nobody else needs this feature and there are
alternative ways, so I fail to see why everyone should pay for this.

bye, Roman

2002-11-23 06:03:19

by Adam J. Richter

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

On Sat, 23 Nov 2002 00:24:18 +0100, Roman Zippel wrote:
>On Fri, 22 Nov 2002, Adam J. Richter wrote:

>> >The linker script will not go away, it makes the loader a _lot_ easier and
>> >I want to keep it this way. Multiple module_init() calls won't happen,
>> >don't even try it.
>>
>> How does one script and one .o make the loader "a _lot_ easier"
>> than with two .o's?

>Check the previous discussions about kloader support for other archs, e.g.
>some have special requirements for section ordering, all this work is
>already done by the linker, why should I duplicate this in the loader?

I will try to find that discussion. In the meantime, I'll just
point out that you can still use the linker with two .o files instead
of one .o and an ld script. At the very least, it you would be
relying on one less GNUism (linker scripts).

>> >The user space locking should of course be a bit more intelligent than one
>> >big lock.
>>
>> You would have provided more details you had thought this
>> through. Maybe you never need to do anything from a module_init
>> function that loads another module, but, if you do, then there may
>> not be an answer in the form of a user space locking policy, and it
>> may necessitate moving many elements back into the kernel,

>I'm planning to do this with a per module locking, for every module in the
>kernel there is a file with symbol/dependency/... information on which you
>can easily use flock.

This is pretty close to my original suggestion of "I would
really like insmod to be able to flock modules [...]" except that
you don't need the kernel support to increment the reference
count on the modules because rmmod will also do the same flock. Come
to think of it, if insmod requests shared locks on the modules that
it wants and rmmod requests exclusive locks, I think that should
completely resolve the problem.


>> >I want to keep the kernel implementation simple, this would only bloat it.
>> >Cleaning up after a signal shouldn't bother the kernel (at least as far
>> >as modfs is concerned).
>>
>> User level programs cannot catch SIGKILL. I was not talking about
>> adding anything to the kernel.

>Why would you kill insmod with SIGKILL?

For example, you might be aborting a shutdown, or you might
have decided to kill all processes on a certain terminal because
you're trying to kill some runaway activity.

>If you do this you likely have to
>cleanup manually anyway (what modfs might actually make possible).

The point is that you could reduce or eliminate the "cleanup
manually" work to the point where orindary Linux users could manage
it.


>> If insmod knows what size to truncate the module to, then it
>> has at least a little code dealing with certain sections. Theoretically,
>> it should also determine that the extable contains no pointers into .init
>> (although in practice I believe that that is always true right now).

>You're trying here to assign tasks to insmod it shouldn't know about. The
>more insmod knows about the module layout the harder is it to change it
>from the kernel side and the more you loose flexibilty.
>All this can be easily done by the kernel.

By that logic, we must go with Rusty's kernel module loader,
perhaps with an interface like "int sys_insmod(char **argv)."

Fortunatly, I don't buy that logic. I think it greatly
overestimates the cost and degree of modutils compatability issues.
As I see it, we can weight what benefits we hope to get by dividing
the software in certain ways rather than trying to make unsupported
proclamations that something "shouldn't" be a certain way.

Putting module truncate code in the kernel will add kernel
code for the truncate and for vmalloc_shrink(), and will impede
allocating smaller non-init module sections with kmalloc. I think
that the compatability issues that we've had with the user level
module utilities have been relatively small and these changes will
make them smaller. So, I think it will be the better trade-off in
terms of kernel memory consumption vs. insmod code maintenance for
insmod to continue to know about know about the ".init*" section
prefix and the fixup table section name (it does not have to know the
actual fixup data structure format).

If you're really paranoid about tracking insmod and yet
not so paranoid that you want the module loader in kernel, I
suppose we could consider shipping insmod in the kernel tree
and installing it in /lib/modules/<version>/, or doing the same
for a user level dynamic library.



>> Small modules can potentially be allocated with kmalloc rather
>> than vmalloc for better memory efficiency and, I believe, so that they
>> can live in the kernel's 4MB huge page on architectures that support
>> it, eliminating their TLB utilization. If you allocate the init and
>> non-init modules separately, the non-init module will more often be
>> small enough to be allocated this way (also true of the init module, but
>> we probably don't care much about that).
>>
>> In addition to missing opportunities to use kmalloc,
>> implementing a truncation to a non-zero size in your modfs requires
>> additional code, including a vmalloc_shrink (I think Keith Owens
>> posted one). I was not aware that you even were going to provide
>> files for writing the module's contents. That seems unnecessary.
>> Even if you don't want a system call interface for copying the
>> module's contents in, why can't insmod just lseek and write /dev/kmem?

>Why should all these internals be exposed to insmod? Why isn't a simple
>'map this file into the kernel' interface not enough?

The above paragraphs explain this: performance, kernel code
size. There is a benefit to being able to allocate the small non-init
sections of module with kmalloc.

>Every change in this area requires changes to insmod,

Lots of changes in this area would not require changes to
insmod anymore. For example, changes to struct module or new fields
in just about any other kernel structure would no longer require changes
to insmod.


>now add all backward compability code and
>soon you get the current insmod monster.

2.5.48 has already broken backward compatability. So that's
not really a requirement at this point. Detecting older kernels and
execing /sbin/insmod.old in that case seems to have sufficed.

>> >You're trying to move your problems to someone else - very bad idea.
>> >If you can't get modules references right, you have very likely more
>> >problems like module initialization races. Link everything together and
>> >make the initialization explicit and don't rely on that the linker/loader
>> >gets it magically right.
>>
>> First of all, you can have circular references and still have
>> only one module_init function. That is in fact that case with the
>> CONFIG_NET=m example that I described and you deleted from your
>> response--not very persuasive.
>>
>> Secondly, you have not shown that these really are necessarily
>> always problems with cicular dependencies. They are, after all, used
>> in the built-in kernel objects.

>Why do you want to move this problem to the module tools?

"Move this problem" is not how I would describe it. I want to
eliminate an incompatibility with built-in objects, and that
incompatability was introduced by the module system.

>As I said make
>this dependency explicit (either by resolving them or explicit
>initialization) and be done with it.

I don't know exactly what you mean by this. For example,
go back to my previous message and explain how to achieve CONFIG_NET=m.

>You cannot easily compare it boot initialization, e.g. net/core uses
>subsys_initcall, so it's initialized earlier. The network was never
>written as module

Almost all modules were originally "never written as a module."

>and you are trying the easy way out by pushing the
>complexity to someone else.

I don't know what you mean by "pushing the complexity on
someone else."

>Nobody else needs this feature and there are
>alternative ways, so I fail to see why everyone should pay for this.

I don't know what standard you have in mind when you say
"needs." Nobody "needs" anything in Linux, or computers. People
survived just fine without them for millenia.

Many people could benefit from CONFIG_NET=m, because it would
allow for smaller media boot images or ones with more drivers, while
still allowing the network functionality to be loaded later without
the need to reboot to a new kernel. There are also still people who
use PC's without networking, although not many in the developed world.
It might also be another pebble on the balance scale for gadget
manufacturers that do not have Linux kernel developers but might
consider Linux for some devices that might not due networking (mp3
players, low end PDA's, or, more likely, more specialized products).
Besides, I'm just using CONFIG_NET=m as one example. It's probably
not the only example of a facility that could be more easily
modularized reference loops among modules were allowed (for example, I
think isa-pnp was recently unmodularized for this reason, although my
recollection could easily be wrong about that).

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-11-23 12:26:58

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] module fs or how to not break everything at once

Hi,

On Fri, 22 Nov 2002, Adam J. Richter wrote:

> I will try to find that discussion. In the meantime, I'll just
> point out that you can still use the linker with two .o files instead
> of one .o and an ld script. At the very least, it you would be
> relying on one less GNUism (linker scripts).

If you manage to remove arch/*/vmlinux.lds, we can continue this
discussion. I need the module linker script for very much the same
reasons.

> >Why would you kill insmod with SIGKILL?
>
> For example, you might be aborting a shutdown, or you might
> have decided to kill all processes on a certain terminal because
> you're trying to kill some runaway activity.

Well, if you kill processes with SIGKILL, you get exactly what you ask
for.

> >You're trying here to assign tasks to insmod it shouldn't know about. The
> >more insmod knows about the module layout the harder is it to change it
> >from the kernel side and the more you loose flexibilty.
> >All this can be easily done by the kernel.
>
> By that logic, we must go with Rusty's kernel module loader,
> perhaps with an interface like "int sys_insmod(char **argv)."

Think _very_ carefully about what I wrote earlier:
"Overall this means all module tasks become nicely separated. During build
we prepare the module with all the information the kernel needs, the
loader takes care of dependencies and just relocates the module, modfs
maps it into the kernel and starts the module. If the interfaces are
halfway flexible, changes in one part don't require a change somewhere
else."
This is the basic design goal and I will not change it without a very good
reason.

Otherwise I stop this discussion here, I doubt you will get _anyone_ to
implement the features you want. It also makes little sense to discuss
details of an implementation, which doesn't exist yet. As soon as it does
exist, you are free to implement whatever you want on top of it. If you
get anyone to look at it without a barf bag, I'll consider it. If you
think you can do better, try to implement your own module loader, I think
it will be an interesting learning experience for you.

bye, Roman