2003-07-25 17:24:30

by Rusty Russell

[permalink] [raw]
Subject: [PATCH] Remove module reference counting.

Hi all,

When the initial module patch was submitted, it made modules
start isolated, so they would not be accessible until (if)
initialization had succeeded. This broke partition scanning, and was
immediately reverted, leaving us with a module reference count scheme
identical to the previous one (just a faster implementation): we still
have cases where modules can be access on failed load.

Then Dave decided that the work of reference counting network
driver modules everywhere is too invasive, so network driver modules
now have zero reference counts always. The idea is that if you don't
want the module removed, don't do it. ie. only remove the module if
there's a bug, or you want to replace it.

If module removal is to be a rare and unusual event, it
doesn't seem so sensible to go to great lengths in the code to handle
just that case. In fact, it's easier to leave the module memory in
place, and not have the concept of parts of the kernel text (and some
types of kernel data) vanishing.

Polite feedback welcome,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: Get Rid of Module Reference Counting
Author: Rusty Russell
Status: Tested on 2.6.0-test1-bk2

D: Several people have complained about the complexity of doing
D: reference counting on modules, and existing bugs in the unload
D: routines of modules.
D:
D: In particular, David Miller has chosen not to reference count
D: modules for network devices: even when "in use" their reference
D: count remains zero, and the interfaces shut down when rmmod is
D: called. This breaks the current "remove if it will have no effect,
D: otherwise fail" semantics of modules: the result is that modules
D: should not be removed unless there's a bug anyway.
D:
D: So, a fair solution is to never free module memory. When a module
D: fails initialization, or is deactivated, the memory is left around
D: so future calls into it (or references to its data) will not crash.
D: The payoff (other than lack of complexity) is that reference
D: counting disappears.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .6124-linux-2.6.0-test1-bk2/include/linux/module.h .6124-linux-2.6.0-test1-bk2.updated/include/linux/module.h
--- .6124-linux-2.6.0-test1-bk2/include/linux/module.h 2003-07-21 00:04:13.000000000 +1000
+++ .6124-linux-2.6.0-test1-bk2.updated/include/linux/module.h 2003-07-24 07:12:46.000000000 +1000
@@ -170,11 +170,6 @@ void *__symbol_get_gpl(const char *symbo
* special casing EXPORT_SYMBOL_NOVERS. FIXME: Deprecated */
#define EXPORT_SYMBOL_NOVERS(sym) EXPORT_SYMBOL(sym)

-struct module_ref
-{
- local_t count;
-} ____cacheline_aligned;
-
enum module_state
{
MODULE_STATE_LIVE,
@@ -224,25 +219,14 @@ struct module
/* Arch-specific module values */
struct mod_arch_specific arch;

- /* Am I unsafe to unload? */
- int unsafe;
-
/* Am I GPL-compatible */
int license_gplok;

-#ifdef CONFIG_MODULE_UNLOAD
- /* Reference counts */
- struct module_ref ref[NR_CPUS];
-
/* What modules depend on me? */
struct list_head modules_which_use_me;

- /* Who is waiting for us to be unloaded */
- struct task_struct *waiter;
-
/* Destruction function. */
void (*exit)(void);
-#endif

#ifdef CONFIG_KALLSYMS
/* We keep the symbol and string tables for kallsyms. */
@@ -282,66 +266,9 @@ extern void __module_put_and_exit(struct
__attribute__((noreturn));
#define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code);

-#ifdef CONFIG_MODULE_UNLOAD
-unsigned int module_refcount(struct module *mod);
-void __symbol_put(const char *symbol);
-#define symbol_put(x) __symbol_put(MODULE_SYMBOL_PREFIX #x)
-void symbol_put_addr(void *addr);
-
-/* Sometimes we know we already have a refcount, and it's easier not
- to handle the error case (which only happens with rmmod --wait). */
-static inline void __module_get(struct module *module)
-{
- if (module) {
- BUG_ON(module_refcount(module) == 0);
- local_inc(&module->ref[get_cpu()].count);
- put_cpu();
- }
-}
-
-static inline int try_module_get(struct module *module)
-{
- int ret = 1;
-
- if (module) {
- unsigned int cpu = get_cpu();
- if (likely(module_is_live(module)))
- local_inc(&module->ref[cpu].count);
- else
- ret = 0;
- put_cpu();
- }
- return ret;
-}
-
-static inline void module_put(struct module *module)
-{
- if (module) {
- unsigned int cpu = get_cpu();
- local_dec(&module->ref[cpu].count);
- /* Maybe they're waiting for us to drop reference? */
- if (unlikely(!module_is_live(module)))
- wake_up_process(module->waiter);
- put_cpu();
- }
-}
-
-#else /*!CONFIG_MODULE_UNLOAD*/
-static inline int try_module_get(struct module *module)
-{
- return !module || module_is_live(module);
-}
-static inline void module_put(struct module *module)
-{
-}
-static inline void __module_get(struct module *module)
-{
-}
#define symbol_put(x) do { } while(0)
#define symbol_put_addr(p) do { } while(0)

-#endif /* CONFIG_MODULE_UNLOAD */
-
/* This is a #define so the string doesn't get put in every .o file */
#define module_name(mod) \
({ \
@@ -349,16 +276,6 @@ static inline void __module_get(struct m
__mod ? __mod->name : "kernel"; \
})

-#define __unsafe(mod) \
-do { \
- if (mod && !(mod)->unsafe) { \
- printk(KERN_WARNING \
- "Module %s cannot be unloaded due to unsafe usage in" \
- " %s:%u\n", (mod)->name, __FILE__, __LINE__); \
- (mod)->unsafe = 1; \
- } \
-} while(0)
-
/* For kallsyms to ask for address resolution. NULL means not found. */
const char *module_address_lookup(unsigned long addr,
unsigned long *symbolsize,
@@ -394,23 +311,8 @@ static inline struct module *module_text
#define symbol_put(x) do { } while(0)
#define symbol_put_addr(x) do { } while(0)

-static inline void __module_get(struct module *module)
-{
-}
-
-static inline int try_module_get(struct module *module)
-{
- return 1;
-}
-
-static inline void module_put(struct module *module)
-{
-}
-
#define module_name(mod) "kernel"

-#define __unsafe(mod)
-
/* For kallsyms to ask for address resolution. NULL means not found. */
static inline const char *module_address_lookup(unsigned long addr,
unsigned long *symbolsize,
@@ -448,6 +350,19 @@ static inline int unregister_module_noti

#endif /* CONFIG_MODULES */

+static inline void __module_get(struct module *module)
+{
+}
+
+static inline int try_module_get(struct module *module)
+{
+ return 1;
+}
+
+static inline void module_put(struct module *module)
+{
+}
+
#ifdef MODULE
extern struct module __this_module;
#ifdef KBUILD_MODNAME
@@ -480,19 +395,10 @@ struct obsolete_modparm __parm_##var __a

static inline void __deprecated MOD_INC_USE_COUNT(struct module *module)
{
- __unsafe(module);
-
-#if defined(CONFIG_MODULE_UNLOAD) && defined(MODULE)
- local_inc(&module->ref[get_cpu()].count);
- put_cpu();
-#else
- (void)try_module_get(module);
-#endif
}

static inline void __deprecated MOD_DEC_USE_COUNT(struct module *module)
{
- module_put(module);
}

#define MOD_INC_USE_COUNT MOD_INC_USE_COUNT(THIS_MODULE)
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .6124-linux-2.6.0-test1-bk2/kernel/module.c .6124-linux-2.6.0-test1-bk2.updated/kernel/module.c
--- .6124-linux-2.6.0-test1-bk2/kernel/module.c 2003-07-21 00:04:13.000000000 +1000
+++ .6124-linux-2.6.0-test1-bk2.updated/kernel/module.c 2003-07-24 07:20:43.000000000 +1000
@@ -63,6 +63,9 @@ static LIST_HEAD(modules);
static DECLARE_MUTEX(notify_mutex);
static struct notifier_block * module_notify_list;

+/* This is predeclared because we only want one CONFIG_UNLOAD block. */
+static void unlink_module(struct module *mod);
+
int register_module_notifier(struct notifier_block * nb)
{
int err;
@@ -378,20 +381,6 @@ static inline void percpu_modcopy(void *
#endif /* CONFIG_SMP */

#ifdef CONFIG_MODULE_UNLOAD
-/* Init the unload section of the module. */
-static void module_unload_init(struct module *mod)
-{
- unsigned int i;
-
- INIT_LIST_HEAD(&mod->modules_which_use_me);
- for (i = 0; i < NR_CPUS; i++)
- local_set(&mod->ref[i].count, 0);
- /* Hold reference count during initialization. */
- local_set(&mod->ref[smp_processor_id()].count, 1);
- /* Backwards compatibility macros put refcount during init. */
- mod->waiter = current;
-}
-
/* modules using other modules */
struct module_use
{
@@ -420,7 +409,7 @@ static int use_module(struct module *a,
struct module_use *use;
if (b == NULL || already_uses(a, b)) return 1;

- if (!strong_try_module_get(b))
+ if (b->state != MODULE_STATE_LIVE)
return 0;

DEBUGP("Allocating new usage for %s.\n", a->name);
@@ -457,167 +446,6 @@ static void module_unload_free(struct mo
}
}

-#ifdef CONFIG_SMP
-/* Thread to stop each CPU in user context. */
-enum stopref_state {
- STOPREF_WAIT,
- STOPREF_PREPARE,
- STOPREF_DISABLE_IRQ,
- STOPREF_EXIT,
-};
-
-static enum stopref_state stopref_state;
-static unsigned int stopref_num_threads;
-static atomic_t stopref_thread_ack;
-
-static int stopref(void *cpu)
-{
- int irqs_disabled = 0;
- int prepared = 0;
-
- sprintf(current->comm, "kmodule%lu\n", (unsigned long)cpu);
-
- /* Highest priority we can manage, and move to right CPU. */
-#if 0 /* FIXME */
- struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
- setscheduler(current->pid, SCHED_FIFO, &param);
-#endif
- set_cpus_allowed(current, 1UL << (unsigned long)cpu);
-
- /* Ack: we are alive */
- atomic_inc(&stopref_thread_ack);
-
- /* Simple state machine */
- while (stopref_state != STOPREF_EXIT) {
- if (stopref_state == STOPREF_DISABLE_IRQ && !irqs_disabled) {
- local_irq_disable();
- irqs_disabled = 1;
- /* Ack: irqs disabled. */
- atomic_inc(&stopref_thread_ack);
- } else if (stopref_state == STOPREF_PREPARE && !prepared) {
- /* Everyone is in place, hold CPU. */
- preempt_disable();
- prepared = 1;
- atomic_inc(&stopref_thread_ack);
- }
- if (irqs_disabled || prepared)
- cpu_relax();
- else
- yield();
- }
-
- /* Ack: we are exiting. */
- atomic_inc(&stopref_thread_ack);
-
- if (irqs_disabled)
- local_irq_enable();
- if (prepared)
- preempt_enable();
-
- return 0;
-}
-
-/* Change the thread state */
-static void stopref_set_state(enum stopref_state state, int sleep)
-{
- atomic_set(&stopref_thread_ack, 0);
- wmb();
- stopref_state = state;
- while (atomic_read(&stopref_thread_ack) != stopref_num_threads) {
- if (sleep)
- yield();
- else
- cpu_relax();
- }
-}
-
-/* Stop the machine. Disables irqs. */
-static int stop_refcounts(void)
-{
- unsigned int i, cpu;
- unsigned long old_allowed;
- int ret = 0;
-
- /* One thread per cpu. We'll do our own. */
- cpu = smp_processor_id();
-
- /* FIXME: racy with set_cpus_allowed. */
- old_allowed = current->cpus_allowed;
- set_cpus_allowed(current, 1UL << (unsigned long)cpu);
-
- atomic_set(&stopref_thread_ack, 0);
- stopref_num_threads = 0;
- stopref_state = STOPREF_WAIT;
-
- /* No CPUs can come up or down during this. */
- down(&cpucontrol);
-
- for (i = 0; i < NR_CPUS; i++) {
- if (i == cpu || !cpu_online(i))
- continue;
- ret = kernel_thread(stopref, (void *)(long)i, CLONE_KERNEL);
- if (ret < 0)
- break;
- stopref_num_threads++;
- }
-
- /* Wait for them all to come to life. */
- while (atomic_read(&stopref_thread_ack) != stopref_num_threads)
- yield();
-
- /* If some failed, kill them all. */
- if (ret < 0) {
- stopref_set_state(STOPREF_EXIT, 1);
- up(&cpucontrol);
- return ret;
- }
-
- /* Don't schedule us away at this point, please. */
- preempt_disable();
-
- /* Now they are all scheduled, make them hold the CPUs, ready. */
- stopref_set_state(STOPREF_PREPARE, 0);
-
- /* Make them disable irqs. */
- stopref_set_state(STOPREF_DISABLE_IRQ, 0);
-
- local_irq_disable();
- return 0;
-}
-
-/* Restart the machine. Re-enables irqs. */
-static void restart_refcounts(void)
-{
- stopref_set_state(STOPREF_EXIT, 0);
- local_irq_enable();
- preempt_enable();
- up(&cpucontrol);
-}
-#else /* ...!SMP */
-static inline int stop_refcounts(void)
-{
- local_irq_disable();
- return 0;
-}
-static inline void restart_refcounts(void)
-{
- local_irq_enable();
-}
-#endif
-
-unsigned int module_refcount(struct module *mod)
-{
- unsigned int i, total = 0;
-
- for (i = 0; i < NR_CPUS; i++)
- total += local_read(&mod->ref[i].count);
- return total;
-}
-EXPORT_SYMBOL(module_refcount);
-
-/* This exists whether we can unload or not */
-static void free_module(struct module *mod);
-
#ifdef CONFIG_MODULE_FORCE_UNLOAD
static inline int try_force(unsigned int flags)
{
@@ -639,21 +467,6 @@ void cleanup_module(void)
}
EXPORT_SYMBOL(cleanup_module);

-static void wait_for_zero_refcount(struct module *mod)
-{
- /* Since we might sleep for some time, drop the semaphore first */
- up(&module_mutex);
- for (;;) {
- DEBUGP("Looking at refcount...\n");
- set_current_state(TASK_UNINTERRUPTIBLE);
- if (module_refcount(mod) == 0)
- break;
- schedule();
- }
- current->state = TASK_RUNNING;
- down(&module_mutex);
-}
-
asmlinkage long
sys_delete_module(const char __user *name_user, unsigned int flags)
{
@@ -693,8 +506,7 @@ sys_delete_module(const char __user *nam
}

/* If it has an init func, it must have an exit func to unload */
- if ((mod->init != init_module && mod->exit == cleanup_module)
- || mod->unsafe) {
+ if (mod->init != init_module && mod->exit == cleanup_module) {
forced = try_force(flags);
if (!forced) {
/* This module can't be removed */
@@ -702,34 +514,18 @@ sys_delete_module(const char __user *nam
goto out;
}
}
- /* Stop the machine so refcounts can't move: irqs disabled. */
- DEBUGP("Stopping refcounts...\n");
- ret = stop_refcounts();
- if (ret != 0)
- goto out;
-
- /* If it's not unused, quit unless we are told to block. */
- if ((flags & O_NONBLOCK) && module_refcount(mod) != 0) {
- forced = try_force(flags);
- if (!forced) {
- ret = -EWOULDBLOCK;
- restart_refcounts();
- goto out;
- }
- }

/* Mark it as dying. */
- mod->waiter = current;
mod->state = MODULE_STATE_GOING;
- restart_refcounts();
-
- /* Never wait if forced. */
- if (!forced && module_refcount(mod) != 0)
- wait_for_zero_refcount(mod);

- /* Final destruction now noone is using it. */
+ /* Drop lock in case cleanup function dies. */
+ up(&module_mutex);
mod->exit();
- free_module(mod);
+ down(&module_mutex);
+
+ /* Remove from lists now noone is using it. */
+ unlink_module(mod);
+ ret = 0;

out:
up(&module_mutex);
@@ -741,7 +537,7 @@ static void print_unload_info(struct seq
struct module_use *use;
int printed_something = 0;

- seq_printf(m, " %u ", module_refcount(mod));
+ seq_printf(m, " 0 ");

/* Always include a trailing , so userspace can differentiate
between this and the old multi-field proc format. */
@@ -750,11 +546,6 @@ static void print_unload_info(struct seq
seq_printf(m, "%s,", use->module_which_uses->name);
}

- if (mod->unsafe) {
- printed_something = 1;
- seq_printf(m, "[unsafe],");
- }
-
if (mod->init != init_module && mod->exit == cleanup_module) {
printed_something = 1;
seq_printf(m, "[permanent],");
@@ -763,34 +554,6 @@ static void print_unload_info(struct seq
if (!printed_something)
seq_printf(m, "-");
}
-
-void __symbol_put(const char *symbol)
-{
- struct module *owner;
- unsigned long flags;
- const unsigned long *crc;
-
- spin_lock_irqsave(&modlist_lock, flags);
- if (!__find_symbol(symbol, &owner, &crc, 1))
- BUG();
- module_put(owner);
- spin_unlock_irqrestore(&modlist_lock, flags);
-}
-EXPORT_SYMBOL(__symbol_put);
-
-void symbol_put_addr(void *addr)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&modlist_lock, flags);
- if (!kernel_text_address((unsigned long)addr))
- BUG();
-
- module_put(module_text_address((unsigned long)addr));
- spin_unlock_irqrestore(&modlist_lock, flags);
-}
-EXPORT_SYMBOL_GPL(symbol_put_addr);
-
#else /* !CONFIG_MODULE_UNLOAD */
static void print_unload_info(struct seq_file *m, struct module *mod)
{
@@ -804,11 +567,9 @@ static inline void module_unload_free(st

static inline int use_module(struct module *a, struct module *b)
{
- return strong_try_module_get(b);
-}
-
-static inline void module_unload_init(struct module *mod)
-{
+ if (b && b->state != MODULE_STATE_LIVE)
+ return 0;
+ return 1;
}

asmlinkage long
@@ -1071,28 +832,16 @@ static unsigned long resolve_symbol(Elf_
return ret;
}

-/* Free a module, remove from lists, etc (must hold module mutex). */
-static void free_module(struct module *mod)
+/* Remove a module from lists, etc (must hold module mutex). */
+static void unlink_module(struct module *mod)
{
/* Delete from various lists */
spin_lock_irq(&modlist_lock);
list_del(&mod->list);
spin_unlock_irq(&modlist_lock);

- /* Arch-specific cleanup. */
- module_arch_cleanup(mod);
-
- /* Module unload stuff */
+ /* Clean up module unload stuff */
module_unload_free(mod);
-
- /* This may be NULL, but that's OK */
- module_free(mod, mod->module_init);
- kfree(mod->args);
- if (mod->percpu)
- percpu_modfree(mod->percpu);
-
- /* Finally, free the core (containing the module structure) */
- module_free(mod, mod->module_core);
}

void *__symbol_get(const char *symbol)
@@ -1573,8 +1322,8 @@ static struct module *load_module(void _
/* Module has been moved. */
mod = (void *)sechdrs[modindex].sh_addr;

- /* Now we've moved module, initialize linked lists, etc. */
- module_unload_init(mod);
+ /* Now we've moved module, initialize the usage linked list. */
+ INIT_LIST_HEAD(&mod->modules_which_use_me);

/* Set up license info based on the info section */
set_license(mod, get_modinfo(sechdrs, infoindex, "license"));
@@ -1722,35 +1471,22 @@ sys_init_module(void __user *umod,

/* Start the module */
ret = mod->init();
- if (ret < 0) {
- /* Init routine failed: abort. Try to protect us from
- buggy refcounters. */
- mod->state = MODULE_STATE_GOING;
- synchronize_kernel();
- if (mod->unsafe)
- printk(KERN_ERR "%s: module is now stuck!\n",
- mod->name);
- else {
- module_put(mod);
- down(&module_mutex);
- free_module(mod);
- up(&module_mutex);
- }
- return ret;
- }

- /* Now it's a first class citizen! */
down(&module_mutex);
- mod->state = MODULE_STATE_LIVE;
- /* Drop initial reference. */
- module_put(mod);
+ if (ret < 0)
+ /* Remove from lists and let it rot. */
+ unlink_module(mod);
+ else
+ /* Now it's a first class citizen! */
+ mod->state = MODULE_STATE_LIVE;
+
module_free(mod, mod->module_init);
mod->module_init = NULL;
mod->init_size = 0;
mod->init_text_size = 0;
up(&module_mutex);

- return 0;
+ return ret;
}

static inline int within(unsigned long addr, void *start, unsigned long size)
@@ -1925,8 +1661,6 @@ const struct exception_table_entry *sear
}
spin_unlock_irqrestore(&modlist_lock, flags);

- /* Now, if we found one, we are running inside it now, hence
- we cannot unload the module, hence no refcnt needed. */
return e;
}


2003-07-25 17:35:11

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Fri, 25 Jul 2003 04:00:18 +1000
Rusty Russell <[email protected]> wrote:

> If module removal is to be a rare and unusual event, it
> doesn't seem so sensible to go to great lengths in the code to handle
> just that case. In fact, it's easier to leave the module memory in
> place, and not have the concept of parts of the kernel text (and some
> types of kernel data) vanishing.
>
> Polite feedback welcome,

I'm ok with this, with one possible enhancement.

How about we make ->cleanup() return a boolean, which if true
causes the caller to do the module_free()?

(Yes I know that doing this will require some more thought
in order to minimize how large a change it would need to
be to keep exiting modules working. It's a seperate discussion.)

2003-07-25 17:39:54

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Fri, Jul 25, 2003 at 04:00:18AM +1000, Rusty Russell wrote:
> Hi all,
>
> When the initial module patch was submitted, it made modules
> start isolated, so they would not be accessible until (if)
> initialization had succeeded. This broke partition scanning, and was
> immediately reverted, leaving us with a module reference count scheme
> identical to the previous one (just a faster implementation): we still
> have cases where modules can be access on failed load.
>
> Then Dave decided that the work of reference counting network
> driver modules everywhere is too invasive, so network driver modules
> now have zero reference counts always. The idea is that if you don't
> want the module removed, don't do it. ie. only remove the module if
> there's a bug, or you want to replace it.

Hm, as long as we add a kobject to the module structure, so that users
of a module can be tracked somehow to know if it is safe to unload the
module or not.

This is because there is a difference between device reference counts,
and code reference counts, which is why I added the module owner logic
to sysfs attributes, to prevent code from being unloaded when the device
might already be gone.

So can the following situation still work with this proposed patch:
- device created
- sysfs files created associated with that device
- user opens sysfs file
- user disconnects sysfs files.
- device goes away, driver no longer references device, but
kobject count is still incremented.
- driver associated with device is unloaded.
- user reads sysfs file previously opened (which calls into
module memory that is now gone.)

Can we still prevent this from happening now? I think if we add a
kobject (or something, we still need a kobject to get module
parameters so might as well use that), we might be safe.

thanks,

greg k-h

2003-07-25 18:56:48

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Fri, Jul 25, 2003 at 01:54:47PM -0400, Greg KH wrote:
>
> Can we still prevent this from happening now? I think if we add a
> kobject (or something, we still need a kobject to get module
> parameters so might as well use that), we might be safe.

Ok, in talking to you in person I now understand this better, the code
doesn't go away at all. That sounds fine, I have no problem with this
change.

thanks,

greg k-h

2003-07-25 19:11:52

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Fri, 25 Jul 2003 04:00:18 +1000
Rusty Russell <[email protected]> wrote:

> Hi all,
>
> When the initial module patch was submitted, it made modules
> start isolated, so they would not be accessible until (if)
> initialization had succeeded. This broke partition scanning, and was
> immediately reverted, leaving us with a module reference count scheme
> identical to the previous one (just a faster implementation): we still
> have cases where modules can be access on failed load.
>
> Then Dave decided that the work of reference counting network
> driver modules everywhere is too invasive, so network driver modules
> now have zero reference counts always. The idea is that if you don't
> want the module removed, don't do it. ie. only remove the module if
> there's a bug, or you want to replace it.
>
> If module removal is to be a rare and unusual event, it
> doesn't seem so sensible to go to great lengths in the code to handle
> just that case. In fact, it's easier to leave the module memory in
> place, and not have the concept of parts of the kernel text (and some
> types of kernel data) vanishing.
>
> Polite feedback welcome,
> Rusty.
> --

There are two possible objections to this:
* Some developers keep the same kernel running and load/unload then reload
a new driver when debugging. This would break probably or at least cause
a large amount of kernel growth. Not that big an issue for me personally
but driver writers seem to get hit with all the changes.

* Drivers might get sloppy about not cleaning up timers and data structures
-- more than they are already. You might want to have a kernel debug option
that overwrite's the unloaded text with something guaranteed to cause an
oops.






2003-07-25 19:17:55

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Fri, Jul 25, 2003 at 12:26:51PM -0700, Stephen Hemminger wrote:
>
> There are two possible objections to this:
> * Some developers keep the same kernel running and load/unload then reload
> a new driver when debugging. This would break probably or at least cause
> a large amount of kernel growth. Not that big an issue for me personally
> but driver writers seem to get hit with all the changes.

As a driver writer, I don't mind this.

thanks,

greg k-h

2003-07-25 22:27:58

by Gianni Tedesco

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Thu, 2003-07-24 at 19:00, Rusty Russell wrote:
> If module removal is to be a rare and unusual event, it
> doesn't seem so sensible to go to great lengths in the code to handle
> just that case. In fact, it's easier to leave the module memory in
> place, and not have the concept of parts of the kernel text (and some
> types of kernel data) vanishing.

Wasn't the idea once banded about of a 2-stage unload that went
something like:

1. ->cleanup() - unregister IRQ handlers, timers, etc.
2. Quiesce the system
3. Safe to unload

surely if nothing is registered and all CPUs do a voluntary schedule()
then there can be no chance of calling back in to the module.

LOL. Or am I kidding myself here? :)

--
// Gianni Tedesco (gianni at scaramanga dot co dot uk)
lynx --source http://www.scaramanga.co.uk/gianni-at-ecsc.asc | gpg --import
8646BE7D: 6D9F 2287 870E A2C9 8F60 3A3C 91B5 7669 8646 BE7D


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2003-07-25 23:21:55

by Bernd Eckenfels

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

In article <1059172995.16255.6.camel@sherbert> you wrote:
> 1. ->cleanup() - unregister IRQ handlers, timers, etc.
...
> surely if nothing is registered and all CPUs do a voluntary schedule()
> then there can be no chance of calling back in to the module.

no, because data structures might contain pointers, especially in handles.
So unregistering will avoid new handlers to jump into the code, but it will
not avoid that existing objets (i.e. open handles) exist. And exactly for
those the refcounting is used.

Greetings
Bernd
--
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/

2003-07-25 23:49:28

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Gwe, 2003-07-25 at 20:26, Stephen Hemminger wrote:
> > If module removal is to be a rare and unusual event, it
> > doesn't seem so sensible to go to great lengths in the code to handle
> > just that case. In fact, it's easier to leave the module memory in
> > place, and not have the concept of parts of the kernel text (and some
> > types of kernel data) vanishing.

Uggh. There is a difference between taking the approach that some stuff
is hard to handle and gets into trouble for using MOD_INC/DEC so is
unsafe, and doing the locking from the caller, or arranging that you
know the device is quiescent in the unload path and not allowing
unloading to work properly.

I've got drivers that use MOD_INC/DEC and are technically unsafe, they
by default now don't unload and its an incentive to fix them. I'd hate
to have my box cluttering up and have to keep rebooting to test drivers
because of inept implementations however.


2003-07-26 17:06:30

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

In message <[email protected]> you write:
> On Fri, 25 Jul 2003 04:00:18 +1000
> Rusty Russell <[email protected]> wrote:
> > If module removal is to be a rare and unusual event, it
> > doesn't seem so sensible to go to great lengths in the code to handle
> > just that case. In fact, it's easier to leave the module memory in
> > place, and not have the concept of parts of the kernel text (and some
> > types of kernel data) vanishing.
> >
> > Polite feedback welcome,
> > Rusty.
> > --
>
> There are two possible objections to this:
> * Some developers keep the same kernel running and load/unload then reload
> a new driver when debugging. This would break probably or at least cause
> a large amount of kernel growth. Not that big an issue for me personally
> but driver writers seem to get hit with all the changes.

No, it would just leak memory. Not really a concern for developers.
It's fairly trivial to hack up a backdoor "remove all freed modules
and be damned" thing for developers if there's real demand.

> * Drivers might get sloppy about not cleaning up timers and data
> structures -- more than they are already. You might want to have a
> kernel debug option that overwrite's the unloaded text with
> something guaranteed to cause an oops.

I already have a poisoning patch for init code, when some modules
seemed to suffer from this being discarded. I can extend it.

Thanks!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-07-26 19:16:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.


First off - we're not changing fundamental module stuff any more.

On Sat, 26 Jul 2003, Rusty Russell wrote:
>
> No, it would just leak memory. Not really a concern for developers.
> It's fairly trivial to hack up a backdoor "remove all freed modules
> and be damned" thing for developers if there's real demand.

It's not just a developer thing. At least installers etc used to do some
device probing by loading modules and depending on the result.

Linus

2003-07-26 19:22:34

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Sat, Jul 26, 2003 at 12:31:25PM -0700, Linus Torvalds wrote:
>
> First off - we're not changing fundamental module stuff any more.

fair enough

>
> On Sat, 26 Jul 2003, Rusty Russell wrote:
> >
> > No, it would just leak memory. Not really a concern for developers.
> > It's fairly trivial to hack up a backdoor "remove all freed modules
> > and be damned" thing for developers if there's real demand.
>
> It's not just a developer thing. At least installers etc used to do some
> device probing by loading modules and depending on the result.

yes but those same installers couldn't care less if the kernel
completely frees the memory of the module text or if it leaks that.
It won't even notice the difference....

2003-07-27 05:25:39

by Aschwin Marsman

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Sat, 26 Jul 2003, Arjan van de Ven wrote:

> On Sat, Jul 26, 2003 at 12:31:25PM -0700, Linus Torvalds wrote:
> >
> > On Sat, 26 Jul 2003, Rusty Russell wrote:
> > >
> > > No, it would just leak memory. Not really a concern for developers.
> > > It's fairly trivial to hack up a backdoor "remove all freed modules
> > > and be damned" thing for developers if there's real demand.
> >
> > It's not just a developer thing. At least installers etc used to do some
> > device probing by loading modules and depending on the result.
>
> yes but those same installers couldn't care less if the kernel
> completely frees the memory of the module text or if it leaks that.
> It won't even notice the difference....

But doesn't the same happen when e.g. kudzu tries to detect new hardware?
This is running every time my RH system boots, so memory leaks at that
moment do not appeal to me. When it's only an installer thing, it doesn't
worry me, although from an engineering perspective...

Have fun,

Aschwin Marsman

--
aYniK Software Solutions all You need is Knowledge
P.O. box 134 NL-7600 AC Almelo - the Netherlands
[email protected] http://www.aYniK.com

2003-07-27 10:58:13

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Sad, 2003-07-26 at 20:37, Arjan van de Ven wrote:
> > It's not just a developer thing. At least installers etc used to do some
> > device probing by loading modules and depending on the result.
>
> yes but those same installers couldn't care less if the kernel
> completely frees the memory of the module text or if it leaks that.
> It won't even notice the difference....

On a 64Mb box the Red Hat installer would crash in this situation if you
do the maths

2003-07-27 10:59:09

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Sul, 2003-07-27 at 06:38, Aschwin Marsman wrote:
> But doesn't the same happen when e.g. kudzu tries to detect new hardware?
> This is running every time my RH system boots, so memory leaks at that
> moment do not appeal to me. When it's only an installer thing, it doesn't
> worry me, although from an engineering perspective...

Yes - and also every time you trigger some of the USB firmware loading
modules and the like you'd lose 200K or so

2003-07-27 18:26:06

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

In message <1059172995.16255.6.camel@sherbert> you write:
> Wasn't the idea once banded about of a 2-stage unload that went
> something like:
>
> 1. ->cleanup() - unregister IRQ handlers, timers, etc.
> 2. Quiesce the system
> 3. Safe to unload
>
> surely if nothing is registered and all CPUs do a voluntary schedule()
> then there can be no chance of calling back in to the module.

Yes, I implemented this, even. The problem is that the desired module
semantics are "unload if it'll work, otherwise do nothing and fail".
This means that you either get halfway through the cleanup function
and back out (which leaves the race where some interfaces to your
module is MIA for a while), or you hang forever if things are in use.

It's this "atomically check refcount and deregister" that
try_module_get() gives us, by effectively unregistering all the
modules' interfaces at once.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-07-27 22:54:39

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Mon, Jul 28, 2003 at 05:34:36AM +1000, Rusty Russell wrote:
>
> The kudzu one and Alan's USB firmware example bother me more: they
> load then unload modules currently?

I'm pretty sure kudzu doesn't

2003-07-28 00:14:50

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Sul, 2003-07-27 at 22:47, Arjan van de Ven wrote:
> On Mon, Jul 28, 2003 at 05:34:36AM +1000, Rusty Russell wrote:
> >
> > The kudzu one and Alan's USB firmware example bother me more: they
> > load then unload modules currently?
>
> I'm pretty sure kudzu doesn't

Redid the instrumentation - kudzu loads a *lot* of modules which do a
ton of stuff then decide they failed to initialize. It doesn't seem
to load modules that initialize successfully then unload then. Those
stay in memory by default.


2003-07-28 00:09:02

by Bill Nottingham

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

Arjan van de Ven ([email protected]) said:
> > The kudzu one and Alan's USB firmware example bother me more: they
> > load then unload modules currently?
>
> I'm pretty sure kudzu doesn't

It loads/unloads things like scsi modules and firewire controller
modules, but only for hardware actually present in the system (i.e.,
you'd probably be loading it again anyway, if you haven't already
loaded it.)

Bill

2003-07-28 01:38:54

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

In message <[email protected]> you write:
> On Fri, 25 Jul 2003 04:00:18 +1000
> Rusty Russell <[email protected]> wrote:
>
> > If module removal is to be a rare and unusual event, it
> > doesn't seem so sensible to go to great lengths in the code to handle
> > just that case. In fact, it's easier to leave the module memory in
> > place, and not have the concept of parts of the kernel text (and some
> > types of kernel data) vanishing.
> >
> > Polite feedback welcome,
>
> I'm ok with this, with one possible enhancement.
>
> How about we make ->cleanup() return a boolean, which if true
> causes the caller to do the module_free()?

Some "I am the perfect module" flag would probably cause less
breakage. But, I'm not sure even that is worth it.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-07-28 02:01:30

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

In message <[email protected]> you write:
>
> First off - we're not changing fundamental module stuff any more.

OK. Who are you and what have you done with the real Linus?

I guess it's back to fixing up reference counting in the rest of the
kernel. It's not hard, it's just not done. 8(

> > No, it would just leak memory. Not really a concern for developers.
> > It's fairly trivial to hack up a backdoor "remove all freed modules
> > and be damned" thing for developers if there's real demand.
>
> It's not just a developer thing. At least installers etc used to do some
> device probing by loading modules and depending on the result.

A similar case. At this point you don't have random users trying to
access things, so freeing is actually fairly safe (modulo other
bugs).

The kudzu one and Alan's USB firmware example bother me more: they
load then unload modules currently? Since modern device drivers are
not supposed to fail to load just because there isn't currently
hardware, I guess they'd have to.

Oh well,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-07-28 02:25:13

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

In message <[email protected]> you write:
> On Gwe, 2003-07-25 at 20:26, Stephen Hemminger wrote:
> > > If module removal is to be a rare and unusual event, it
> > > doesn't seem so sensible to go to great lengths in the code to handle
> > > just that case. In fact, it's easier to leave the module memory in
> > > place, and not have the concept of parts of the kernel text (and some
> > > types of kernel data) vanishing.
>
> Uggh. There is a difference between taking the approach that some stuff
> is hard to handle and gets into trouble for using MOD_INC/DEC so is
> unsafe, and doing the locking from the caller, or arranging that you
> know the device is quiescent in the unload path and not allowing
> unloading to work properly.

We can do this everywhere: we have the technology. But as I pointed
out, at least some hackers who know what they are doing have balked at
what that involves. This is apart from the subsystems which are still
not safe as it stands.

> I've got drivers that use MOD_INC/DEC and are technically unsafe, they
> by default now don't unload and its an incentive to fix them. I'd hate
> to have my box cluttering up and have to keep rebooting to test drivers
> because of inept implementations however.

But OTOH, this patch would make those modules perfectly safe: no
fixing needed.

One modification is to tally up the deleted modules in /proc/modules
under a "[deleted]" entry or somesuch, but allow you to "rmmod
[deleted]" and actually free that memory (and taint your kernel). eg:

# lsmod
Module Size Used by
loop 8144 0
[deleted] 12345
# rmmod '[deleted]'

Thoughts?
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-07-28 11:19:43

by Rahul Karnik

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

Rusty Russell wrote:

> If module removal is to be a rare and unusual event, it
> doesn't seem so sensible to go to great lengths in the code to handle
> just that case. In fact, it's easier to leave the module memory in
> place, and not have the concept of parts of the kernel text (and some
> types of kernel data) vanishing.

Rusty and others,

Module removal is *not* a rare event. One common case it is used is on
laptops during suspend. A lot of drivers do not do proper PM and so must
be unloaded before suspend and relaoaded after resume. How will this be
affected by removing module refcounting, even if we use your <deleted>
idea? If nothing else, having the ability to *reload* a module --
thereby reinitializing the device and achieving the same effect as
actually rmmod/insmod is what is needed.

I must say that it is somewhat disconcerting that I can rmmod a network
driver while it is being used by a network interface. A stupid user like
me can definitely shoot myself in the foot now.

Last but not least weren't we moving towards a more modular kernel with
early userspace loading things from initrd as needed? Removing existing
module functionality, however broken it may be, seems to me a step
backward in this regard.

Thanks,
Rahul
--
Rahul Karnik
[email protected]

2003-07-28 11:31:23

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Sul, 2003-07-27 at 20:34, Rusty Russell wrote:
> In message <[email protected]> you write:
> >
> > First off - we're not changing fundamental module stuff any more.
>
> OK. Who are you and what have you done with the real Linus?
>
> I guess it's back to fixing up reference counting in the rest of the
> kernel. It's not hard, it's just not done. 8(

Right but it wasnt in 2.2 or 2.4 and its root only. If you want to be
really paranoid add a

MODULE_UNLOADABLE

that people can add to their modules that do unload safely (ie most of
them) as and when they check them over.

2003-07-28 11:28:08

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Llu, 2003-07-28 at 01:12, Bill Nottingham wrote:
> It loads/unloads things like scsi modules and firewire controller
> modules, but only for hardware actually present in the system (i.e.,
> you'd probably be loading it again anyway, if you haven't already
> loaded it.)

It loads things like floppy anyway, and it loads lots of things like the
firewire stuff that nobody ever uses because it has to see if anything
is plugged into them.

I guess kudzu could simply do lots of I/O ops directly on the floppy
hardware to detect it without loading drivers but thats pretty fugly.

2003-07-28 17:56:30

by Gianni Tedesco

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Sun, 2003-07-27 at 20:34, Rusty Russell wrote:
> I guess it's back to fixing up reference counting in the rest of the
> kernel. It's not hard, it's just not done. 8(

Do you know which subsystems and modules are definately broken wrt.
refcounting? And also which ones are un-fixable / wont-fix and why.

Maybe someone will step up to the plate if you name and shame...

--
// Gianni Tedesco (gianni at scaramanga dot co dot uk)
lynx --source http://www.scaramanga.co.uk/gianni-at-ecsc.asc | gpg --import
8646BE7D: 6D9F 2287 870E A2C9 8F60 3A3C 91B5 7669 8646 BE7D


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2003-07-28 18:48:20

by Mike Fedyk

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Mon, Jul 28, 2003 at 07:11:42PM +0100, Gianni Tedesco wrote:
> On Sun, 2003-07-27 at 20:34, Rusty Russell wrote:
> > I guess it's back to fixing up reference counting in the rest of the
> > kernel. It's not hard, it's just not done. 8(
>
> Do you know which subsystems and modules are definately broken wrt.
> refcounting? And also which ones are un-fixable / wont-fix and why.
>
> Maybe someone will step up to the plate if you name and shame...

At the very least, the network drivers have some trouble...

2003-07-29 02:01:00

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

In message <[email protected]> you write:
> Rusty Russell wrote:
>
> > If module removal is to be a rare and unusual event, it
> > doesn't seem so sensible to go to great lengths in the code to handle
> > just that case. In fact, it's easier to leave the module memory in
> > place, and not have the concept of parts of the kernel text (and some
> > types of kernel data) vanishing.
>
> Rusty and others,
>
> Module removal is *not* a rare event. One common case it is used is on
> laptops during suspend.

Yes, but that cuts both ways: noone fixes these broken drivers, but
work around them using module removal, leaving newbies with broken
laptops 8(

> Last but not least weren't we moving towards a more modular kernel with
> early userspace loading things from initrd as needed? Removing existing
> module functionality, however broken it may be, seems to me a step
> backward in this regard.

Not really. Adding modules is required. Removing them is a more
dubious goal, and if we didn't already have it, I know we'd balk at
doing it.

Hope that clarifies!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-07-29 02:10:33

by Perez-Gonzalez, Inaky

[permalink] [raw]
Subject: RE: [PATCH] Remove module reference counting.


> From: Rusty Russell [mailto:[email protected]]

> > Last but not least weren't we moving towards a more modular kernel with
> > early userspace loading things from initrd as needed? Removing existing
> > module functionality, however broken it may be, seems to me a step
> > backward in this regard.
>
> Not really. Adding modules is required. Removing them is a more
> dubious goal, and if we didn't already have it, I know we'd balk at
> doing it.

Can I add that it is really desirable to remove modules when developing
drivers? [and so to avoid reboots when loading new code?].

I?aky P?rez-Gonz?lez -- Not speaking for Intel -- all opinions are my own (and my fault)

2003-07-29 02:22:53

by Rahul Karnik

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

Rusty,

> Yes, but that cuts both ways: noone fixes these broken drivers, but
> work around them using module removal, leaving newbies with broken
> laptops 8(

Good point; so there's the work of fixing power management with drivers
known to load and unload correctly (dependent on hardware specs,
undocumented registers, etcc), or adding refcounting to fix the
remaining cases of drivers that do not unload safely (solvable in
kernel). Pick your poison. :)

By the way, what about a reload option that re-inits the module? Is that
possible/present, or subject to the same difficulties as unloading?

> Not really. Adding modules is required. Removing them is a more
> dubious goal, and if we didn't already have it, I know we'd balk at
> doing it.

Fair enough. I think we all agree that module unloading is a hard problem.

Thanks,
Rahul
--
Rahul Karnik
[email protected]

2003-07-29 23:01:10

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On 28 Jul 2003 12:38:41 +0100
Alan Cox <[email protected]> wrote:

> On Llu, 2003-07-28 at 01:12, Bill Nottingham wrote:
> > It loads/unloads things like scsi modules and firewire controller
> > modules, but only for hardware actually present in the system (i.e.,
> > you'd probably be loading it again anyway, if you haven't already
> > loaded it.)
>
> It loads things like floppy anyway, and it loads lots of things like the
> firewire stuff that nobody ever uses because it has to see if anything
> is plugged into them.

And it has to leave them in memory anyway, in case someone plugs stuff in
later. Oh well.

> I guess kudzu could simply do lots of I/O ops directly on the floppy
> hardware to detect it without loading drivers but thats pretty fugly.

Agreed that'd be kinda silly. But I was "educated" earlier that driver
loading shouldn't fail just because hardware is missing, due to hotplug.

Is this wrong?
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy

2003-07-30 01:55:26

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Wed, Jul 30, 2003 at 06:33:10AM +1000, Rusty Russell wrote:
> Agreed that'd be kinda silly. But I was "educated" earlier that driver
> loading shouldn't fail just because hardware is missing, due to hotplug.
>
> Is this wrong?

No, this is not wrong. Older pci drivers would refuse to load if they
didn't find their pci device in the system at that moment in time. All
pci drivers converted to the "new" api (new is a very relative term,
some 3 years old now...) will load just fine even if their devices are
not present.

Hope this helps,

greg k-h

2003-07-30 14:47:41

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.

On Maw, 2003-07-29 at 21:33, Rusty Russell wrote:
> > I guess kudzu could simply do lots of I/O ops directly on the floppy
> > hardware to detect it without loading drivers but thats pretty fugly.
>
> Agreed that'd be kinda silly. But I was "educated" earlier that driver
> loading shouldn't fail just because hardware is missing, due to hotplug.
>
> Is this wrong?

On systems without hotplug, on not hotpluggable devices and in a few other
cases - yes.

2003-08-01 02:40:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Remove module reference counting.


On Mon, 28 Jul 2003, Rusty Russell wrote:
>
> I guess it's back to fixing up reference counting in the rest of the
> kernel. It's not hard, it's just not done. 8(

Well, it's _never_ been done, so saying "we have to fix it for 2.6.x" is
obviously not true. It's one of those "nobody ends up really caring"
issues, since only root can unload anyway.

Linus