2003-01-02 22:42:47

by Luca

[permalink] [raw]
Subject: [RFC] Migrating net/sched to new module interface

Hi,

I'm working on net/sched to convert it to new module interface. I'm
going to add a new field 'owner' to struct Qdisc_ops. The init function
of each module registers the callback functions and a reference to
the module itself. When a qdisc is created (qdisc_create) I check
try_module_get() to see if I can use the registered callback functions
and if I can't I return -ENOSYS. When a qdisc is delete (qdisc_destroy)
I use module_put() to decrement the refcount. The attacched patch shows
this preliminar work on sch_api.c sch_generic.c.

The next step is to update the other schedulers removing
MOD_{INC,DEC}_USE_COUNT and adding the new 'owner' field.

This work is based on my (maybe poor) understanding on the new module
system, so I'm waiting some feedback before going on.

diff --exclude=diff.exclude -ru linux-2.5.orig/include/net/pkt_sched.h linux-2.5/include/net/pkt_sched.h
--- linux-2.5.orig/include/net/pkt_sched.h Sun Aug 11 22:07:18 2002
+++ linux-2.5/include/net/pkt_sched.h Thu Jan 2 22:50:57 2003
@@ -10,6 +10,7 @@
#include <linux/config.h>
#include <linux/types.h>
#include <linux/pkt_sched.h>
+#include <linux/module.h>
#include <net/pkt_cls.h>

#ifdef CONFIG_X86_TSC
@@ -67,6 +68,8 @@
int (*change)(struct Qdisc *, struct rtattr *arg);

int (*dump)(struct Qdisc *, struct sk_buff *);
+ /* Protects callbacks */
+ struct module *owner;
};

extern rwlock_t qdisc_tree_lock;
diff --exclude=diff.exclude -ru linux-2.5.orig/net/sched/sch_api.c linux-2.5/net/sched/sch_api.c
--- linux-2.5.orig/net/sched/sch_api.c Thu Jan 2 21:55:40 2003
+++ linux-2.5/net/sched/sch_api.c Thu Jan 2 22:59:26 2003
@@ -409,6 +409,10 @@
if (ops == NULL)
goto err_out;

+ err = -ENOSYS;
+ if (try_module_get(ops->owner) == 0)
+ goto err_module;
+
size = sizeof(*sch) + ops->priv_size;

sch = kmalloc(size, GFP_KERNEL);
@@ -416,12 +420,6 @@
if (!sch)
goto err_out;

- /* Grrr... Resolve race condition with module unload */
-
- err = -EINVAL;
- if (ops != qdisc_lookup_ops(kind))
- goto err_out;
-
memset(sch, 0, size);

skb_queue_head_init(&sch->q);
@@ -460,6 +458,8 @@
}

err_out:
+ module_put(ops->owner);
+err_module:
*errp = err;
if (sch)
kfree(sch);
diff --exclude=diff.exclude -ru linux-2.5.orig/net/sched/sch_generic.c linux-2.5/net/sched/sch_generic.c
--- linux-2.5.orig/net/sched/sch_generic.c Thu Jan 2 22:41:24 2003
+++ linux-2.5/net/sched/sch_generic.c Thu Jan 2 23:07:55 2003
@@ -29,6 +29,7 @@
#include <linux/skbuff.h>
#include <linux/rtnetlink.h>
#include <linux/init.h>
+#include <linux/module.h>
#include <net/sock.h>
#include <net/pkt_sched.h>

@@ -356,6 +357,8 @@

.init = pfifo_fast_init,
.reset = pfifo_fast_reset,
+
+ .owner = THIS_MODULE,
};

struct Qdisc * qdisc_create_dflt(struct net_device *dev, struct Qdisc_ops *ops)
@@ -422,6 +425,7 @@
ops->reset(qdisc);
if (ops->destroy)
ops->destroy(qdisc);
+ module_put(ops->owner);
if (!(qdisc->flags&TCQ_F_BUILTIN))
kfree(qdisc);
}


ciao,
Luca
--
Reply-To: [email protected]
Home: http://kronoz.cjb.net
"Chi parla in tono cortese, ma continua a prepararsi, potra` andare avanti;
chi parla in tono bellicoso e avanza rapidamente dovra` ritirarsi"
Sun Tzu -- L'arte della guerra


2003-01-03 05:02:03

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

In message <[email protected]> you write:
> Hi,
>
> I'm working on net/sched to convert it to new module interface. I'm
> going to add a new field 'owner' to struct Qdisc_ops.

Hmm, I thought the sched stuff all runs under the network brlock? If
so, it doesn't need to be held in, since it's not preemptible.

I haven't checked, though...
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-01-03 07:56:07

by David Miller

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

On Thu, 2003-01-02 at 21:10, Rusty Russell wrote:
> Hmm, I thought the sched stuff all runs under the network brlock? If
> so, it doesn't need to be held in, since it's not preemptible.

The packet schedulers transmit, not receive.
They have their own queue locking, along with the device
xmit lock.

2003-01-04 06:11:55

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

In message <[email protected]> you write:
> On Thu, 2003-01-02 at 21:10, Rusty Russell wrote:
> > Hmm, I thought the sched stuff all runs under the network brlock? If
> > so, it doesn't need to be held in, since it's not preemptible.
>
> The packet schedulers transmit, not receive.
> They have their own queue locking, along with the device
> xmit lock.

Then the patch to mark the "owner" should be straightforward. I
haven't looked at kronos's patch in detail, though.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-01-04 16:14:34

by Luca

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Il Sat, Jan 04, 2003 at 05:09:41PM +1100, Rusty Russell ha scritto:
> Then the patch to mark the "owner" should be straightforward.

Yes, it seems so :)

This is the patch:

diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_api.c linux-2.5/net/sched/cls_api.c
--- linux-2.5.orig/net/sched/cls_api.c Sat Jan 4 16:38:50 2003
+++ linux-2.5/net/sched/cls_api.c Fri Jan 3 17:19:07 2003
@@ -216,6 +216,13 @@
kfree(tp);
goto errout;
}
+
+ if (try_module_get(tp_ops->owner) == 0) {
+ err = -ENOSYS;
+ kfree(tp);
+ goto errout;
+ }
+
memset(tp, 0, sizeof(*tp));
tp->ops = tp_ops;
tp->protocol = protocol;
@@ -225,6 +232,7 @@
tp->classid = parent;
err = tp_ops->init(tp);
if (err) {
+ module_put(tp_ops->owner);
kfree(tp);
goto errout;
}
@@ -248,6 +256,7 @@
write_unlock(&qdisc_tree_lock);

tp->ops->destroy(tp);
+ module_put(tp->ops->owner);
kfree(tp);
err = 0;
goto errout;
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_fw.c linux-2.5/net/sched/cls_fw.c
--- linux-2.5.orig/net/sched/cls_fw.c Sat Jan 4 16:38:50 2003
+++ linux-2.5/net/sched/cls_fw.c Sat Jan 4 16:49:11 2003
@@ -117,7 +117,6 @@

static int fw_init(struct tcf_proto *tp)
{
- MOD_INC_USE_COUNT;
return 0;
}

@@ -128,7 +127,6 @@
int h;

if (head == NULL) {
- MOD_DEC_USE_COUNT;
return;
}

@@ -146,7 +144,6 @@
}
}
kfree(head);
- MOD_DEC_USE_COUNT;
}

static int fw_delete(struct tcf_proto *tp, unsigned long arg)
@@ -351,18 +348,21 @@
}

struct tcf_proto_ops cls_fw_ops = {
- NULL,
- "fw",
- fw_classify,
- fw_init,
- fw_destroy,
-
- fw_get,
- fw_put,
- fw_change,
- fw_delete,
- fw_walk,
- fw_dump
+ .next = NULL,
+ .kind = "fw",
+ .classify = fw_classify,
+ .init = fw_init,
+ .destroy = fw_destroy,
+
+ .get = fw_get,
+ .put = fw_put,
+ .change = fw_change,
+ .delete = fw_delete,
+ .walk = fw_walk,
+
+ .dump = fw_dump,
+
+ .owner = THIS_MODULE,
};

#ifdef MODULE
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_route.c linux-2.5/net/sched/cls_route.c
--- linux-2.5.orig/net/sched/cls_route.c Sat Jan 4 16:38:50 2003
+++ linux-2.5/net/sched/cls_route.c Sat Jan 4 16:47:29 2003
@@ -272,7 +272,6 @@

static int route4_init(struct tcf_proto *tp)
{
- MOD_INC_USE_COUNT;
return 0;
}

@@ -282,7 +281,6 @@
int h1, h2;

if (head == NULL) {
- MOD_DEC_USE_COUNT;
return;
}

@@ -309,7 +307,6 @@
}
}
kfree(head);
- MOD_DEC_USE_COUNT;
}

static int route4_delete(struct tcf_proto *tp, unsigned long arg)
@@ -607,18 +604,21 @@
}

struct tcf_proto_ops cls_route4_ops = {
- NULL,
- "route",
- route4_classify,
- route4_init,
- route4_destroy,
-
- route4_get,
- route4_put,
- route4_change,
- route4_delete,
- route4_walk,
- route4_dump
+ .next = NULL,
+ .kind = "route",
+ .classify = route4_classify,
+ .init = route4_init,
+ .destroy = route4_destroy,
+
+ .get = route4_get,
+ .put = route4_put,
+ .change = route4_change,
+ .delete = route4_delete,
+ .walk = route4_walk,
+
+ .dump = route4_dump,
+
+ .owner = THIS_MODULE,
};

#ifdef MODULE
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_rsvp.h linux-2.5/net/sched/cls_rsvp.h
--- linux-2.5.orig/net/sched/cls_rsvp.h Sat Jan 4 16:38:50 2003
+++ linux-2.5/net/sched/cls_rsvp.h Sat Jan 4 16:49:56 2003
@@ -247,14 +247,12 @@
{
struct rsvp_head *data;

- MOD_INC_USE_COUNT;
data = kmalloc(sizeof(struct rsvp_head), GFP_KERNEL);
if (data) {
memset(data, 0, sizeof(struct rsvp_head));
tp->root = data;
return 0;
}
- MOD_DEC_USE_COUNT;
return -ENOBUFS;
}

@@ -294,7 +292,6 @@
}
}
kfree(data);
- MOD_DEC_USE_COUNT;
}

static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
@@ -673,18 +670,21 @@
}

struct tcf_proto_ops RSVP_OPS = {
- NULL,
- RSVP_ID,
- rsvp_classify,
- rsvp_init,
- rsvp_destroy,
-
- rsvp_get,
- rsvp_put,
- rsvp_change,
- rsvp_delete,
- rsvp_walk,
- rsvp_dump
+ .next = NULL,
+ .kind = RSVP_ID,
+ .classify = rsvp_classify,
+ .init = rsvp_init,
+ .destroy = rsvp_destroy,
+
+ .get = rsvp_get,
+ .put = rsvp_put,
+ .change = rsvp_change,
+ .delete = rsvp_delete,
+ .walk = rsvp_walk,
+
+ .dump = rsvp_dump,
+
+ .owner = THIS_MODULE,
};

#ifdef MODULE
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_tcindex.c linux-2.5/net/sched/cls_tcindex.c
--- linux-2.5.orig/net/sched/cls_tcindex.c Sat Jan 4 16:38:50 2003
+++ linux-2.5/net/sched/cls_tcindex.c Sat Jan 4 16:50:21 2003
@@ -144,10 +144,8 @@
struct tcindex_data *p;

DPRINTK("tcindex_init(tp %p)\n",tp);
- MOD_INC_USE_COUNT;
p = kmalloc(sizeof(struct tcindex_data),GFP_KERNEL);
if (!p) {
- MOD_DEC_USE_COUNT;
return -ENOMEM;
}
tp->root = p;
@@ -417,7 +415,6 @@
kfree(p->h);
kfree(p);
tp->root = NULL;
- MOD_DEC_USE_COUNT;
}


@@ -480,20 +477,22 @@
}

struct tcf_proto_ops cls_tcindex_ops = {
- NULL,
- "tcindex",
- tcindex_classify,
- tcindex_init,
- tcindex_destroy,
-
- tcindex_get,
- tcindex_put,
- tcindex_change,
- tcindex_delete,
- tcindex_walk,
- tcindex_dump
-};
+ .next = NULL,
+ .kind = "tcindex",
+ .classify = tcindex_classify,
+ .init = tcindex_init,
+ .destroy = tcindex_destroy,
+
+ .get = tcindex_get,
+ .put = tcindex_put,
+ .change = tcindex_change,
+ .delete = tcindex_delete,
+ .walk = tcindex_walk,
+
+ .dump = tcindex_dump,

+ .owner = THIS_MODULE,
+};

#ifdef MODULE
int init_module(void)
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_u32.c linux-2.5/net/sched/cls_u32.c
--- linux-2.5.orig/net/sched/cls_u32.c Sat Jan 4 16:38:50 2003
+++ linux-2.5/net/sched/cls_u32.c Sat Jan 4 16:50:39 2003
@@ -262,15 +262,12 @@
struct tc_u_hnode *root_ht;
struct tc_u_common *tp_c;

- MOD_INC_USE_COUNT;
-
for (tp_c = u32_list; tp_c; tp_c = tp_c->next)
if (tp_c->q == tp->q)
break;

root_ht = kmalloc(sizeof(*root_ht), GFP_KERNEL);
if (root_ht == NULL) {
- MOD_DEC_USE_COUNT;
return -ENOBUFS;
}
memset(root_ht, 0, sizeof(*root_ht));
@@ -282,7 +279,6 @@
tp_c = kmalloc(sizeof(*tp_c), GFP_KERNEL);
if (tp_c == NULL) {
kfree(root_ht);
- MOD_DEC_USE_COUNT;
return -ENOBUFS;
}
memset(tp_c, 0, sizeof(*tp_c));
@@ -407,7 +403,6 @@
kfree(tp_c);
}

- MOD_DEC_USE_COUNT;
tp->data = NULL;
}

@@ -695,18 +690,21 @@
}

struct tcf_proto_ops cls_u32_ops = {
- NULL,
- "u32",
- u32_classify,
- u32_init,
- u32_destroy,
-
- u32_get,
- u32_put,
- u32_change,
- u32_delete,
- u32_walk,
- u32_dump
+ .next = NULL,
+ .kind = "u32",
+ .classify = u32_classify,
+ .init = u32_init,
+ .destroy = u32_destroy,
+
+ .get = u32_get,
+ .put = u32_put,
+ .change = u32_change,
+ .delete = u32_delete,
+ .walk = u32_walk,
+
+ .dump = u32_dump,
+
+ .owner = THIS_MODULE,
};

#ifdef MODULE
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_api.c linux-2.5/net/sched/sch_api.c
--- linux-2.5.orig/net/sched/sch_api.c Sat Jan 4 16:38:50 2003
+++ linux-2.5/net/sched/sch_api.c Sat Jan 4 16:53:23 2003
@@ -409,6 +409,10 @@
if (ops == NULL)
goto err_out;

+ err = -ENOSYS;
+ if (try_module_get(ops->owner) == 0)
+ goto err_module;
+
size = sizeof(*sch) + ops->priv_size;

sch = kmalloc(size, GFP_KERNEL);
@@ -416,12 +420,6 @@
if (!sch)
goto err_out;

- /* Grrr... Resolve race condition with module unload */
-
- err = -EINVAL;
- if (ops != qdisc_lookup_ops(kind))
- goto err_out;
-
memset(sch, 0, size);

skb_queue_head_init(&sch->q);
@@ -460,6 +458,8 @@
}

err_out:
+ module_put(ops->owner);
+err_module:
*errp = err;
if (sch)
kfree(sch);
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_atm.c linux-2.5/net/sched/sch_atm.c
--- linux-2.5.orig/net/sched/sch_atm.c Sat Jan 4 16:38:50 2003
+++ linux-2.5/net/sched/sch_atm.c Sat Jan 4 16:51:24 2003
@@ -575,7 +575,6 @@
p->link.ref = 1;
p->link.next = NULL;
tasklet_init(&p->task,sch_atm_dequeue,(unsigned long) sch);
- MOD_INC_USE_COUNT;
return 0;
}

@@ -612,7 +611,6 @@
}
}
tasklet_kill(&p->task);
- MOD_DEC_USE_COUNT;
}


@@ -697,9 +695,10 @@
.destroy = atm_tc_destroy,
.change = NULL,

- .dump = atm_tc_dump
-};
+ .dump = atm_tc_dump,

+ .owner = THIS_MODULE,
+};

#ifdef MODULE
int init_module(void)
@@ -707,9 +706,10 @@
return register_qdisc(&atm_qdisc_ops);
}

-
void cleanup_module(void)
{
unregister_qdisc(&atm_qdisc_ops);
}
#endif
+
+MODULE_LICENSE("GPL");
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_cbq.c linux-2.5/net/sched/sch_cbq.c
--- linux-2.5.orig/net/sched/sch_cbq.c Sat Jan 4 16:38:50 2003
+++ linux-2.5/net/sched/sch_cbq.c Sat Jan 4 16:51:30 2003
@@ -1411,9 +1411,7 @@

r = RTA_DATA(tb[TCA_CBQ_RATE-1]);

- MOD_INC_USE_COUNT;
if ((q->link.R_tab = qdisc_get_rtab(r, tb[TCA_CBQ_RTAB-1])) == NULL) {
- MOD_DEC_USE_COUNT;
return -EINVAL;
}

@@ -1749,7 +1747,6 @@
}

qdisc_put_rtab(q->link.R_tab);
- MOD_DEC_USE_COUNT;
}

static void cbq_put(struct Qdisc *sch, unsigned long arg)
@@ -2099,6 +2096,8 @@
.change = NULL,

.dump = cbq_dump,
+
+ .owner = THIS_MODULE,
};

#ifdef MODULE
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_csz.c linux-2.5/net/sched/sch_csz.c
--- linux-2.5.orig/net/sched/sch_csz.c Sat Jan 4 16:38:50 2003
+++ linux-2.5/net/sched/sch_csz.c Sat Jan 4 16:53:35 2003
@@ -749,7 +749,7 @@
static void
csz_destroy(struct Qdisc* sch)
{
- MOD_DEC_USE_COUNT;
+ /* nop */
}

static int csz_init(struct Qdisc *sch, struct rtattr *opt)
@@ -791,7 +791,6 @@
q->wd_timer.data = (unsigned long)sch;
q->wd_timer.function = csz_watchdog;
#endif
- MOD_INC_USE_COUNT;
return 0;
}

@@ -1046,8 +1045,9 @@
.change = NULL,

.dump = csz_dump,
-};

+ .owner = THIS_MODULE,
+};

#ifdef MODULE
int init_module(void)
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_dsmark.c linux-2.5/net/sched/sch_dsmark.c
--- linux-2.5.orig/net/sched/sch_dsmark.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_dsmark.c Sat Jan 4 16:51:43 2003
@@ -352,7 +352,6 @@
if (!(p->q = qdisc_create_dflt(sch->dev, &pfifo_qdisc_ops)))
p->q = &noop_qdisc;
DPRINTK("dsmark_init: qdisc %p\n",&p->q);
- MOD_INC_USE_COUNT;
return 0;
}

@@ -381,7 +380,6 @@
qdisc_destroy(p->q);
p->q = &noop_qdisc;
kfree(p->mask);
- MOD_DEC_USE_COUNT;
}


@@ -466,7 +464,9 @@
.destroy = dsmark_destroy,
.change = NULL,

- .dump = dsmark_dump
+ .dump = dsmark_dump,
+
+ .owner = THIS_MODULE,
};

#ifdef MODULE
@@ -474,7 +474,6 @@
{
return register_qdisc(&dsmark_qdisc_ops);
}
-

void cleanup_module(void)
{
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_fifo.c linux-2.5/net/sched/sch_fifo.c
--- linux-2.5.orig/net/sched/sch_fifo.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_fifo.c Sat Jan 4 16:14:50 2003
@@ -206,4 +206,6 @@
.change = fifo_init,

.dump = fifo_dump,
+
+ .owner = THIS_MODULE,
};
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_generic.c linux-2.5/net/sched/sch_generic.c
--- linux-2.5.orig/net/sched/sch_generic.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_generic.c Sat Jan 4 16:54:10 2003
@@ -29,6 +29,7 @@
#include <linux/skbuff.h>
#include <linux/rtnetlink.h>
#include <linux/init.h>
+#include <linux/module.h>
#include <net/sock.h>
#include <net/pkt_sched.h>

@@ -253,6 +254,8 @@
.enqueue = noop_enqueue,
.dequeue = noop_dequeue,
.requeue = noop_requeue,
+
+ .owner = THIS_MODULE,
};

struct Qdisc noqueue_qdisc =
@@ -356,6 +359,8 @@

.init = pfifo_fast_init,
.reset = pfifo_fast_reset,
+
+ .owner = THIS_MODULE,
};

struct Qdisc * qdisc_create_dflt(struct net_device *dev, struct Qdisc_ops *ops)
@@ -422,6 +427,7 @@
ops->reset(qdisc);
if (ops->destroy)
ops->destroy(qdisc);
+ module_put(ops->owner);
if (!(qdisc->flags&TCQ_F_BUILTIN))
kfree(qdisc);
}
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_gred.c linux-2.5/net/sched/sch_gred.c
--- linux-2.5.orig/net/sched/sch_gred.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_gred.c Sat Jan 4 16:51:58 2003
@@ -348,7 +348,6 @@
table->grio=sopt->grio;
table->initd=0;
/* probably need to clear all the table DP entries as well */
- MOD_INC_USE_COUNT;
return 0;
}

@@ -490,7 +489,6 @@
table->def=sopt->def_DP;
table->grio=sopt->grio;
table->initd=0;
- MOD_INC_USE_COUNT;
return 0;
}

@@ -602,7 +600,6 @@
if (table->tab[i])
kfree(table->tab[i]);
}
- MOD_DEC_USE_COUNT;
}

struct Qdisc_ops gred_qdisc_ops =
@@ -623,8 +620,9 @@
.change = gred_change,

.dump = gred_dump,
+
+ .owner = THIS_MODULE,
};
-

#ifdef MODULE
int init_module(void)
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_htb.c linux-2.5/net/sched/sch_htb.c
--- linux-2.5.orig/net/sched/sch_htb.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_htb.c Sat Jan 4 16:55:04 2003
@@ -1181,7 +1181,6 @@
q->rate2quantum = 1;
q->defcls = gopt->defcls;

- MOD_INC_USE_COUNT;
return 0;
}

@@ -1366,7 +1365,6 @@

htb_destroy_filters(&q->filter_list);
__skb_queue_purge(&q->direct_queue);
- MOD_DEC_USE_COUNT;
}

static int htb_delete(struct Qdisc *sch, unsigned long arg)
@@ -1600,39 +1598,41 @@

static struct Qdisc_class_ops htb_class_ops =
{
- htb_graft,
- htb_leaf,
- htb_get,
- htb_put,
- htb_change_class,
- htb_delete,
- htb_walk,
-
- htb_find_tcf,
- htb_bind_filter,
- htb_unbind_filter,
+ .graft = htb_graft,
+ .leaf = htb_leaf,
+ .get = htb_get,
+ .put = htb_put,
+ .change = htb_change_class,
+ .delete = htb_delete,
+ .walk = htb_walk,
+
+ .tcf_chain = htb_find_tcf,
+ .bind_tcf = htb_bind_filter,
+ .unbind_tcf = htb_unbind_filter,

- htb_dump_class,
+ .dump = htb_dump_class,
};

struct Qdisc_ops htb_qdisc_ops =
{
- NULL,
- &htb_class_ops,
- "htb",
- sizeof(struct htb_sched),
-
- htb_enqueue,
- htb_dequeue,
- htb_requeue,
- htb_drop,
-
- htb_init,
- htb_reset,
- htb_destroy,
- NULL /* htb_change */,
+ .next = NULL,
+ .cl_ops = &htb_class_ops,
+ .id = "htb",
+ .priv_size = sizeof(struct htb_sched),
+
+ .enqueue = htb_enqueue,
+ .dequeue = htb_dequeue,
+ .requeue = htb_requeue,
+ .drop = htb_drop,
+
+ .init = htb_init,
+ .reset = htb_reset,
+ .destroy = htb_destroy,
+ .change = NULL,

- htb_dump,
+ .dump = htb_dump,
+
+ .owner = THIS_MODULE,
};

#ifdef MODULE
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_ingress.c linux-2.5/net/sched/sch_ingress.c
--- linux-2.5.orig/net/sched/sch_ingress.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_ingress.c Sat Jan 4 16:52:49 2003
@@ -257,7 +257,6 @@
memset(p, 0, sizeof(*p));
p->filter_list = NULL;
p->q = &noop_qdisc;
- MOD_INC_USE_COUNT;
return 0;
error:
return -EINVAL;
@@ -304,8 +303,6 @@
qdisc_destroy(p->q);
#endif

- MOD_DEC_USE_COUNT;
-
}


@@ -359,22 +356,20 @@
.change = NULL,

.dump = ingress_dump,
-};

+ .owner = THIS_MODULE,
+};

#ifdef MODULE
int init_module(void)
{
int ret = 0;

- if ((ret = register_qdisc(&ingress_qdisc_ops)) < 0) {
- printk("Unable to register Ingress qdisc\n");
- return ret;
- }
+ if ((ret = register_qdisc(&ingress_qdisc_ops)) < 0)
+ printk(KERN_ERR "Unable to register Ingress qdisc\n");

return ret;
}
-

void cleanup_module(void)
{
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_prio.c linux-2.5/net/sched/sch_prio.c
--- linux-2.5.orig/net/sched/sch_prio.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_prio.c Sat Jan 4 16:52:53 2003
@@ -163,7 +163,6 @@
qdisc_destroy(q->queues[prio]);
q->queues[prio] = &noop_qdisc;
}
- MOD_DEC_USE_COUNT;
}

static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
@@ -227,7 +226,6 @@
if ((err= prio_tune(sch, opt)) != 0)
return err;
}
- MOD_INC_USE_COUNT;
return 0;
}

@@ -399,10 +397,11 @@
.change = prio_tune,

.dump = prio_dump,
+
+ .owner = THIS_MODULE,
};

#ifdef MODULE
-
int init_module(void)
{
return register_qdisc(&prio_qdisc_ops);
@@ -412,6 +411,5 @@
{
unregister_qdisc(&prio_qdisc_ops);
}
-
#endif
MODULE_LICENSE("GPL");
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_red.c linux-2.5/net/sched/sch_red.c
--- linux-2.5.orig/net/sched/sch_red.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_red.c Sat Jan 4 16:52:57 2003
@@ -407,14 +407,7 @@

static int red_init(struct Qdisc* sch, struct rtattr *opt)
{
- int err;
-
- MOD_INC_USE_COUNT;
-
- if ((err = red_change(sch, opt)) != 0) {
- MOD_DEC_USE_COUNT;
- }
- return err;
+ return red_change(sch, opt);
}


@@ -458,7 +451,7 @@

static void red_destroy(struct Qdisc *sch)
{
- MOD_DEC_USE_COUNT;
+ /* nop */
}

struct Qdisc_ops red_qdisc_ops =
@@ -479,8 +472,9 @@
.change = red_change,

.dump = red_dump,
-};

+ .owner = THIS_MODULE,
+};

#ifdef MODULE
int init_module(void)
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_sfq.c linux-2.5/net/sched/sch_sfq.c
--- linux-2.5.orig/net/sched/sch_sfq.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_sfq.c Sat Jan 4 16:53:02 2003
@@ -435,7 +435,6 @@
}
for (i=0; i<SFQ_DEPTH; i++)
sfq_link(q, i);
- MOD_INC_USE_COUNT;
return 0;
}

@@ -443,7 +442,6 @@
{
struct sfq_sched_data *q = (struct sfq_sched_data *)sch->data;
del_timer(&q->perturb_timer);
- MOD_DEC_USE_COUNT;
}

static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb)
@@ -486,6 +484,8 @@
.change = NULL,

.dump = sfq_dump,
+
+ .owner = THIS_MODULE,
};

#ifdef MODULE
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_tbf.c linux-2.5/net/sched/sch_tbf.c
--- linux-2.5.orig/net/sched/sch_tbf.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_tbf.c Sat Jan 4 16:53:06 2003
@@ -330,23 +330,17 @@

static int tbf_init(struct Qdisc* sch, struct rtattr *opt)
{
- int err;
struct tbf_sched_data *q = (struct tbf_sched_data *)sch->data;

if (opt == NULL)
return -EINVAL;

- MOD_INC_USE_COUNT;
-
PSCHED_GET_TIME(q->t_c);
init_timer(&q->wd_timer);
q->wd_timer.function = tbf_watchdog;
q->wd_timer.data = (unsigned long)sch;

- if ((err = tbf_change(sch, opt)) != 0) {
- MOD_DEC_USE_COUNT;
- }
- return err;
+ return tbf_change(sch, opt);
}

static void tbf_destroy(struct Qdisc *sch)
@@ -359,8 +353,6 @@
qdisc_put_rtab(q->P_tab);
if (q->R_tab)
qdisc_put_rtab(q->R_tab);
-
- MOD_DEC_USE_COUNT;
}

static int tbf_dump(struct Qdisc *sch, struct sk_buff *skb)
@@ -409,8 +401,9 @@
.change = tbf_change,

.dump = tbf_dump,
-};

+ .owner = THIS_MODULE,
+};

#ifdef MODULE
int init_module(void)
diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_teql.c linux-2.5/net/sched/sch_teql.c
--- linux-2.5.orig/net/sched/sch_teql.c Sat Jan 4 16:38:51 2003
+++ linux-2.5/net/sched/sch_teql.c Sat Jan 4 16:34:52 2003
@@ -178,7 +178,6 @@
} while ((prev = q) != master->slaves);
}

- MOD_DEC_USE_COUNT;
}

static int teql_qdisc_init(struct Qdisc *sch, struct rtattr *opt)
@@ -223,7 +222,6 @@
m->dev.flags = (m->dev.flags&~FMASK)|(dev->flags&FMASK);
}

- MOD_INC_USE_COUNT;
return 0;
}

@@ -454,8 +452,9 @@
.reset = teql_reset,
.destroy = teql_destroy,
.dump = NULL,
-},};

+ .owner = THIS_MODULE,
+},};

#ifdef MODULE
int init_module(void)
diff --exclude-from=diff.exclude -ru linux-2.5.orig/include/net/pkt_cls.h linux-2.5/include/net/pkt_cls.h
--- linux-2.5.orig/include/net/pkt_cls.h Sat Jan 4 16:38:51 2003
+++ linux-2.5/include/net/pkt_cls.h Fri Jan 3 17:20:03 2003
@@ -3,6 +3,7 @@


#include <linux/pkt_cls.h>
+#include <linux/module.h>

struct rtattr;
struct tcmsg;
@@ -56,6 +57,8 @@

/* rtnetlink specific */
int (*dump)(struct tcf_proto*, unsigned long, struct sk_buff *skb, struct tcmsg*);
+
+ struct module *owner;
};

/* Main classifier routine: scans classifier chain attached
diff --exclude-from=diff.exclude -ru linux-2.5.orig/include/net/pkt_sched.h linux-2.5/include/net/pkt_sched.h
--- linux-2.5.orig/include/net/pkt_sched.h Sat Jan 4 16:38:51 2003
+++ linux-2.5/include/net/pkt_sched.h Thu Jan 2 22:50:57 2003
@@ -10,6 +10,7 @@
#include <linux/config.h>
#include <linux/types.h>
#include <linux/pkt_sched.h>
+#include <linux/module.h>
#include <net/pkt_cls.h>

#ifdef CONFIG_X86_TSC
@@ -67,6 +68,8 @@
int (*change)(struct Qdisc *, struct rtattr *arg);

int (*dump)(struct Qdisc *, struct sk_buff *);
+ /* Protects callbacks */
+ struct module *owner;
};

extern rwlock_t qdisc_tree_lock;


I've tested it a bit and seems to work ok. TEQL queue still
uses MOD_{INC,DEC}_USE_COUNT because it register a device and
register_netdevice doesn't use the new interface yet.

ciao,
Luca
--
Reply-To: [email protected]
Home: http://kronoz.cjb.net
The trouble with computers is that they do what you tell them,
not what you want.
D. Cohen

2003-01-13 22:25:13

by Alexey Kuznetsov

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hello!

> Hmm, I thought the sched stuff all runs under the network brlock? If
> so, it doesn't need to be held in, since it's not preemptible.
>
> I haven't checked, though...

It runs under semaphore.


Which does not matter at all, because the hole

void cleanup_module(void)
{
<an instance is cloned here>
unregister_qdisc(&cbq_qdisc_ops);
}

remained in any case, be it under some preemptive, nonprepemtive lock or
not under a lock at all.

BTW, Rusty, a question... I do not understand, what is purpose of this "new"
module stuff at all? If we still need to query something in module.c to create
each instanse of something it smells exactly "old" broken approach. I just
do not see differences. It is not essential in this particular case,
but it would be funny to ask it each time when creating a tcp socket.

Alexey

2003-01-13 23:15:26

by Max Krasnyansky

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

At 01:32 AM 1/14/2003 +0300, [email protected] wrote:
>It is not essential in this particular case,
>but it would be funny to ask it each time when creating a tcp socket.

That's what we're going to do btw. I'm cooking up new patch.
See "New module refcounting for net_proto_family" thread.

Max

2003-01-14 17:41:50

by Luca

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Il Tue, Jan 14, 2003 at 01:32:56AM +0300, Alexey N. Kuznetsov ha scritto:
> Which does not matter at all, because the hole
>
> void cleanup_module(void)
> {
> <an instance is cloned here>
> unregister_qdisc(&cbq_qdisc_ops);
> }
>
> remained in any case, be it under some preemptive, nonprepemtive lock or
> not under a lock at all.

Hmm, this can't happen now. The try_module_get() will fail if the module
is going away and so qdisc_create won't create a new qdisc.

> BTW, Rusty, a question... I do not understand, what is purpose of this
> "new" module stuff at all?

The idea is to use try_module_get() before using any interface
registered by the module. Every module has a state associated to it: if
the module has been loaded but it is still in its init section its
state will be MODULE_STATE_COMING (and try_module_get() will fail);
if the module is in process of being unloaded its state will be
MODULE_STATE_LIVE (and try_module_get() will fail); otherwise the state
will be MODULE_STATE_LIVE (and try_module_get() will success).

If we use try_module_get() before using the module and module_put() when
we are done everything will work:

int create_foo(foo **new) {
if (try_module_get(foo->owner) == 0)
/* Module is being unloaded, it's not safe to use it */
return -ENOSYS;

*new = foo->new_foo();
}

void destroy_foo(foo *foo) {
foo->finalize();

module_put(foo->owner);
}


This is what I've done for the packet scheduler: try_module_get() is
called *before* cloning the qdisc and if it fails the qdisc won't be
cloned. module_put() is called in qdisc_destroy(), after ->destroy(),
when we are sure that nobody will ever use a reference to the qdisc.

Luca
--
Reply-To: [email protected]
Home: http://kronoz.cjb.net
Non capisco tutta questa eccitazione per il Multitasking:
io sono anni che leggo in bagno.

2003-01-15 00:45:26

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

Kronos wrote:

> The idea is to use try_module_get() before using any interface
> registered by the module. Every module has a state associated to it: if
> the module has been loaded but it is still in its init section its
> state will be MODULE_STATE_COMING (and try_module_get() will fail);
> if the module is in process of being unloaded its state will be
> MODULE_STATE_LIVE (and try_module_get() will fail); otherwise the state
> will be MODULE_STATE_LIVE (and try_module_get() will success).

Rusty had to revert this to the old scheme, as e.g. scsi relied on it.
This means the kernel will oops, if someone gets a reference to a module
and its init function fails.
Above scheme is not without problems either, a failing try_module_get()
suddenly gets a double meaning. Now it also means that the module might
be ready soon, please try again later, what increases the complexity of
the whole module business.
BTW what makes it currently even more complex is that there is also no
synchronization by the module-init-tools done anymore, a
try_module_get()/request_module()/try_module_get() sequence does give no
definitive answer anymore, whether a module is available or not. The
results will be spurious kmod failures.

> If we use try_module_get() before using the module and module_put() when
> we are done everything will work:
>
> int create_foo(foo **new) {
> if (try_module_get(foo->owner) == 0)
> /* Module is being unloaded, it's not safe to use it */
> return -ENOSYS;
>
> *new = foo->new_foo();
> }

It's not that simple, try_module_get() does not everything. It's the
responsibility of the caller to protect the owner pointer, so
try_module_get() itself is only usable within some locking. This means
your patch is still not safe, as the test is done too late.

bye, Roman

2003-01-15 01:11:30

by Alexey Kuznetsov

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hello!

> Above scheme is not without problems either, a failing try_module_get()
> suddenly gets a double meaning. Now it also means that the module might
> be ready soon, please try again later,

... and btw completely useless thing, because each module user already
has its own means to do serialization of this kind even when used _NOT_
as module.

Somewhat overdone.

Alexey

2003-01-15 07:23:07

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

[email protected] wrote:
> Somewhat overdone.

I think it would be nice to introduce in 2.7 a shutdowncall
(*) function class for modules that works like exitcall, but
with the following differences:

- does not return before the module has really de-registered
itself everywhere, including synchronization with any
callbacks, etc.
- has a return code, and can fail if it would have to sleep
for a possibly long time

Before calling the shutdown function, all symbols exported by
the module are hidden, and after the shutdown functions returns,
the module can be unloaded.

That way, the module reference count becomes merely advisory
information, and the real locking, synchronization, etc. is
inside the module.

The module unload mechanism could then check whether the module
is old (uses exitcall) or new (uses shutdowncall), and act
accordingly. Modules and the services they use could then be
gradually fixed.

(*) Or maybe use a nicer name :-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-01-15 08:15:50

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

In message <[email protected]> you write:
> [email protected] wrote:
> > Somewhat overdone.
>
> I think it would be nice to introduce in 2.7 a shutdowncall
> (*) function class for modules that works like exitcall, but
> with the following differences:
>
> - does not return before the module has really de-registered
> itself everywhere, including synchronization with any
> callbacks, etc.
> - has a return code, and can fail if it would have to sleep
> for a possibly long time
>
> Before calling the shutdown function, all symbols exported by
> the module are hidden, and after the shutdown functions returns,
> the module can be unloaded.

This already happens. This is why all accesses to the module are
protected by try_module_get().

I've analyzed dozens of "here's my implementation idea" mails over the
last two years. Here's the executive summary:

1) It's simply not a good idea to force 1600 modules to change, no
matter what timescale. And changing it in a way that is *more*,
not *less* complex is even worse.

2) It's bad enough to force the interfaces to change: at least the
primitive they are to use is one many of them are already using,
and is very simple to understand.

Rusty.
PS. The *implementation* flaw in your scheme: someone starts using a
module as you try to deregister it. Either you re-register the
module (ie. you can never unload security modules), or you leave
it half unloaded (even worse).
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-01-15 09:25:09

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Rusty Russell wrote:
> 1) It's simply not a good idea to force 1600 modules to change, no
> matter what timescale.

That's the part that I don't believe. The kernel interfaces
aren't static. Locking rules have changed several times, the
wait/sleep interface has changed, the concept of a module
owner has been added, various other interfaces have changed.

So a module will eventually have to be maintained. And the
more important a module, the more timely this will be done.
(Well, or so everybody wishes :-) Unmaintained modules will
eventually just fail to work or even compile for other reasons,
at which point they cease to be a problem as far as the
loading/unloading mechanism is concerned.

I agree that it's a bad idea to force changes. That's why
there should be a reasonably long period where the current
hack and a clean mechanism are available in parallel.

> And changing it in a way that is *more*,
> not *less* complex is even worse.

The implementation may be more complex, but the change I'm
suggesting would greatly simplify the rules: no endless set
of voodoo rites, but one simple rule: "the shutdowncall
function either does nothing, and returns failure, or it
returns success, and completely de-registers everything it
has previously registered".

> PS. The *implementation* flaw in your scheme: someone starts using a
> module as you try to deregister it.

How could they ? The symbols are hidden, so that way is
blocked. If another module has registered callbacks using
symbols obtained from that module, that other module needs
to be unloaded first anyway, so these references are gone
too.

If the static kernel accesses a module by resolving symbols
(except for the well-controlled operations of the module
loader itself), yes, then that module would become
un-unloadable. I'm not sure this is much of an issue.

If a callback comes in at the wrong moment, the module can
choose to de-register and wait until the subsystem has
finished any callbacks, or detect that it's about to
shut itself down, and fail the callback. The point is that
all the locking is now under control of the module, and
not scattered all over kernel+module(s).

> (ie. you can never unload security modules),

Hmm, what makes security modules (what exactly do you mean
by that ?) special ? In any case, a module that's truly
unloadable would simply return failure from its
shutdowncall.

> or you leave it half unloaded (even worse).

There's always the choice of just sleeping through any
inconvenient callbacks. In some cases, this is perfectly
acceptable, because the callback won't keep the module
busy for a long time anyway (interrupts, timers, tasklets,
etc.). In other cases (e.g. "open" functions), a callback
could keep it busy forever.

In that case, the module needs to maintain its own usage
count and have some flag that indicates that it's shutting
down. So the callback would look like this:

static int foo_open(...)
{
atomic_inc(&busy);
if (shutting_down) {
atomic_dec(&busy);
return -EGONEFISHING;
}
...
}

static int foo_shutdown(...)
{
shutting_down = 1;
if (atomic_read(&busy)) {
shutting_down = 0;
return -EBUSY;
}
deregister_whatever(&myself);
/* deregister and synchronize */
...
}

"busy" could of course just be the module use count.

There is the race condition that the module could briefly
disappear (i.e. "foo_open" fails), and then come back, because
it turned out to be or appear to be busy. But I think,
considering that we're actually trying to make it go away, this
is acceptable behaviour. You have the race conditions "is busy"
vs. "can unload" and "can use" vs. "has been unloaded" anyway.

This can be greatly simplified if we have a
try_deregister_whatever that just returns an error if it would
have to sleep, i.e. if the synchronization is pushed into the
service.

Of course, if the "whatever" subsystem will make callbacks
after returning from deregister_whatever, this doesn't work,
and the module has to use the old mechanisms. But that's really
a bug in the "whatever" subsystem.

The problem I see with the current module interface is that it
just tries to hack the current mess into a less buggy state,
but doesn't provide much of an incentive for actually cleaning
up the interfaces.

Avoiding the bugs is good, but we should also work towards a
clean long-term solution.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-01-15 17:48:27

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

Rusty Russell wrote:

> 2) It's bad enough to force the interfaces to change: at least the
> primitive they are to use is one many of them are already using,
> and is very simple to understand.

They are indeed simple, but only because it's impossible to implement
anything more complex.
An example: A "rmmod -w loop" will currently deadlock on a busy loop
module. Could you please explain, how it will be possible to force a
safe removal of the loop module?

> PS. The *implementation* flaw in your scheme: someone starts using a
> module as you try to deregister it. Either you re-register the
> module (ie. you can never unload security modules), or you leave
> it half unloaded (even worse).

What is the problem with a half unloaded module? Only the module knows
which interfaces it can safely remove to prevent new users, afterwards
it only has to wait for remaining user to leave to complete the cleanup.
BTW this also solves nicely the module init race.

bye, Roman

2003-01-16 01:22:32

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

In message <[email protected]> you write:
> Rusty Russell wrote:
> > 1) It's simply not a good idea to force 1600 modules to change, no
> > matter what timescale.
>
> That's the part that I don't believe. The kernel interfaces
> aren't static. Locking rules have changed several times, the
> wait/sleep interface has changed, the concept of a module
> owner has been added, various other interfaces have changed.

Deprecating every module, and rewriting their initialization routines
is ambitious beyond the scale of anything you have mentioned. Not
that 90% of the kernel code couldn't use a damn good spring cleaning,
but I'm not prepared to make such a change personally.

And remember why we're doing it: for a fairly obscure race condition.

> > And changing it in a way that is *more*,
> > not *less* complex is even worse.
>
> The implementation may be more complex, but the change I'm
> suggesting would greatly simplify the rules: no endless set
> of voodoo rites, but one simple rule: "the shutdowncall
> function either does nothing, and returns failure, or it
> returns success, and completely de-registers everything it
> has previously registered".

So we go from:

int init(void)
{
if (!register_foo(&foo))
return -err;
if (!register_bar(&bar)) {
unregister_foo(&foo);
return -err;
}
return 0;
}

void fini(void)
{
unregister_foo(&foo);
unregister_bar(&bar);
}

to:

int fini(void)
{
if (!unregister_foo(&foo))
return -err;
if (!unregister_bar(&bar)) {
if (!register_foo(&foo))
????
return -err;
}
return 0;
}

> > PS. The *implementation* flaw in your scheme: someone starts using a
> > module as you try to deregister it.
>
> If a callback comes in at the wrong moment, the module can
> choose to de-register and wait until the subsystem has
> finished any callbacks, or detect that it's about to
> shut itself down, and fail the callback. The point is that
> all the locking is now under control of the module, and
> not scattered all over kernel+module(s).

Something like this?

static int i_am_live;
static spinlock_t my_lock = SPIN_LOCK_UNLOCKED;

/* This is our registered function. */
static int foo_function(void *somedata)
{
int live;

spin_lock(&my_lock);
live = i_am_live;
spin_unlock(&my_lock);
if (!live)
return -EIGNOREME???;
...
}

int fini(void)
{
spin_lock(&my_lock);
i_am_live = 0;
spin_unlock(&my_lock);

if (!unregister_foo(&foo)) {
spin_lock(&my_lock);
i_am_live = 1;
spin_unlock(&my_lock);
return -err;
}
if (!unregister_bar(&bar)) {
if (!register_foo(&foo))
????
spin_lock(&my_lock);
i_am_live = 1;
spin_unlock(&my_lock);
return -err;
}
return 0;
}

Now, if someone tries to remove a module, but it's busy, you get a
window of spurious failure, even though the module isn't actually
removed. Secondly, there is often no way of returning a value which
says "I'm going away, act as if I'm not here": only the level above
can sanely know what it would do if this were not found.

> > (ie. you can never unload security modules),
>
> Hmm, what makes security modules (what exactly do you mean
> by that ?) special ? In any case, a module that's truly
> unloadable would simply return failure from its
> shutdowncall.

On a busy system, they're never not being used. Your unload routine
would always fail. Same with netfilter modules.

> > or you leave it half unloaded (even worse).
>
> There's always the choice of just sleeping through any
> inconvenient callbacks. In some cases, this is perfectly
> acceptable, because the callback won't keep the module
> busy for a long time anyway (interrupts, timers, tasklets,
> etc.). In other cases (e.g. "open" functions), a callback
> could keep it busy forever.

Exactly. It's kept there forever, while the module is useless.

> The problem I see with the current module interface is that it
> just tries to hack the current mess into a less buggy state,
> but doesn't provide much of an incentive for actually cleaning
> up the interfaces.
>
> Avoiding the bugs is good, but we should also work towards a
> clean long-term solution.

The current scheme is clean: it's two-stage delete with a nice helper
function "try_module_get()" which tells you when it's going away, and
no requirement that modules actually implement two-stage delete
themselves. The patch to mirror this in two-stage init was posted
yesterday, as well.

It also puts the (minimal) burden in the right place: in the interface
coder's lap, not the interface user's lap.

Unfortunately, I don't have the patience to explain this once for
every kernel developer.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-01-16 02:34:15

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Rusty Russell wrote:
> Deprecating every module, and rewriting their initialization routines
> is ambitious beyond the scale of anything you have mentioned.

Well, it has happened before, e.g. sleep_on is now deprecated,
cli() doesn't give the comprehensive protection it used to,
holding spinlocks while sleeping used to be frowned upon, but
it's only recently that it moved to being forbidden, etc.

And the people making the new rules didn't go and fix every
module in existence. You may get this kind of responsibility
only if you actually break the old interface, but that's not
what I'm suggesting to do.

> And remember why we're doing it: for a fairly obscure race condition.

No, I want to do this to fix the reason for the fix for the
obscure race condition :-)

Also, all this smells of a fundamental design problem: modules
aren't the only things that can become unavailable. So why
construct a special mechanism that only applies to modules ?

> So we go from:

Yes. Note that the problem case only occurs if you have more than
one registration that could block you for a long time. Otherwise,
you just take that one first, and sleep on all the others.

Otherwise, things get messy, I admit. But that's not a new problem
either, e.g.

> int init(void)
> {
> if (!register_foo(&foo))
> return -err;
> if (!register_bar(&bar)) {
> unregister_foo(&foo);
> return -err;
> }
> return 0;
> }

What if unregister_foo fails, because somebody already started
to use the foo interface ? If "block until it works" is a valid
answer, you could probably use this also for the unload case.

This would actually work even with the current interface: just
make the module exit function sleep until all references are
gone. This means than rmmod might take a looong time, but is
this really a problem ? If a module wishes to accelerate its
demise, it can always add errors exits. If the interface doesn't
provide error exits, this is again a general problem, and
actually one of the interface.

If there's a really nasty case, where you absolutely can't
afford to sleep, you need to change the service to split
"deregister" into:

- prepare_deregister (like "deregister", but reversible)
- commit_deregister
- undo_deregister

But there probably aren't many cases where you'd really need
this.

> Something like this?

Yes, for instance. Or use atomic operations.

> Now, if someone tries to remove a module, but it's busy, you get a
> window of spurious failure, even though the module isn't actually
> removed.

Correct. But does this matter ? After all, we were prepared to
get failures if the removal succeeds anyway. So this is not a
new race condition.

> Secondly, there is often no way of returning a value which
> says "I'm going away, act as if I'm not here": only the level above
> can sanely know what it would do if this were not found.

Okay, my approach usually puts the responsibility of finally giving
up in user space. If user space just ignores the error, and keeps
the thing open anyway, you're in for a long wait. But that's not
different from opening some device and never releasing it. In
either case, your module becomes un-unloadable, unless you kill
the user-space hog. Again, not a new problem.

> On a busy system, they're never not being used. Your unload routine
> would always fail. Same with netfilter modules.

But they're only active for a short time, so the deregistration
could just sleep.

> It also puts the (minimal) burden in the right place: in the interface
> coder's lap, not the interface user's lap.

Well, both need to cooperate. And I don't see why the interface
coder should have to know whether its users are modules or not,
while the interface user, who is implicitly very aware if he's
a module or not, shouldn't.

Also note that this isn't just a module problem. Instead of
tripping over code that all of a sudden isn't there, one may
well trip over a data structure that has just been removed. In
either case, you need to get the synchronization right. What
I'm proposing is simply to make those two cases more similar,
such that handling one case correctly also handles the other
one.

Example:

struct foo *my_data;

int foo_callback()
{
do_something(my_data->blah);
...
}

void foo_exit()
{
deregister(...);
kfree(my_data);
...
}

So if "deregister" allows callbacks after it returns, you'll
still end up getting your oops every once in a while.

Now make this a non-module, and if foo_exit or some equivalent
can be called for some other reason (e.g. power-down,
hotplugging, etc.), you a) still have the same problem, and b)
all the miracle protection of try_module_get has vanished. So
you have to do the synchronization twice, i.e. on both sides
of the interface.

> Unfortunately, I don't have the patience to explain this once for
> every kernel developer.

Sorry for being so persistent. But I really think the module
situation is rapidly approaching the point where just fixing
the next bug isn't good enough, but where we need to get back
to the drawing board, look at the larger picture, and then
work towards a cleaner solution.

Also, although the task may seem daunting, don't forget that
even the BLK wasn't removed in a day :-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-01-16 03:24:48

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

In message <[email protected]> you write:
> > And remember why we're doing it: for a fairly obscure race condition.
>
> No, I want to do this to fix the reason for the fix for the
> obscure race condition :-)

Semantics.

> Also, all this smells of a fundamental design problem: modules
> aren't the only things that can become unavailable. So why
> construct a special mechanism that only applies to modules ?

NO NO NO. Listen *carefully*.

The ONLY time that FUNCTIONS vanish is when MODULES get UNLOADED (or
fail to LOAD).

So you're suggesting we should lock ALL functions the way we lock all
other datastructures. I look forward to your compiler patch.

I've explained this multiple times. If you're not convinced, fine.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-01-16 03:40:44

by David Miller

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

On Wed, 2003-01-15 at 19:31, Rusty Russell wrote:
> The ONLY time that FUNCTIONS vanish is when MODULES get UNLOADED (or
> fail to LOAD).

I totally agree with Rusty. If you don't understand this fundamental
difference between module unloading vs. arbitrary kernel objects
going away, then you really need to apply some gray matter to it
before you continue in this conversation :)

2003-01-16 04:11:52

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Rusty Russell wrote:
> The ONLY time that FUNCTIONS vanish is when MODULES get UNLOADED (or
> fail to LOAD).

Yes, of course. That's not what I mean. (See below.)

> So you're suggesting we should lock ALL functions the way we lock all
> other datastructures. I look forward to your compiler patch.

No, it's not locking for concurrent access, but locking for
access vs. removal.

If a function pointer gets disseminated without any way for
the caller to revoke it, then try_module_get is indeed the only
possible solution. But I think this is the rare exception.

The usual way for disseminating function pointers seems to be
through registration at some service interface, or through a
callback. Do you agree with this ?

Service interfaces normally also have a deregistration
function, and callbacks are either synchronous, so you have
to wait for them to complete, but you're busy anyway, or
asynchronous, usually with a way to cancel them.

Now, if the service insists on calling back even after
deregistration or cancellation, with no means for making
sure that at some point no further callback will happen, I'd
consider this a bug of the service - a bug that should be
fixed. (If there's a way to deregister with and without
synchronization, that's fine, of course.) Do you agree with
this ?

If you agree so far, then I hope you'll also agree that a
module can always be safely unloaded, without try_module_get,
if its cleanup function just deregisters/cancels and
synchronizes all the service registrations/callbacks the
module made. Plain and simple.

Of course, this means that module can spend an unbounded amount
of time in its cleanup function.

Do you agree so far ?

Now, if we accept that there may be the requirement that we
can't always wait for deregistration/cancellation to complete,
this leads to the more complex scenario we've discussed earlier
in this thread.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-01-16 04:40:46

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

David S. Miller wrote:
> I totally agree with Rusty. If you don't understand this fundamental
> difference between module unloading vs. arbitrary kernel objects
> going away,

I think I understand that part. What I'm saying is that any
interface that will still call you after deregistration will
also cause problems with normal data accesses, even if no
modules are involved.

So, if interfaces with this kind of bug are fixed, all of a
sudden the second (1) synchronization mechanism for module
unloads - returning from the cleanup function (2) - becomes
a viable alternative to try_module_get.

(1) The first mechanism being the use count.
(2) Or, in the case of initialization failure, returning
from the init function.

And that's what I think we should build upon. Rusty doesn't
see things this way, and I'd like to find out where exactly
we disagree.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-01-16 14:04:09

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

Rusty Russell wrote:

> Deprecating every module, and rewriting their initialization routines
> is ambitious beyond the scale of anything you have mentioned. Not
> that 90% of the kernel code couldn't use a damn good spring cleaning,
> but I'm not prepared to make such a change personally.

The transition would be rather painless:

if (mod->new_exit) {
res = -EBUSY;
if (force || !mod_use_count(mod))
res = mod->new_exit();
} else {
res = mod_try_exit();
if (!res && mod->old_exit) {
mod->old_exit();
}

Yes, there is a small race, but at least it's nonfatal, as the module
has to repeat the module count test or is protected otherwise.
Now we can start fixing interfaces and exit functions can still call
mod_try_exit() before calling old style interfaces.

> So we go from:
>
> int init(void)
> {
> if (!register_foo(&foo))
> return -err;
> if (!register_bar(&bar)) {
> unregister_foo(&foo);
> return -err;
> }
> return 0;
> }
>
> void fini(void)
> {
> unregister_foo(&foo);
> unregister_bar(&bar);
> }
>
> to:

int init(void)
{
if (register_foo(&foo))
return -err;
if (register_bar(&bar))
return -err;
return 0;
}

void fini(void)
{
if (unregister_bar(&bar))
return -err;
if (unregister_foo(&foo))
return -err;
return 0;
}

Hmm, looks as simple if not simpler to me.

> Something like this?
>
> static int i_am_live;
> static spinlock_t my_lock = SPIN_LOCK_UNLOCKED;
>
> /* This is our registered function. */
> static int foo_function(void *somedata)
> {
> int live;
>
> spin_lock(&my_lock);
> live = i_am_live;
> spin_unlock(&my_lock);
> if (!live)
> return -EIGNOREME???;
> ...
> }

If the exit function starts the shutdown, there is no going back
anymore. Let's take loop.c as example. It can either deconfigure the
devices itself (via loop_clr_fd()) or it let's the user do it and only
removes unused devices and prevents any new loop_set_fd() to succeed. In
any case there will be no deadlock.

> > Hmm, what makes security modules (what exactly do you mean
> > by that ?) special ?
>
> On a busy system, they're never not being used. Your unload routine
> would always fail. Same with netfilter modules.

int unregister(foo)
{
lock();
list_del_init(&foo->list);
unlock();
return foo->usecount ? -EBUSY : 0;
}

This makes sure no new user will start using it and at some point it has
to return zero.

> The current scheme is clean: it's two-stage delete with a nice helper
> function "try_module_get()" which tells you when it's going away, and
> no requirement that modules actually implement two-stage delete
> themselves. The patch to mirror this in two-stage init was posted
> yesterday, as well.

The current scheme is the reserve/activate scheme in disguise and it
sucks as much.

> Unfortunately, I don't have the patience to explain this once for
> every kernel developer.

Well, I'd be happy to see an implementation that not only works for the
most simplest cases.

bye, Roman


2003-01-16 18:33:17

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

Werner Almesberger wrote:

> If there's a really nasty case, where you absolutely can't
> afford to sleep, you need to change the service to split
> "deregister" into:
>
> - prepare_deregister (like "deregister", but reversible)
> - commit_deregister
> - undo_deregister

You can simplify this. All you need are the following simple functions:

- void register();
- void unregister();
- int is_registered();
- void inc_usecount();
- void dec_usecount();
- int get_usecount();

It's important to understand that the registered state and the usecount
are completely independent. As soon as the object is unregistered and
the usecount is zero, the object can be freed, but it doesn't matter in
which order it happens.
The problem is now that we are very limited how we can use these
functions. We can only unregister an object after the usecount became
zero, although it's also possible to first unregister the object and
then wait for the usecount. Only when we can do the latter is it
possible to safely force the removal of the object.

bye, Roman

2003-01-16 18:33:23

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

"David S. Miller" wrote:

> On Wed, 2003-01-15 at 19:31, Rusty Russell wrote:
> > The ONLY time that FUNCTIONS vanish is when MODULES get UNLOADED (or
> > fail to LOAD).
>
> I totally agree with Rusty. If you don't understand this fundamental
> difference between module unloading vs. arbitrary kernel objects
> going away, then you really need to apply some gray matter to it
> before you continue in this conversation :)

Above is not really not wrong, but there is still no fundamental
difference to other kernel objects. We will get this wrong as long as we
see functions as some kind of very special case, this is simply not
true.
Module functions are nothing else than readonly data structures, which
are allocated and initialized in module.c, by calling init_module() the
ownership is transferred to the module. How the ownership is transferred
back is now (hopefully) the point of the discussion. The only thing
special here is that the data possibly needs to be flushed out of the
data cache, before it can be used as a function. Besides of this there
is not a single technical difference to other reference counted data
structures, just the implementations differ.
Functions are not the only thing that vanish when modules get unloaded.
It's a lot more common to register functions via data structures and it
would be very foolish to think that only the functions need protecting.
In the first place one has to get access to the data structure, only
then the functions can be safely used. Again, how these data structures
are allocated doesn't matter, but at any time we have to know, who owns
these data structures, so that we can safely remove and deallocate them.

bye, Roman


2003-01-16 18:51:27

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Roman Zippel wrote:
> > - prepare_deregister (like "deregister", but reversible)
> > - commit_deregister
> > - undo_deregister
>
> You can simplify this. All you need are the following simple functions:
>
> - void register();
> - void unregister();
> - int is_registered();
> - void inc_usecount();
> - void dec_usecount();
> - int get_usecount();

I'm not sure if you, you're not changing the semantics. What I was
describing was a non-blocking interface, e.g.

if (!prepare_deregister(foo))
return -E...;
if (!prepare_deregister(bar)) {
undo_deregister(foo);
return -E...;
}
commit_deregister(foo);
commit_deregister(bar);
return 0;

With your interface, you're not guaranteed that you can re-register.
Well, you could externalize this, e.g. with

int error = 0;

inc_usecount(foo);
inc_usecount(bar);
unregister(foo); /* does nothing irrevokable since use count > 0 */
unregister(bar);
if (get_usecount(foo) > 1 || get_usecount(bar) > 1) {
register(foo); /* re-registers foo */
register(bar); /* re-registers bar */
error = E...;
}
dec_usecount(foo);
dec_usecount(bar);
return error;

Is that what you had in mind ?

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-01-17 00:17:59

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

Werner Almesberger wrote:

> > You can simplify this. All you need are the following simple functions:
> >
> > - void register();
> > - void unregister();
> > - int is_registered();
> > - void inc_usecount();
> > - void dec_usecount();
> > - int get_usecount();
>
> I'm not sure if you, you're not changing the semantics.

See above functions as primitives, which one can use to build any other
resource management you want.
The module code does the get_usecount() test for every driver and if it
was zero it disables inc_usecount() and only then the driver is allowed
to unregister(). The module code has to add another is_active state for
this and is so actually adding more complexity to it.

> What I was
> describing was a non-blocking interface, e.g.
>
> if (!prepare_deregister(foo))
> return -E...;
> if (!prepare_deregister(bar)) {
> undo_deregister(foo);
> return -E...;
> }
> commit_deregister(foo);
> commit_deregister(bar);
> return 0;

You are making it too complex, as you probably need an is_active state
as well, it would be just per interface instead of global.
If you really want to reduce complexity, all you can do, is to get rid
of the is_active state. For the majority of drivers such state isn't
needed at all, but it's currently forced on all drivers.
So to keep things simple, we should just finish shutting down the
driver, as soon as we started with it and don't restart or undo
anything. To avoid the sleeping we can also simply return -EBUSY and
continue later, so the cleanup could look like this:

if (!ptr)
return 0;
if (is_registered())
unregister();
if (get_usecount())
return -EBUSY;
release();
return 0;

This way the caller just needs to do the following, which can be called
as often as necessary:

if (shutdown(ptr))
return -EBUSY;
ptr = NULL;

There is really no need to introduce extra states to make things more
complex.

bye, Roman


2003-01-17 00:55:56

by Greg KH

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

On Fri, Jan 17, 2003 at 12:53:12AM +0100, Roman Zippel wrote:
> Hi,
>
> Werner Almesberger wrote:
>
> > > You can simplify this. All you need are the following simple functions:
> > >
> > > - void register();
> > > - void unregister();
> > > - int is_registered();
> > > - void inc_usecount();
> > > - void dec_usecount();
> > > - int get_usecount();
> >
> > I'm not sure if you, you're not changing the semantics.
>
> See above functions as primitives, which one can use to build any other
> resource management you want.

That same functionality is already present in a kobj for you to use,
instead of rolling your own resource management code. :)

(sorry, I couldn't help myself...)

greg k-h

2003-01-17 02:19:42

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

In message <[email protected]> you write:
> Rusty Russell wrote:
> > Deprecating every module, and rewriting their initialization routines
> > is ambitious beyond the scale of anything you have mentioned.
>
> Well, it has happened before, e.g. sleep_on is now deprecated,
> cli() doesn't give the comprehensive protection it used to,

They gave us SMP. What do we gain for your change?

> holding spinlocks while sleeping used to be frowned upon, but
> it's only recently that it moved to being forbidden, etc.

No, it's always been forbidden, it's only recently we detect it.

But apologies for the tone of my previous mail: it seems I'm
oversensitive to criticism of the module stuff now 8(

To go someway towards an explanation, at least, I humbly submit a
fairly complete description of the approach used (assuming the module
init race fix patch gets merged).

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

================
Rusty's Unreliable Guide To The 2.5 Module Locking Implementation (draft)
aka. All Locking and No Play Make Rusty Go Crazy

Any object in the kernel which might disappear, needs to have some
form of locking; we're very familiar with locking data structures in
the kernel. Modules are no exception, except that in this case it's
the functions themselves need locking.

The main difference is that we are not prepared to pay the price for
locking every function we call: the vast majority of speed-critical
functions are not in modules at all, and for many others, the call is
internal to the same module. So any locking solution must be
lightweight, and (or course) vanish if CONFIG_MODULES is not set.

Now, we don't need to lock unless we are going across a module
boundary (presumbably if you are already in the module, the locking is
sorted out already), and it's easy to ensure that you don't need to
lock for a normal function call, since the module loader can make the
caller statically depend on the callee: if module B calls "foo", which
is in module A, the loader knows that B refers to "foo" when it loads
B, so simply enforces that B uses A, and A cannot be unloaded before
B.

The cases which remain, however, are function calls through pointers:
such strategy functions are used widely, and naturally need some form
of protection against going away. Since the advent of SMP, and
preemption, it is extremely difficult for such functions to protect
themselves.

Two Stage Delete
================

So let's apply standard locking techniques. A standard approach to
this for data (eg networking packets) involves two-stage delete: we
keep a reference count in the object, which is usually 1 + number of
users, and to delete it we mark it deleted so noone new can use it,
and drop the reference count. The last user, who drops the reference
count to zero, actually does the free.

This is called two-stage delete: Alexey Kuznetsov's reason why IPv4
was not a module was the lack of two-stage delete support for modules.

The usual and logical place for the deleted flag and reference count
is in the structure being protected: in this case, that's the module
itself. You can, instead, have a flag and/or reference count in every
object used by the module, but it's not quite the same thing, and
deactivating them all atomically is impossible, which introduces its
own set of problems.

The main problem advantage of a single flag which says whether the
module is active or not, is that every module would need to add a
function which deactivated it. There are 1600 modules in the kernel,
and they work fine when built into the kernel: such a change merely
for the module case seems overly disruptive. Also, there is the other
case to consider: modules can disappear because they failed to
initialize. The same "deleted" flag can be used to isolate modules
during initialization: otherwise each module would need to implement
two-stage initialization as well.

Finally, the try_module_get() primitive (which combines the deleted
flag test with the reference count increment in one convenient form)
was already fairly widely used by the filesystem code and others,
where it was called "try_mod_inc_use_count()". Most people seemed to
have little problem using the primitive correctly.

Corner Cases And Optimizations
==============================

Some interfaces need to do more than simply flip a flag on activation:
they might want to fire off a hotplug event, for example, or scan
partition tables. So a notifier chain is supplied for them to do
exactly this (in the "Proposed Module Init Race Fix" patch). Modules
are free to "roll their own" two-stage init using module_make_live()
if they want, too.

One of the critical things when designing this, was that getting
references to modules had to be fast, and it didn't matter if removing
modules was relatively slow. The logical implementation of
try_module_get() looks like:

static int try_module_get(struct module *mod)
{
int ret;
unsigned long flags;

read_lock_irqsave(&module_lock, flags);
if (mod->deleted)
ret = 0;
else {
ret = 1;
atomic_inc(&mod->use);
}
read_unlock_irqrestore(&module_lock, flags);
return ret;
}

static int module_put(struct module *mod)
{
if (atomic_dec_and_test(&mod->use))
if (mod->deleted)
wake_up(mod->whoever_is_waiting);
}

And module unloading would look like:

...
write_lock_irq(&module_lock);
if (atomic_read(&mod->use) != 0)
ret = -EBUSY;
else {
mod->deleted = 1;
ret = 0;
}
write_unlock_irq(&module_lock);

But we can do better than this. The try_module_get contains interrupt
saving code (even though it's often unneccessary), a spin lock and an
atomic operation. Worse, on SMP the spinlock and atomic variable will
have to bounce from one CPU to another. The answer is to use an
atomic counter per CPU, and a "bogolock", which looks (conceptually)
like this:

void bogo_read_lock(void)
{
preempt_disable();
}

void bogo_read_unlock(void)
{
preempt_enable();
}

void bogo_write_lock(void)
{
run thread on every other CPU
tell them all to stop interrupts
stop interrupts locally
}

void bogo_write_unlock(void)
{
tell threads to unblock interrupts and exit
restore interrupts locally
}

So the real try_module_get and module_put look like:

static inline int try_module_get(struct module *module)
{
int ret = 1;

if (module) {
unsigned int cpu = get_cpu(); /* Disables preemption */
if (likely(module_is_live(module)))
local_inc(&module->ref[cpu].count);
else
ret = 0;
put_cpu();
}
return ret;
}

static inline void module_put(struct module *module)
{
if (module) {
unsigned int cpu = get_cpu();
local_dec(&module->ref[cpu].count);
/* Maybe they're waiting for us to drop reference? */
if (unlikely(!module_is_live(module)))
wake_up_process(module->waiter);
put_cpu();
}
}

And the remove code looks like:

ret = stop_refcounts(); /* bogo_write_lock */
if (ret != 0)
goto out;

/* If it's not unused, quit unless we are told to block. */
if ((flags & O_NONBLOCK) && module_refcount(mod) != 0) {
forced = try_force(flags);
if (!forced)
ret = -EWOULDBLOCK;
} else {
mod->waiter = current;
mod->state = MODULE_STATE_GOING;
}
restart_refcounts(); /* bogo)write_unlock */

Cheers,
Rusty.

2003-01-17 22:10:12

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

Rusty Russell wrote:

> Two Stage Delete
> ================
>
> So let's apply standard locking techniques. A standard approach to
> this for data (eg networking packets) involves two-stage delete: we
> keep a reference count in the object, which is usually 1 + number of
> users, and to delete it we mark it deleted so noone new can use it,
> and drop the reference count. The last user, who drops the reference
> count to zero, actually does the free.

Rusty, could you please show me where you found the deleted flag in
network packets? Where did you find that it's required to test the
deleted flag before you can get a reference to the object?
Actually it's far more common to just delete the object when all
references are gone without any need for a deleted flag. Why does
everyone else get away with a "single stage delete"? Where is the
deleted flag (or active flag) for modules coming from?

(BTW I would be really happy to get any response to this, I already
explained this all before, so I'd really like to know whether someone
understands what I'm talking about or what is so difficult to
understand.)

As already mentioned every object can only be deleted, when it's nowhere
registered anymore and all users are gone, the same is true for modules:

if (!is_registered(obj) && !get_usecount(obj))
delete(obj);

The is_registered() check can be omitted, if we unregister the object
ourselves, but this requires some locking to prevent new users until the
object is unregistered:

lock();
if (get_usecount(obj)) {
unlock();
return -EBUSY;
}
unregister(obj);
unlock();
delete(obj);

New users have to do this:

lock();
obj = lookup();
if (obj)
inc_usecount(obj);
unlock();

This is the standard mechanism we can find all over the kernel. The
problem with modules now is that the usecount check and the unregister
happen at two different places and the lock is not held until the object
is unregistered, so we have to add a new flag:

mod_lock();
if (get_usecount(obj)) {
mod_unlock();
return -EBUSY;
}
deactivate(obj);
mod_unlock();
...
obj_lock();
unregister(obj);
obj_unlock();
delete(obj);

New users have to do this now:

obj_lock();
obj = lookup();
if (obj) {
mod_lock();
if (is_active(obj))
inc_usecount(obj);
mod_unlock();
}
obj_unlock();

Rusty now worked very hard to make the read path as cheap as possible,
but the general complexity is still the same as always. The only way to
make this even cheaper is to reduce the complexity and this is only
possible by getting rid of the is_active() check. The only consequence
would be that we had no global activate switch anymore, but is it really
needed? (Insert compelling reason here, why such a switch is absolutely
required, as I don't know any.)

To understand the possible alternatives I have to rename Rustys "Two
Stage Delete" into "Three stage delete":
1. deactivate
2. unregister
3. delete
(In case anyone was wondering where the single reference in Rustys
example is coming from - it's the registered state from the second
stage).
As said above the delete stage can only be entered if the object is not
registered and not used anymore, this is the only absolute requirement,
but currently we can't even get past the first stage as long as there
are users (otherwise we risk a deadlock).
When Rusty is now talking about exposing a two stage api to the modules,
he actually means the first stage in order to leave it to the module how
to deactivate the module. It makes indeed little sense to expose this
stage, because this is the stage, which is causing the extra complexity
and which we should rather get rid of.
On the other hand it would make quite a lot more sense to expose the
second stage to the modules. This stage is independent of the usecount,
this means we can easily and safely prevent new users from using the
module. The standard mechanism above to remove an object can easily be
changed into:

unregister(force):
lock();
if (get_usecount(obj) && !force) {
unlock();
return -EBUSY;
}
unregister(obj);
unlock();
delete:
if (get_usecount(obj))
return -EBUSY;
delete(obj);

Doing these stages with one or two functions doesn't really matter, the
work has to be done anyway and causes no real extra complexity.

bye, Roman


2003-02-13 23:06:43

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

m.au on Fri, Jan 17, 2003 at 01:27:56PM +1100

Sorry for the long pause ...

Rusty Russell wrote:
> They gave us SMP. What do we gain for your change?

Mainly a simpler interface - one that doesn't treat module unloading
as such a special case, but just as yet another fairly regular
synchronization problem.

This should also have a performance impact: the current approach
puts "code locking" outside of the module, and "data locking" inside
of it. Unifying this eliminates one layer of locking mechanisms.

Independent of this, we should fix the interfaces that give us
unstoppable callbacks. These are just disasters waiting to happen,
modules or no.

Of course, since this may imply interface changes (not necessarily
in terms of changing an existing interface, but perhaps in terms of
adding a properly synchronized version, and discovering bugs in
modules using the not-synchronized one), it would be good if the
module cleanup simplification could be done in parallel.

I think, once we know exactly what semantics to aim for, the change
could be relatively straightforward. (And, I wholeheartedly agree,
there must be no "flag day".)

> But apologies for the tone of my previous mail: it seems I'm
> oversensitive to criticism of the module stuff now 8(

No problem. I actually admire your thick skin, given all the
unjustified and nasty stuff you get thrown at you :-)

> To go someway towards an explanation, at least, I humbly submit a
> fairly complete description of the approach used (assuming the module
> init race fix patch gets merged).

Thanks ! That part looks fine. But, of course, it's not how you
do it that I don't like, but what you're doing :-)

Anyway, my plan is to first get my simulation infrastructure
working, and then make a few test cases that show callbacks
after deregistration causing trouble. After that, hopefully
other people can pick up the cleanup work.

Do you see any obvious technical problems with the approach of
using return from module initialization/cleanup as "ready to
unload" indicator ?

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-14 01:59:10

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

In message <[email protected]> you write:
> m.au on Fri, Jan 17, 2003 at 01:27:56PM +1100
>
> Sorry for the long pause ...
>
> Rusty Russell wrote:
> > They gave us SMP. What do we gain for your change?
>
> Mainly a simpler interface - one that doesn't treat module unloading
> as such a special case, but just as yet another fairly regular
> synchronization problem.
>
> This should also have a performance impact: the current approach
> puts "code locking" outside of the module, and "data locking" inside
> of it. Unifying this eliminates one layer of locking mechanisms.

Um, no. You're special case "optimizing" it.

When you have an object which may vanish, the linux kernel idiom runs
something like this:

write_lock()
find available object in list
inc object refcount
write_unlock()

The writelock protects the infrastructure, the refcount protects the
object. Deleting is done in two stages (mark deleted and drop
refcount, then whoever drops the refcount to 0 actually does the
free). Usually "mark deleted" means simply remove from the list, but
not always.

The current module code uses exactly the same idiom for the code
itself: we use a heavily read-optimized lock, but that's an
implementation detail.

Now, could you instead lock all the (arbitrary, widespread, unrelated)
accesses to the object, instead of the object itself? Sure. It'd be
unique in the kernel, and involve changing the way every interface
works, and probably expose some of the complexity to every module
author (every workable implementation I've seen has this problem), but
you could do it. See below for why I don't think it's worth it.

> Independent of this, we should fix the interfaces that give us
> unstoppable callbacks. These are just disasters waiting to happen,
> modules or no.

I assume you're referring to the many places where we assume that the
structure being added was not dynamically allocated, so don't bother
to protect against its deletion?

That is, of course, orthogonal.

And in general, I agree: not including a refcount is asking for
trouble. But these reference counts are *not* free. The module
reference counts, by comparison, can be made extremely cheap (see
implementation), because we can allocate a cacheline per cpu, and we
can bias "read" speed in preference of awful "write" speed.

In summary: the most obvious implementation is to lock to module as an
object separately from any objects it might create. The current
implementation is extremely fast, requires neither module changes nor
(many) interface changes, and in effect canonicalizes a single
existing method of locking, which coders seem quite comfortable with.

Given these reasons, you can see why I no longer discuss new
implementation ideas with people 8(

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-02-14 03:34:55

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Rusty Russell wrote:
> Um, no. You're special case "optimizing" it.
>
> When you have an object which may vanish, the linux kernel idiom runs
> something like this:

Yes, that's when you view it as a locking problem, as for data
objects. What I'm saying is that, if you manage the data
structures properly, plus fix a few interfaces that currently
don't have to manage data structures properly, you're already
perfectly synchronized, so no further locking is needed.

> I assume you're referring to the many places where we assume that the
> structure being added was not dynamically allocated, so don't bother
> to protect against its deletion?

Yes.

> And in general, I agree: not including a refcount is asking for
> trouble. But these reference counts are *not* free.

Alas, no. But we how long can we afford not to fix them, at least
the "public" interfaces ? Even if they're unsafe, people will use
them, particularly if they're given no other choice.

> object separately from any objects it might create. The current
> implementation is extremely fast, requires neither module changes nor
> (many) interface changes, and in effect canonicalizes a single
> existing method of locking, which coders seem quite comfortable with.

It's more the changes behind the interfaces I'm worried about.
It may not be bad today, but every function pointer is a
potential problem.

Anyway, without fixing a good number of the "ghost from the
past" interfaces first, my point is moot. So I won't trouble
you again with module locking before there is some progress in
this area :-)

> Given these reasons, you can see why I no longer discuss new
> implementation ideas with people 8(

Nah, don't give up ! :-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-14 11:07:43

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

On Fri, 14 Feb 2003, Rusty Russell wrote:

> When you have an object which may vanish, the linux kernel idiom runs
> something like this:
>
> write_lock()
> find available object in list
> inc object refcount
> write_unlock()
>
> The writelock protects the infrastructure, the refcount protects the
> object. Deleting is done in two stages (mark deleted and drop
> refcount, then whoever drops the refcount to 0 actually does the
> free). Usually "mark deleted" means simply remove from the list, but
> not always.
>
> The current module code uses exactly the same idiom for the code
> itself: we use a heavily read-optimized lock, but that's an
> implementation detail.

It's not the same, please see:
http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2
I explained why the current module locking is more complex and why it's
actually a three stage delete.

Let's take an example: procfs entries. That's a more interesting example,
because these can be dynamically created, but they also include pointers
to the module.
To keep it simple we create/remove an entry from a nonmodular kernel (so
module functions are dummies):

foo_entry = create_proc_entry();
foo_entry->data = kmalloc();
foo_entry->read_proc = foo_read;
foo_entry->write_proc = foo_write;

Problem 1: If the user opens that new proc entry, he must be prepared that
the new entry is not fully initialized yet.
Problem 2: There are no memory barriers, that means it's undefined in
which order the data, read_proc, write_proc members arrive at another
cpu, so it's possible that foo_read/foo_write can be called with a NULL
data pointer.

Now let's remove it again:

data = foo_entry->data;
remove_proc_entry();
kfree(data);

Problem: Should the entry still be busy, procfs just prints a warning,
delays cleanup and returns immediately, so the data can be accessed after
it was freed.

Now we do the same from a module but not from module_init/module_exit,
that means we dynamically want to create/remove entries during the module
life time. When we create the entry we also initialize the owner member,
but this doesn't help with any of the above problems. The only thing
module locking changes is that we know that it's safe to free the data at
the time module_exit is called, but this might happen in the distant
future or even never.

Rusty, above are real problems, the module locking fixes these problems
during module_init/module_exit, but how can these problems fixed in the
other cases and how does the module locking help?

bye, Roman

2003-02-14 11:56:35

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

In message <Pine.LNX.4.44.0302141035270.1336-100000@serv> you write:
> It's not the same, please see:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2
> I explained why the current module locking is more complex and why it's
> actually a three stage delete.

No, here is where you show *your* ignorance of kernel locking idioms,
and that your axiom is that "the new system is more complex".

I suggest you read the kernel locking guide: it's in the kernel
sources in Documentation/DocBook/kernel-locking.*, try "make psdocs".

> Rusty, above are real problems, the module locking fixes these problems
> during module_init/module_exit, but how can these problems fixed in the
> other cases and how does the module locking help?

This isn't even a sensible question: "This is not a module problem.
How does module locking help?"

You're wasting your own valuable time, too.
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-02-14 12:39:49

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

On Fri, 14 Feb 2003, Rusty Russell wrote:

> > It's not the same, please see:
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2
> > I explained why the current module locking is more complex and why it's
> > actually a three stage delete.
>
> No, here is where you show *your* ignorance of kernel locking idioms,
> and that your axiom is that "the new system is more complex".

Well, it should be simple then to point out the problem, would you
_please_ do it?

> > Rusty, above are real problems, the module locking fixes these problems
> > during module_init/module_exit, but how can these problems fixed in the
> > other cases and how does the module locking help?
>
> This isn't even a sensible question: "This is not a module problem.
> How does module locking help?"
>
> You're wasting your own valuable time, too.

I hoped we could discuss locking problems, as I said these problems are
real, so it should be worth to describe possible solutions. Then we can
compare the different locking mechanism and maybe we find one, which not
only solves a part of the problem.

Rusty, I'm asking all these questions on purpose, I want to help you to
understand the complete problem and how limited your solutions are. So
either please answer my questions or point out the mistakes in my
argumentation.

bye, Roman

2003-02-14 13:12:36

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

On Fri, 14 Feb 2003, Rusty Russell wrote:

> > It's not the same, please see:
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2
> > I explained why the current module locking is more complex and why it's
> > actually a three stage delete.
>
> No, here is where you show *your* ignorance of kernel locking idioms,
> and that your axiom is that "the new system is more complex".

Another point you probably misunderstood: I said that the complexity of
the new and the old system is exactly the same, please read more carefully
before flaming, it might backfire.

> > Rusty, above are real problems, the module locking fixes these problems
> > during module_init/module_exit, but how can these problems fixed in the
> > other cases and how does the module locking help?
>
> This isn't even a sensible question: "This is not a module problem.
> How does module locking help?"

I hate to drag people into a discussion, but maybe you're more inclined
to believe Al:

http://marc.theaimsgroup.com/?l=linux-kernel&m=103761331525509&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=103769023500378&w=2

Please read this very careful and think about it.

bye, Roman

2003-02-14 13:43:58

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Roman Zippel wrote:
> On Fri, 14 Feb 2003, Rusty Russell wrote:
>> This isn't even a sensible question: "This is not a module problem.
>> How does module locking help?"

I have to side with Rusty here - it's really not a module problem.

The module changes only make this a little worse, because they
follow the philosophy that this can't be fixed, so try_module_get
and friends try to implement the right kind of locking for unload
races (but for little else) without tripping over those "unfixable"
bugs.

The good news is that you can start fixing all those bugs right
now, and even without Rusty's consent :-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-14 14:15:19

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

On Fri, 14 Feb 2003, Werner Almesberger wrote:

> The module changes only make this a little worse, because they
> follow the philosophy that this can't be fixed, so try_module_get
> and friends try to implement the right kind of locking for unload
> races (but for little else) without tripping over those "unfixable"
> bugs.

If you see these bugs as "unfixable", you already gave up and you end up
putting one band-aid over another, each only solving part of the problem.
Please try work with me here and we might find a more general solution.
I could just post possible solutions, but as long nobody understands the
problem, they will be rejected anyway.

bye, Roman

2003-02-14 18:20:59

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Roman Zippel wrote:
> If you see these bugs as "unfixable", you already gave up and you end up
> putting one band-aid over another, each only solving part of the problem.

Yup, that's what I don't like either.

> Please try work with me here and we might find a more general solution.
> I could just post possible solutions, but as long nobody understands the
> problem, they will be rejected anyway.

Step one is to fix those "unfixable" problems. That's independent
of modules, and I'm convinced that it needs to be done.

I don't want to have to go through lots of subsystems and try to
figure our how to do their locking right, so my plan is to use a
simulator to exercise such obscure race conditions, and once the
Oops has been shown live and in color, leave the actual fixes to
the maintainers of the respective code. (And eventually, also
leave the bug hunt to others. I'm lousy at maintenance-type of
work.)

I expect to have that simulator ready for useful work on UP in
about one week. SMP will come later. (For a sneak preview, you
can look at umlsim.sourceforge.net, which has the (old) kernel
side, and at netbb/dumpbb/README.UMLSIM in netbb from
http://www.almesberger.net/netbb, which describes some of the
user-space side (basically a script-driven debugger).)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-14 19:59:44

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

On Fri, 14 Feb 2003, Werner Almesberger wrote:

> > Please try work with me here and we might find a more general solution.
> > I could just post possible solutions, but as long nobody understands the
> > problem, they will be rejected anyway.
>
> Step one is to fix those "unfixable" problems. That's independent
> of modules, and I'm convinced that it needs to be done.

Step one is still to understand the problem. I described a very real
problem, once this problem is solved (which can be done in different
ways), I can explain how this can be applied to modules. It's not really
independent of modules.

bye, Roman

2003-02-15 00:02:48

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Roman Zippel wrote:
> Step one is still to understand the problem. I described a very real
> problem, once this problem is solved (which can be done in different
> ways),

I though we were talking about

static data_used_by_callbacks;
...
register_foo(&stuff_with_callbacks);
...
unregister_foo(&stuff_with_callbacks);
make_unusable(&data_used_by_callbacks)
...
/* oops, we just got a callback */

("data_used_by_callbacks" could be a pointer to kmalloc'ed
memory, etc.)

This kind of problem seems to be understood well enough.
But maybe you mean something else ?

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-15 00:42:31

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

On Fri, 14 Feb 2003, Werner Almesberger wrote:

> I though we were talking about
>
> static data_used_by_callbacks;
> ...
> register_foo(&stuff_with_callbacks);
> ...
> unregister_foo(&stuff_with_callbacks);
> make_unusable(&data_used_by_callbacks)
> ...
> /* oops, we just got a callback */
>
> ("data_used_by_callbacks" could be a pointer to kmalloc'ed
> memory, etc.)
>
> This kind of problem seems to be understood well enough.

Yes, and now compare how the solutions differ when the data is static and
when it's allocated.

bye, Roman

2003-02-15 02:18:38

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Roman Zippel wrote:
> Yes, and now compare how the solutions differ when the data is static and
> when it's allocated.

Do they ? Even if the data is static, it can become invalid
(in the sense that accessing it from a callback would lead
to some kind of undesirable behaviour, even though the access
itself would work), so I don't quite see why the difference
would matter.

Example:

static ... common_callback(...)
{
switch (my_state) {
...
}
}

...
my_state = A;
register_fancy_timer_A(&me_A,common_callback);
...
unregister_fancy_timer_A(&me_A);
my_state = B;
/* stray fancy_timer_A call to common_callback would
trigger action for state B */
...
register_fancy_timer_B(&me_B,common_callback);
...

Depending on "my_state", the callback would perform different
actions. (The "fancy timers" would be some timer-like service
that doesn't del_timer_sync.)

This is getting to abstract. Why don't you just say where you
see the difference ? :-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-15 23:10:53

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

On Fri, 14 Feb 2003, Werner Almesberger wrote:

> > Yes, and now compare how the solutions differ when the data is static and
> > when it's allocated.
>
> Do they ? Even if the data is static, it can become invalid
> (in the sense that accessing it from a callback would lead
> to some kind of undesirable behaviour, even though the access
> itself would work), so I don't quite see why the difference
> would matter.

Let's stay at the main problem, we have find out when it's safe to delete
an object. For dynamic objects you have the following options:
- callbacks: when the refcount becomes zero, we call a function to remove
the object.
- failure: we just return -EBUSY and try again later.
- wait: we simply wait until the refcount becomes zero

Static objects and functions are freed by the module code and usually we
want to unregister them at module unload time, so there are basically two
ways:
- we use the module count via try_module_get/module_put
- we use your own refcount and must wait in module_exit until all users
are gone

If we exclude the possibly-wait-forever-option, do you see the problem
for dynamic objects which also contain references to static data/
functions? Procfs entries are such objects, there is a count field for the
dynamic part and an owner field for the static part and proc_get_inode
always has to get two references. The interesting question is now, can we
get rid of one of them?
If the answer is no, it would mean we need two procfs APIs, one which can
be used from module_exit and another which can be used to remove dynamic
entries. OTOH if we want to avoid this it would mean we have to make one
or more of the above options generically usable.

I stop here, because before we can discuss possible solutions, we have to
agree, that this problem is real.
Rusty, did I make any mistake so far? I'd really like to have your opinion
too. You are free to flame me, if I made any mistake, but please do it
with a bit more substance than just accusing me of ignorance.

bye, Roman

2003-02-17 02:53:24

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

In message <Pine.LNX.4.44.0302141334560.1336-100000@serv> you write:
> Rusty, I'm asking all these questions on purpose, I want to help you to
> understand the complete problem and how limited your solutions are. So
> either please answer my questions or point out the mistakes in my
> argumentation.

There were two major changes in the module code. The first was to
simplify the userland interface, from:

sys_create_module(name, size)
sys_query_module(...) (a 5-way multiplexed syscall)
sys_init_module(name, ptr)

to:
sys_init_module(ptr, len, userargs)

To argue against this change is a demonstration of lack of
understanding, or a complete lack of taste.

The second change was the speed up one system of module locking in the
kernel which wasn't racy, and deprecate the other system which was
racy in 99% of its uses. That is all.

Did it solve all the races in the kernel? Of course not. But it's
simple to use, already well understood in the kernel, and avoids
massive changes. It also allows connection tracking to be properly
modularized, which was my long-lost original purpose.

I've repeated this enough now, I think.
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-02-17 10:43:33

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

On Mon, 17 Feb 2003, Rusty Russell wrote:

> There were two major changes in the module code. The first was to
> simplify the userland interface, from:
>
> sys_create_module(name, size)
> sys_query_module(...) (a 5-way multiplexed syscall)
> sys_init_module(name, ptr)
>
> to:
> sys_init_module(ptr, len, userargs)
>
> To argue against this change is a demonstration of lack of
> understanding, or a complete lack of taste.

Maybe you could share a bit of your wisdom?
1. Doing the linking in userspace requires two steps, but I still don't
know what's so bad about it.
2. This still doesn't explain, why everything has to be moved into kernel,
why can't we move more into userspace?
3. You simply moved part of the query syscall functionality to
/proc/modules (which btw is still not enough to fix ksymoops).

> The second change was the speed up one system of module locking in the
> kernel which wasn't racy, and deprecate the other system which was
> racy in 99% of its uses. That is all.

Well, I'm not against optimizing the module locking (*), as we won't get
rid of it in the near feature, but it still has problems.
1. It's adding complexity (however you implement it), I explained it in
detail and you still haven't told me, where I'm wrong.
2. The module interface is incompatible with other kernel interfaces, I
tried to explain that in the mail from saturday, if you think I'm wrong,
your input is very welcome, but _please_ answer to that mail.

> Did it solve all the races in the kernel? Of course not. But it's
> simple to use, already well understood in the kernel, and avoids
> massive changes. It also allows connection tracking to be properly
> modularized, which was my long-lost original purpose.

It's too much fun to quote Al here:
"And no, I don't buy arguments about poor interface-writers. You do some
infrastructure intended to be used by driver-writers - you are supposed
to have a clue. 'Cause having rabid monkeys on crack on *both* sides of
an interface is a recipe for disaster and on the driver side you are
guaranteed to have a bunch of them.

We need to have interfaces cleaned up. No silver bullets there. There's
maybe a dozen of interfaces that cover 99% of all drivers out there. Remaining
1% can and should fend for itself - you do something really tricky, you
are responsible for getting it right."

> I've repeated this enough now, I think.

Yes, you repeated your "executive summaries" now often enough, maybe you
can impress some manager with it, but it's highly offensive to kernel
hackers. Could you _please_ come up with some arguments now?

bye, Roman

(*) BTW I have patch that would make the unload path usable again, it
would only be required to add a single smp_mb() to the fast path.

2003-02-17 16:54:55

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Roman Zippel wrote:
> Let's stay at the main problem, we have find out when it's safe to delete
> an object. For dynamic objects you have the following options:
[...]
> Static objects and functions are freed by the module code and usually we
[...]

Okay so far.

> If we exclude the possibly-wait-forever-option, do you see the problem
> for dynamic objects which also contain references to static data/
> functions?

You mean that two locking mechanisms are used, where one of them
shouldn't be doing all that much ? Well, yes.

Now, is this a problem, or just a symptom ? I'd say it's a symptom:
we already have a perfectly good locking/synchronization method,
and that's through the register/unregister interface, so the
module-specific part is unnecessary.

That much about the theory. Of course, in real life, we have to
face a few more problems:

- if callbacks can happen after apparently successful "unregister",
we die
- if accesses to other static data owned by a module can happen
after apparently successful "unregister", we may die
- if a module doesn't "unregister" at all, we die too

But all these problems equally affect code that does other things
that can break a callback/access, e.g. if we destroy *de->data
immediately after remove_proc_entry returns.

So this is not a module-specific problem.

Agreed ?

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-17 22:59:31

by Roman Zippel

[permalink] [raw]
Subject: [RFC] Is an alternative module interface needed/possible?

Hi,

(Subject changed to hopefully get a bit more attention.)

On Mon, 17 Feb 2003, Werner Almesberger wrote:

> > If we exclude the possibly-wait-forever-option, do you see the problem
> > for dynamic objects which also contain references to static data/
> > functions?
>
> You mean that two locking mechanisms are used, where one of them
> shouldn't be doing all that much ? Well, yes.
>
> Now, is this a problem, or just a symptom ? I'd say it's a symptom:
> we already have a perfectly good locking/synchronization method,
> and that's through the register/unregister interface, so the
> module-specific part is unnecessary.

If it was perfectly good, we hadn't a problem. :)

> That much about the theory. Of course, in real life, we have to
> face a few more problems:
>
> - if callbacks can happen after apparently successful "unregister",
> we die
> - if accesses to other static data owned by a module can happen
> after apparently successful "unregister", we may die
> - if a module doesn't "unregister" at all, we die too
>
> But all these problems equally affect code that does other things
> that can break a callback/access, e.g. if we destroy *de->data
> immediately after remove_proc_entry returns.
>
> So this is not a module-specific problem.

You're skipping ahead. You haven't solved the problem yet, but you're
already jumping to conclusions. :-)
Remember, that we want to savely remove a proc entry and as added bonus,
we only want a single reference count. Let's look first at the possible
solutions:
module count: by design this only works for entries, which are removed
during module exit, but not for dynamic entries.
failure: if the object is still busy, we just return -EBUSY. This is
simple, but this doesn't work for modules, since during module exit you
can't fail anymore.
callbacks: the callback function itself had to be protected somehow, so
just to unregister a proc entry, you have to register a callback. To
unregister that callback, it would be silly to use another callback and
failure doesn't work with modules, so that only leaves the module count.

The last solution sounds complicated, but exactly this is done for
filesystems and we didn't really get rid of the second reference count, we
just moved it somewhere else, where it hurts least.
Without interface changes this is also the only generic option to export
dynamic data - the drivers have to get a filesystem like interface (or
just become filesystem themselves).

The very basic reason which prevents another solution is that static data
(which includes functions) is controlled by the generic module code and
dynamic data is controlled by the driver itself. It's obvious that we
can't give the module code control over dynamic data, on the other hand
would it be possible to give the driver control over the static data? This
way it suddenly it becomes a module-specific problem - how can we give
drivers more control over its data?

bye, Roman

2003-02-18 01:09:11

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Roman Zippel wrote:
> If it was perfectly good, we hadn't a problem. :)

I said we he have the method. Now we need to use it properly :-)

> You're skipping ahead. You haven't solved the problem yet, but you're
> already jumping to conclusions. :-)

The solution is another issue. I simply stated that the problem
happens with or without modules.

> module count: by design this only works for entries, which are removed
> during module exit, but not for dynamic entries.

Works only for modules, not good.

> failure: if the object is still busy, we just return -EBUSY. This is
> simple, but this doesn't work for modules, since during module exit you
> can't fail anymore.

That's a modules API problem. And yes, I think modules should
eventually be able to say that they're busy.

> callbacks: the callback function itself had to be protected somehow, so
> just to unregister a proc entry, you have to register a callback. To
> unregister that callback, it would be silly to use another callback and

If all you want to do is to decrement the module count, you could
have a global handler for this that is guaranteed not to reside
in a module.

By the way, a loong time ago, in the modules thread, I suggested
a "decrement_module_count_and_return" function [1]. Such a
construct would be useful in this specific case.

[1] http://www.uwsg.iu.edu/hypermail/linux/kernel/0207.0/0147.html

> failure doesn't work with modules, so that only leaves the module count.

And how would you ensure correct access to static data in the
absence of modules ? Any solution that _requires_ a module count
looks highly suspicious to me.

Likewise, possibly dynamically allocated data that is synchronized
by the caller, e.g. "user" in "struct proc_dir_entry".

> The last solution sounds complicated, but exactly this is done for
> filesystems and we didn't really get rid of the second reference count, we
> just moved it somewhere else, where it hurts least.

Hmm, I'm confused. With "filesystem", do you mean the file system
driver per se (e.g. "ext3"), or a specific instance of such a file
system (e.g. /dev/hda1 mounted on /) ?

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-18 02:38:53

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

In message <Pine.LNX.4.44.0302171112390.1336-100000@serv> you write:
> Maybe you could share a bit of your wisdom?
> 1. Doing the linking in userspace requires two steps, but I still don't
> know what's so bad about it.
> 2. This still doesn't explain, why everything has to be moved into kernel,
> why can't we move more into userspace?
> 3. You simply moved part of the query syscall functionality to
> /proc/modules (which btw is still not enough to fix ksymoops).

I think you'd do far better to implement it yourself for half a dozen
architectures. It's not my job to teach you things which can be
gained by reading the code and thinking a little.

We're going in a circle again.

> > The second change was the speed up one system of module locking in the
> > kernel which wasn't racy, and deprecate the other system which was
> > racy in 99% of its uses. That is all.
>
> Well, I'm not against optimizing the module locking (*), as we won't get
> rid of it in the near feature, but it still has problems.
>
> 1. It's adding complexity (however you implement it), I explained it in
> detail and you still haven't told me, where I'm wrong.

No, it's exactly the same as before. You can't see that, and I've
given up explaining it.

> 2. The module interface is incompatible with other kernel interfaces, I
> tried to explain that in the mail from saturday, if you think I'm wrong,
> your input is very welcome, but _please_ answer to that mail.

This problem is in your mind Roman.

Deal with it.

> > Did it solve all the races in the kernel? Of course not. But it's
> > simple to use, already well understood in the kernel, and avoids
> > massive changes. It also allows connection tracking to be properly
> > modularized, which was my long-lost original purpose.
>
> It's too much fun to quote Al here:

Quoting Al's rant isn't an argument. It wasn't very coherent when he
wrote it, and it doesn't gain with repetition.

The code exists. It's simple to use.

I give up. You're killfiled again 8(
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-02-18 04:53:53

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

In message <[email protected]> you write:
> > failure: if the object is still busy, we just return -EBUSY. This is
> > simple, but this doesn't work for modules, since during module exit you
> > can't fail anymore.

Of course, they never could fail module exit. You could use
can_unload() which was only protected by the BKL, hence so rarely
used: grep 2.4.20 to see all four users.

Having a separate can_unload function means a window between asking
can_unload() and doing the actual unload. Doing it in the function
itself (ie. cleanup returns an int, or a separate "prepare_to_unload"
kind of function) means you have to deal with the "I started
deregistering my stuff, but then had to re-register them" which leaves
a race where the module is partially unavailable (ie. the spurious
failure problem).

Both these problems are soluble, but they're not trivial.

> That's a modules API problem. And yes, I think modules should
> eventually be able to say that they're busy.

Take a look at the current ip_conntrack code. It keeps its own
reference count and spins until it hits zero in unload, because
otherwise it would never be unloadable (it attaches a callback to each
packet, so logically it would bump its refcount there). In 2.5 it can
use try_module_get() and be unloaded with rmmod --wait in this worst
case (I'm slack, patching is on my todo list).

Security modules have the same issues, and Greg Kroah-Hartmann was
telling me USB has it as well.

You can see why I considered deprecating module unloading: it's a hard
problem, and fixing it in general probably means massive changes.
Which I'm happy to leave to someone else.

> By the way, a loong time ago, in the modules thread, I suggested
> a "decrement_module_count_and_return" function [1]. Such a
> construct would be useful in this specific case.

Stephen and I even implemented one, for x86 (see below). But
implementing the matching "enter and inc" case remains problematic, so
I dropped that idea.

Anyway, good luck!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: module_put_return primitive for x86
Author: Rusty Russell
Status: Experimental
Depends: Module/forceunload.patch.gz

D: This patch implements module_put_return() for x86, which allows a
D: module to control its own reference counts in some cases. A module
D: must never use module_put() on itself, since this may result in the
D: module being removable immediately: this is the alternative, which
D: atomically decrements the count and returns.
D:
D: Each architecture will need to implement this for preempt.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .13134-linux-2.5.38/arch/i386/kernel/module.c .13134-linux-2.5.38.updated/arch/i386/kernel/module.c
--- .13134-linux-2.5.38/arch/i386/kernel/module.c 2002-09-25 02:02:28.000000000 +1000
+++ .13134-linux-2.5.38.updated/arch/i386/kernel/module.c 2002-09-25 02:02:56.000000000 +1000
@@ -28,6 +28,9 @@
#define DEBUGP(fmt , ...)
#endif

+/* For the magic module return. */
+struct module_percpu module_percpu[NR_CPUS];
+
static void *alloc_and_zero(unsigned long size)
{
void *ret;
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .13134-linux-2.5.38/arch/i386/lib/module.S .13134-linux-2.5.38.updated/arch/i386/lib/module.S
--- .13134-linux-2.5.38/arch/i386/lib/module.S 1970-01-01 10:00:00.000000000 +1000
+++ .13134-linux-2.5.38.updated/arch/i386/lib/module.S 2002-09-25 02:02:56.000000000 +1000
@@ -0,0 +1,33 @@
+/* Icky, icky, return trampoline for dying modules. Entered with
+ interrupts off. (C) 2002 Stephen Rothwell, IBM Corporation. */
+
+#include <asm/thread_info.h>
+
+.text
+.align 4
+.globl __magic_module_return
+
+#define MODULE_PERCPU_SIZE_ORDER 3
+
+__magic_module_return:
+ /* Save one working variable */
+ pushl %eax
+
+ /* Get CPU number from current. */
+ GET_THREAD_INFO(%eax)
+ movl TI_CPU(%eax), %eax
+
+ /* Push module_percpu[cpu].flags on the stack */
+ shll $MODULE_PERCPU_SIZE_ORDER, %eax
+ pushl module_percpu(%eax)
+
+ /* Put module_percpu[cpu].returnaddr into %eax */
+ movl module_percpu+4(%eax), %eax
+
+ /* Push returnaddr and restore eax */
+ xchgl %eax, 4(%esp)
+
+ /* Restore interrupts */
+ popf
+ /* Go home */
+ ret
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .13134-linux-2.5.38/include/asm-i386/module.h .13134-linux-2.5.38.updated/include/asm-i386/module.h
--- .13134-linux-2.5.38/include/asm-i386/module.h 2002-09-25 02:02:28.000000000 +1000
+++ .13134-linux-2.5.38.updated/include/asm-i386/module.h 2002-09-25 02:02:56.000000000 +1000
@@ -8,4 +8,33 @@ struct mod_arch_specific
#define Elf_Shdr Elf32_Shdr
#define Elf_Sym Elf32_Sym
#define Elf_Ehdr Elf32_Ehdr
+
+/* Non-preemtible decrement the refcount and return. */
+#define module_put_return(firstarg , ...) \
+do { \
+ unsigned int cpu; \
+ unsigned long flags; \
+ \
+ local_irq_save(flags); \
+ if (unlikely(module_put(THIS_MODULE)) { \
+ module_percpu[cpu].flags = flags; \
+ module_percpu[cpu].returnaddr = ((void **)&(firstarg))[-1]; \
+ ((void **)&(firstarg))[-1] = __magic_module_return; \
+ } else \
+ local_irq_restore(flags); \
+ return __VA_ARGS__; \
+} while(0)
+
+/* FIXME: Use per-cpu vars --RR */
+struct module_percpu
+{
+ unsigned long flags;
+ void *returnaddr;
+};
+
+extern struct module_percpu module_percpu[NR_CPUS];
+
+/* Restore flags and return to caller. */
+extern void __magic_module_return(void);
+
#endif /* _ASM_I386_MODULE_H */

2003-02-18 07:11:46

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Grr, I hate this. I just typed a long reply to your posting, and
wanted to finish it with a reminder that the issue is more general
than just modules, so that we shouldn't look at modules yet. Then
I realized of course that everything above was about modules :-(

I don't think we'll make much progress if we keep on mixing issues
of interface correctness, current module constraints, and possible
module interface changes, all that with performance considerations
and minimum invasive migration plans thrown in. So I'd suggest the
following sequence:

1) do we agree that the current registration/deregistration
interfaces are potential hazards for their users, be they
modules or not ?
2) one we agree with this, we can look for mechanisms that
solve this, again for general users, which may or may not
be modules
3) last but not least, we can look at what this means for
modules (and that's where beautiful tools like
"module_put_return" (thanks !), or also ideas about
module_exit redesign have their place)
4) "the root of all evil ...". Okay, and now to which level
of hell would all this shoot our performance ? (And back
we go to step 2.)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-18 11:57:18

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Tue, 18 Feb 2003, Werner Almesberger wrote:

> I don't think we'll make much progress if we keep on mixing issues
> of interface correctness, current module constraints, and possible
> module interface changes, all that with performance considerations
> and minimum invasive migration plans thrown in. So I'd suggest the
> following sequence:
>
> 1) do we agree that the current registration/deregistration
> interfaces are potential hazards for their users, be they
> modules or not ?
> 2) one we agree with this, we can look for mechanisms that
> solve this, again for general users, which may or may not
> be modules
> 3) last but not least, we can look at what this means for
> modules (and that's where beautiful tools like
> "module_put_return" (thanks !), or also ideas about
> module_exit redesign have their place)
> 4) "the root of all evil ...". Okay, and now to which level
> of hell would all this shoot our performance ? (And back
> we go to step 2.)

Basically I can agree with this, although I'd like to avoid that we
iterate too much over these steps, as it would too easily divert the
discussion to other things, so I'd rather take smaller steps and keep the
scope a bit broader.
Another point is the perfomance, which is not that important right now.
I'm more interested in the general complexity, it's simply easier to
optimize a simple design than a very complex design.

bye, Roman

2003-02-18 12:17:44

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Migrating net/sched to new module interface

Hi,

On Tue, 18 Feb 2003, Rusty Russell wrote:

> > Maybe you could share a bit of your wisdom?
> > 1. Doing the linking in userspace requires two steps, but I still don't
> > know what's so bad about it.
> > 2. This still doesn't explain, why everything has to be moved into kernel,
> > why can't we move more into userspace?
> > 3. You simply moved part of the query syscall functionality to
> > /proc/modules (which btw is still not enough to fix ksymoops).
>
> I think you'd do far better to implement it yourself for half a dozen
> architectures. It's not my job to teach you things which can be
> gained by reading the code and thinking a little.

As usual you explain nothing, so I still don't know why a complete rewrite
was necessary. The old implementation did work fine within limits and
already has support for all architectures, so why should I just throw it
away? Why was it not possible to first fix the problems of the old system?

> > Well, I'm not against optimizing the module locking (*), as we won't get
> > rid of it in the near feature, but it still has problems.
> >
> > 1. It's adding complexity (however you implement it), I explained it in
> > detail and you still haven't told me, where I'm wrong.
>
> No, it's exactly the same as before. You can't see that, and I've
> given up explaining it.

So far you explained nothing and if you would just read and try to
understand that damned mail(*), you would know, that I already said that
the complexity is "exactly the same as before". I'm comparing it to other
solutions, which you obviously haven't understood.

(*) http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2

> > 2. The module interface is incompatible with other kernel interfaces, I
> > tried to explain that in the mail from saturday, if you think I'm wrong,
> > your input is very welcome, but _please_ answer to that mail.
>
> This problem is in your mind Roman.

Thanks for another detailed explaination. :(

> > It's too much fun to quote Al here:
>
> Quoting Al's rant isn't an argument. It wasn't very coherent when he
> wrote it, and it doesn't gain with repetition.

Well, if you don't even try to understand, what Al is trying to tell you,
I'm afraid I can't help you either.

> The code exists. It's simple to use.
>
> I give up. You're killfiled again 8(

I seriously consider to take over modules maintainership, but I have
neither the energy nor the time to do this alone, so I can only wish
everyone much fun with modules during 2.6.

bye, Roman

2003-02-18 12:25:48

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Tue, 18 Feb 2003, Werner Almesberger wrote:

> I don't think we'll make much progress if we keep on mixing issues
> of interface correctness, current module constraints, and possible
> module interface changes, all that with performance considerations
> and minimum invasive migration plans thrown in. So I'd suggest the
> following sequence:

Um, another point, let's ignore "minimum invasive migration plans", if we
found a good solution, we can still figure out how to get there smoothly,
so this shouldn't be a primary concern.

bye, Roman

2003-02-18 14:04:46

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Roman Zippel wrote:
> Um, another point, let's ignore "minimum invasive migration plans", if we
> found a good solution, we can still figure out how to get there smoothly,
> so this shouldn't be a primary concern.

In my next posting, I just wrote (this very minute)

"For brainstorming's sake, let's not worry about backward-compatibility
for now:"

So I guess I don't need this disclaimer then ;-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-18 14:02:42

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

(I think we should limit the Cc:s to Roman, Rusty, the list, and me,
and leave the others in peace. Please yell if you really want that
extra copy :-)

Roman Zippel wrote:
> Basically I can agree with this, although I'd like to avoid that we
> iterate too much over these steps, as it would too easily divert the
> discussion to other things, so I'd rather take smaller steps and keep the
> scope a bit broader.

Good :-) I want to avoid modules as much as possible, because
they've extensively been tackled in the past (which didn't help
much making the interfaces better), and also because they're
just a bit too political an issue.

Okay, this brings us to the issue of broken interfaces. Do we
have agreement that there are cases where interfaces like
remove_proc_entry, in their current state, cannot be used
correctly ?

Example:

static ...callback...(... struct file *file ...)
{
void *user = PDE(file->f_dentry->d_inode)->data;

...
do something with "*user"
...
}

...
struct proc_dir_entry *de = create_proc_entry(...);
void *my_data;

de->data = my_data = kmalloc(...);
...
remove_proc_entry(...);
/* what happens with "my_data", formerly known as "de->data" ? */

a) kfree it. May cause an oops or other problems if we're in the
middle of the callback.

b) don't kfree it. So we now have a (hopefully small) memory leak.
Problem: my_data may point to a lot more than just some tiny
allocation.

Okay so far ?

(Possible solutions on the next, slid^H^H^H^Hposting :-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-18 17:14:45

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Next round: possible remedies and their side-effects. As
usual, if you disagree with something, please holler.

If yes, let's look at possible (and not overly insane) solutions,
using remove_proc_entry as a case study:

1) still don't kfree, and leave it to the user to somehow
minimize the damage. (Good luck :-)

2) add a callback that is invoked when the proc entry gets
deleted. (This callback may be called before remove_proc_entry
completes.) Problem: unload/return race for modules.

3) change remove_proc_entry or add remove_proc_entry_wait that
works like remove_proc_entry, but blocks until the entry is
deleted. Problem: may sleep "forever".

4) make remove_proc_entry return an indication of whether the
entry was successfully removed or not. Problem: if it
wasn't, what can we do then ?

5) like above, but don't remove the entry if we can't do it
immediately. Problem: there's no notification for when we
should try again, so we'd have to poll.

6) export the lookup mechanism, and let the caller poll for
removal. Problem: races with creation of a new entry with
the same name.

7) transfer ownership of de->data to procfs, and set some
(possibly configurable) destruction policy (e.g. always
kfree, or such). Similar to 2), but less flexible.

Any more ?

I think that most programmers would intuitively expect an
interface of type 3). In cases where we can't sleep, either
type 2) or type 5) would be common choices.

Does that sound reasonable so far ?

I'll wait a little until I continue with ways for dealing
with the side-effects.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-18 18:35:59

by Thomas Molina

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

On Tue, 18 Feb 2003, Werner Almesberger wrote:

> Good :-) I want to avoid modules as much as possible, because
> they've extensively been tackled in the past (which didn't help
> much making the interfaces better), and also because they're
> just a bit too political an issue.
>
> Okay, this brings us to the issue of broken interfaces. Do we
> have agreement that there are cases where interfaces like
> remove_proc_entry, in their current state, cannot be used
> correctly ?

I hope this discussion is taking place in the context of looking forward
towards something to implement in 2.7. IMHO we are much too late in the
2.5 cycle to implement this now.

2003-02-19 01:40:02

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Mon, 17 Feb 2003, Werner Almesberger wrote:

> > failure: if the object is still busy, we just return -EBUSY. This is
> > simple, but this doesn't work for modules, since during module exit you
> > can't fail anymore.
>
> That's a modules API problem. And yes, I think modules should
> eventually be able to say that they're busy.

As Rusty already correctly said, this is currently not possible with the
current modules API. On the other hand he also jumps very quickly to
conclusions, so let's look at all the options first and then decide how
trivial they are.
Anyway, for now just keep this option in mind and let's look at the other
options, which don't require a module API change.

> By the way, a loong time ago, in the modules thread, I suggested
> a "decrement_module_count_and_return" function [1]. Such a
> construct would be useful in this specific case.

This would just be an optimization(?) for the module count, it doesn't
change the general problem and is not useful as generic solution, so I'd
rather put it back for now.

> > failure doesn't work with modules, so that only leaves the module count.
>
> And how would you ensure correct access to static data in the
> absence of modules ? Any solution that _requires_ a module count
> looks highly suspicious to me.

In that case it would be kernel memory, which cannot be freed, so it will
not go away and requires no module count.

> Likewise, possibly dynamically allocated data that is synchronized
> by the caller, e.g. "user" in "struct proc_dir_entry".

user?

> > The last solution sounds complicated, but exactly this is done for
> > filesystems and we didn't really get rid of the second reference count, we
> > just moved it somewhere else, where it hurts least.
>
> Hmm, I'm confused. With "filesystem", do you mean the file system
> driver per se (e.g. "ext3"), or a specific instance of such a file
> system (e.g. /dev/hda1 mounted on /) ?

A generic file system as it's registered via register_filesystem. A file
system exports lots of dynamic and static data and the only (used) module
pointer you will find is in struct file_system_type (the owner pointer in
struct file_operations is not used by any standard fs). It's interesting
to see how file systems keep the module business at a minimum (no
try_module_get() is still cheaper than the cheapest try_module_get()).

Um, it's getting late and I just played too much with procfs to find a
sensible solution. Below is an experimental patch to add callback which
would allow it to safely remove user data. Very lightly tested, so no
guarantees.

bye, Roman

Index: include/linux/proc_fs.h
===================================================================
RCS file: /usr/src/cvsroot/linux-2.5/include/linux/proc_fs.h,v
retrieving revision 1.1.1.9
diff -u -p -r1.1.1.9 proc_fs.h
--- include/linux/proc_fs.h 27 Aug 2002 23:38:02 -0000 1.1.1.9
+++ include/linux/proc_fs.h 19 Feb 2003 00:46:28 -0000
@@ -69,6 +69,7 @@ struct proc_dir_entry {
void *data;
read_proc_t *read_proc;
write_proc_t *write_proc;
+ void (*release_proc)(struct proc_dir_entry *);
atomic_t count; /* use count */
int deleted; /* delete flag */
kdev_t rdev;
Index: fs/proc/generic.c
===================================================================
RCS file: /usr/src/cvsroot/linux-2.5/fs/proc/generic.c,v
retrieving revision 1.1.1.12
diff -u -p -r1.1.1.12 generic.c
--- fs/proc/generic.c 15 Feb 2003 01:25:07 -0000 1.1.1.12
+++ fs/proc/generic.c 19 Feb 2003 01:40:36 -0000
@@ -639,6 +639,8 @@ void free_proc_entry(struct proc_dir_ent
if (ino < PROC_DYNAMIC_FIRST ||
ino >= PROC_DYNAMIC_FIRST+PROC_NDYNAMIC)
return;
+ if (de->release_proc)
+ de->release_proc(de);
if (S_ISLNK(de->mode) && de->data)
kfree(de->data);
kfree(de);
@@ -655,6 +657,7 @@ void remove_proc_entry(const char *name,
const char *fn = name;
int len;

+ lock_kernel();
if (!parent && xlate_proc_name(name, &parent, &fn) != 0)
goto out;
len = strlen(fn);
@@ -680,5 +683,6 @@ void remove_proc_entry(const char *name,
break;
}
out:
+ unlock_kernel();
return;
}
Index: fs/proc/inode.c
===================================================================
RCS file: /usr/src/cvsroot/linux-2.5/fs/proc/inode.c,v
retrieving revision 1.1.1.13
diff -u -p -r1.1.1.13 inode.c
--- fs/proc/inode.c 1 Feb 2003 19:59:21 -0000 1.1.1.13
+++ fs/proc/inode.c 19 Feb 2003 01:40:36 -0000
@@ -23,13 +23,6 @@

extern void free_proc_entry(struct proc_dir_entry *);

-static inline struct proc_dir_entry * de_get(struct proc_dir_entry *de)
-{
- if (de)
- atomic_inc(&de->count);
- return de;
-}
-
/*
* Decrements the use count and checks for deferred deletion.
*/
@@ -44,11 +37,13 @@ static void de_put(struct proc_dir_entry
}

if (atomic_dec_and_test(&de->count)) {
+ struct module *owner = de->owner;
if (de->deleted) {
printk("de_put: deferred delete of %s\n",
de->name);
free_proc_entry(de);
}
+ module_put(owner);
}
unlock_kernel();
}
@@ -71,11 +66,8 @@ static void proc_delete_inode(struct ino

/* Let go of any associated proc directory entry */
de = PROC_I(inode)->pde;
- if (de) {
- if (de->owner)
- module_put(de->owner);
+ if (de)
de_put(de);
- }
}

struct vfsmount *proc_mnt;
@@ -178,44 +170,36 @@ struct inode * proc_get_inode(struct sup
/*
* Increment the use count so the dir entry can't disappear.
*/
- de_get(de);
-#if 1
-/* shouldn't ever happen */
-if (de && de->deleted)
-printk("proc_iget: using deleted entry %s, count=%d\n", de->name, atomic_read(&de->count));
-#endif
+ if (!atomic_read(&de->count) && !try_module_get(de->owner))
+ return NULL;
+ atomic_inc(&de->count);

inode = iget(sb, ino);
if (!inode)
goto out_fail;
-
+
PROC_I(inode)->pde = de;
- if (de) {
- if (de->mode) {
- inode->i_mode = de->mode;
- inode->i_uid = de->uid;
- inode->i_gid = de->gid;
- }
- if (de->size)
- inode->i_size = de->size;
- if (de->nlink)
- inode->i_nlink = de->nlink;
- if (!try_module_get(de->owner))
- goto out_fail;
- if (de->proc_iops)
- inode->i_op = de->proc_iops;
- if (de->proc_fops)
- inode->i_fop = de->proc_fops;
- else if (S_ISBLK(de->mode)||S_ISCHR(de->mode)||S_ISFIFO(de->mode))
- init_special_inode(inode,de->mode,kdev_t_to_nr(de->rdev));
+ if (de->mode) {
+ inode->i_mode = de->mode;
+ inode->i_uid = de->uid;
+ inode->i_gid = de->gid;
}
+ if (de->size)
+ inode->i_size = de->size;
+ if (de->nlink)
+ inode->i_nlink = de->nlink;
+ if (de->proc_iops)
+ inode->i_op = de->proc_iops;
+ if (de->proc_fops)
+ inode->i_fop = de->proc_fops;
+ else if (S_ISBLK(de->mode)||S_ISCHR(de->mode)||S_ISFIFO(de->mode))
+ init_special_inode(inode,de->mode,kdev_t_to_nr(de->rdev));

-out:
return inode;

out_fail:
de_put(de);
- goto out;
+ return NULL;
}

int proc_fill_super(struct super_block *s, void *data, int silent)

2003-02-19 02:17:42

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Roman Zippel wrote:
> Anyway, for now just keep this option in mind and let's look at the other
> options, which don't require a module API change.

Yes, we can look at the modules case at the end.

> In that case it would be kernel memory, which cannot be freed, so it will
> not go away and requires no module count.

kfree isn't the only way to make data unusable. Remember the
"my_state" example.

> > Likewise, possibly dynamically allocated data that is synchronized
> > by the caller, e.g. "user" in "struct proc_dir_entry".
>
> user?

"data", sorry. I always call this kind of argument something like
"user" in my code ...

> A generic file system as it's registered via register_filesystem.

Okay, I'll have a look at that.

> Um, it's getting late and I just played too much with procfs to find a
> sensible solution. Below is an experimental patch to add callback which
> would allow it to safely remove user data. Very lightly tested, so no
> guarantees.

Yep, that's the kind of callback I had in mind.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-19 03:26:17

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

In message <[email protected]> you write:
> Next round: possible remedies and their side-effects. As
> usual, if you disagree with something, please holler.
>
> If yes, let's look at possible (and not overly insane) solutions,
> using remove_proc_entry as a case study:
>
> 1) still don't kfree, and leave it to the user to somehow
> minimize the damage. (Good luck :-)
>
> 2) add a callback that is invoked when the proc entry gets
> deleted. (This callback may be called before remove_proc_entry
> completes.) Problem: unload/return race for modules.

OK. For reference, the "state of 2.4" solution (which is also the
"state of 2.5" solution) looks like:

> struct proc_dir_entry *de = create_proc_entry(...);
> void *my_data;
>
> de->data = my_data = kmalloc(...);
=====> de->owner = THIS_MODULE;
> ...
> remove_proc_entry(...);
> /* what happens with "my_data", formerly known as "de->data" ? */

And have proc_file_operations do the standard owner get and release:

open: proc_open,
release: proc_release,

static int proc_open(struct inode *inode, struct file *filp)
{
struct proc_dir_entry *dp = PDE(inode);
if (!try_module_get(dp->owner))
return -ENOENT;
return 0;
}

static int proc_release(struct inode *inode, struct file *filp)
{
struct proc_dir_entry *dp = PDE(inode);
module_put(dp->owner);
return 0;
}

Now, if remove_proc_entry() is called from module_exit(), the kfree()
works fine, since (1) we wouldn't be in module_exit() if the proc
entry was in used, and (2) the try_module_get() prevents any new
users.

Of course, if you wanted to remove the entry at any other time
(eg. hotplug), this doesn't help you one damn bit (which is kind of
your point).

> 3) change remove_proc_entry or add remove_proc_entry_wait that
> works like remove_proc_entry, but blocks until the entry is
> deleted. Problem: may sleep "forever".

This is what network devices do, and what the sockopt registration
code does, too, so this is already in the kernel, too. It's not
great, but it becomes a noop for the module deregistration stuff.

Thanks!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-02-19 04:02:00

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Rusty Russell wrote:
> Of course, if you wanted to remove the entry at any other time
> (eg. hotplug), this doesn't help you one damn bit (which is kind of
> your point).

Yep, try_module_get solves the general synchronization problem for
the special but interesting case of modules, but not for the general
case.

> This is what network devices do, and what the sockopt registration
> code does, too, so this is already in the kernel, too. It's not
> great, but it becomes a noop for the module deregistration stuff.

Yes, I think just sleeping isn't so bad at all. First of all,
we already have the module use count as a kind of "don't unload
now" advice (not sure if you plan to phase out MOD_INC_USE_COUNT ?),
and second, it's not exactly without precedent anyway. E.g. umount
will have little qualms of letting you sleep "forever". (And,
naturally, every once in a while, people hate it for this :-)

Anyway, I'll write more about this tomorrow. For tonight, I
have my advanced insanity 101 to finish, topic "ptracing
more than one UML/TT at the same time".

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-20 00:26:40

by Rusty Russell

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

In message <[email protected]> you write:
> Rusty Russell wrote:
> > Of course, if you wanted to remove the entry at any other time
> > (eg. hotplug), this doesn't help you one damn bit (which is kind of
> > your point).
>
> Yep, try_module_get solves the general synchronization problem for
> the special but interesting case of modules, but not for the general
> case.

Absolutely. And I admire your (and Roman's) bravery for trying the
general case.

> will have little qualms of letting you sleep "forever". (And,
> naturally, every once in a while, people hate it for this :-)

(Well, MOD_INC_USE_COUNT is already deprecated, mainly because it's
used to try to control module counts from within, which preempt makes
really hard).

Yes, the addition of "umount -l" is congruent to "rmmod --wait". The
assumption is "I don't want any new users, and I'll handle any current
ones". You can get yourself in trouble with both of them.

Keep up the good work,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-02-20 00:31:12

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Tue, 18 Feb 2003, Werner Almesberger wrote:

> Next round: possible remedies and their side-effects. As
> usual, if you disagree with something, please holler.
>
> If yes, let's look at possible (and not overly insane) solutions,
> using remove_proc_entry as a case study:
>
> 1) still don't kfree, and leave it to the user to somehow
> minimize the damage. (Good luck :-)

Obviously not a good idea.

> 3) change remove_proc_entry or add remove_proc_entry_wait that
> works like remove_proc_entry, but blocks until the entry is
> deleted. Problem: may sleep "forever".

Sleeping is not a good general solution, as you have to be very careful
not to hold any important locks, otherwise it's easy to abuse.

> 6) export the lookup mechanism, and let the caller poll for
> removal. Problem: races with creation of a new entry with
> the same name.

I don't really understand this one.

> 7) transfer ownership of de->data to procfs, and set some
> (possibly configurable) destruction policy (e.g. always
> kfree, or such). Similar to 2), but less flexible.

de->data might contain references to other data and de->data might not the
only dynamic data, just the most visible one, the read/write functions can
access other dynamic data as well.

> 2) add a callback that is invoked when the proc entry gets
> deleted. (This callback may be called before remove_proc_entry
> completes.) Problem: unload/return race for modules.
>
> 4) make remove_proc_entry return an indication of whether the
> entry was successfully removed or not. Problem: if it
> wasn't, what can we do then ?
>
> 5) like above, but don't remove the entry if we can't do it
> immediately. Problem: there's no notification for when we
> should try again, so we'd have to poll.
>
> Any more ?

If you want to make 4) and 5) separate options, you could the same with
2), you can unlink the entry during remove_proc_entry or after the last
user is gone (before the callback is called).
It would not be difficult to separate an unlink_proc_entry from
remove_proc_entry, so we can force an unlink if needed and we can reduce
this again to two basic options in two variations. :)

The question is now whether we return an error value or use a callback.
When a device is removed, we usually also want to remove all its data
structures, on the other hand we can only remove a module when there are
no users, so here we return an error value.
Now I need a bigger example to put this into a context, a nice example is
scsi_unregister. It removes among other things procfs entries and these
entries have a reference to struct Scsi_Host. As long as scsi_unregister
is called from module_exit everything works fine, but a bit searching
reveals drivers/usb/storage/usb.c, which create/removes a scsi host when
you plug/unplug a storage device (let's ignore other problems here, like
it's still mounted). In this case nothing protects the removal of the proc
entries anymore, this means if someone accesses /proc/scsi/usb-storage/...
while the device is unplugged he will access freed data. Here would be now
the patch from yesterday useful to implement a callback, but now we also
can't simply free the Scsi_Host structure, so we have to add a reference
count to it and only if all references to it are gone, it can be removed
as well.

At this point I can also respond to Rusty's main argument against giving
more control to the modules, that they suddenly had to deal with "I
started deregistering my stuff, but then...". Unless interfaces are
only used during module_init/module_exit, they have to deal with this
anyway. I can understand that the module interface is tempting - just wait
until all user are gone and then disable the module. The only problem is
that this breaks horribly, when the driver interface is used in a
different context (as demonstrated with the usb/scsi example). So what
speaks against forcing driver/interface writers to get it right from the
beginning?
(I at least could think of a few more reason that speak for this, but I'll
continue this later.)

bye, Roman

2003-02-20 02:07:19

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Roman Zippel wrote:
> Sleeping is not a good general solution, as you have to be very careful
> not to hold any important locks, otherwise it's easy to abuse.

Sleeping has many problems, but it also has one great advantage:
it's relatively easy to understand and doesn't need much extra
code (if any) in the user.

Also, I wouldn't expect lock interdependencies to be a major
problem at the level of something removing all traces of itself
(e.g. a driver module unregistering itself). Or would you happen
to know some example where such problems are happening ?

> > 6) export the lookup mechanism, and let the caller poll for
> > removal. Problem: races with creation of a new entry with
> > the same name.
>
> I don't really understand this one.

Something like:

foo_schedule_destruction(&whatever);
while (foo_lookup(whatever.name))
yield();

Usually too ugly to consider for anything but desperate cases.

> It would not be difficult to separate an unlink_proc_entry from
> remove_proc_entry, so we can force an unlink if needed and we can reduce
> this again to two basic options in two variations. :)

Yes, separating unlink (quick, non-blocking, always succeeds) and
destruction (slow, may run nethack, fragile) clearly has some appeal.

> The question is now whether we return an error value or use a callback.

There's also the difference that the version using a callback would
schedule the removal, while the version returning an error would do
nothing (unless the caller tries again).

You could also do both, but that gets pretty difficult to use.

> Now I need a bigger example to put this into a context, a nice example is
> scsi_unregister.

Nicely illustrates the problem of the "can look around one corner,
but not two" property of things like try_module_get, yes.

I'd also expect many cases of multiple devices which are serviced by
the same driver to have the same sort of problems. E.g. I'm pretty
sure I committed a few such sins in the ATM code, too.

> So what speaks against forcing driver/interface writers to get it
> right from the beginning?

Raising the general awareness of such problems is exactly what I'm
trying to accomplish here :-) By the way, also the argument "it may
be broken but it has never failed so far" is getting weaker with
the emergence of technologies that change relative code path
lengths, like hyperthreading.

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-20 09:36:46

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Thu, 20 Feb 2003, Rusty Russell wrote:

> Yes, the addition of "umount -l" is congruent to "rmmod --wait". The
> assumption is "I don't want any new users, and I'll handle any current
> ones". You can get yourself in trouble with both of them.

With the small difference that "umount -l" won't deadlock.

bye, Roman

2003-02-20 12:13:55

by Adam J. Richter

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

On 2003-02-20, Roman Zippel responded to Werner Almesberger:
>The question is now whether we return an error value or use a callback.
>When a device is removed, we usually also want to remove all its data
>structures, on the other hand we can only remove a module when there are
>no users, so here we return an error value.
>Now I need a bigger example to put this into a context, a nice example is
>scsi_unregister. It removes among other things procfs entries and these
>entries have a reference to struct Scsi_Host. As long as scsi_unregister
>is called from module_exit everything works fine, but a bit searching
>reveals drivers/usb/storage/usb.c, which create/removes a scsi host when
>you plug/unplug a storage device (let's ignore other problems here, like
>it's still mounted).

The ability to remove a module is generally independent of
whether or not there is any hardware present at that moment for which
the module supplies a driver. Instead, the determining issue is
whether there are file descriptors open for that driver.

Of course, if the necessary hardware was never present at any
time when the device driver's module was loaded, then there never will
be any file descriptors open for the device driver.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2003-02-20 12:38:45

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Thu, 20 Feb 2003, Adam J. Richter wrote:

> The ability to remove a module is generally independent of
> whether or not there is any hardware present at that moment for which
> the module supplies a driver. Instead, the determining issue is
> whether there are file descriptors open for that driver.

I don't understand, what you're trying to say.
File descriptors are not the only way to access a driver and the ability
to remove a module is only dependent on the number of references to this
module.

bye, Roman

2003-02-20 13:43:38

by Adam J. Richter

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

On Thu, 20 Feb 2003, Roman Zippel wrote:
>On Thu, 20 Feb 2003, Adam J. Richter wrote:

>> The ability to remove a module is generally independent of
>> whether or not there is any hardware present at that moment for which
>> the module supplies a driver. Instead, the determining issue is
>> whether there are file descriptors open for that driver.

>I don't understand, what you're trying to say.
>File descriptors are not the only way to access a driver and the ability
>to remove a module is only dependent on the number of references to this
>module.

You're right. My second sentence was an oversimplification.
I should have said "software references" rather than file descriptors
to include things like "ifconfig eth0 up" creating a reference,
mounting a block device creating a refernece, etc. (Perhaps I
should have stated only my first sentence and stopped there.)

Anyhow, my point is that removing a piece of hardware
does not require that the corresponding module be unloaded
immediately.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2003-02-20 13:56:55

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Adam J. Richter wrote:
> Anyhow, my point is that removing a piece of hardware
> does not require that the corresponding module be unloaded
> immediately.

Nor does the use of a module require the presence of any
non-generic hardware in the first place (e.g. net/sched/,
netfilter, PPP, etc.) :-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-20 15:28:27

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Thu, 20 Feb 2003, Adam J. Richter wrote:

> Anyhow, my point is that removing a piece of hardware
> does not require that the corresponding module be unloaded
> immediately.

That's true, but when a piece of hardware is removed, you usually also
want to get rid of some data structures, but if the relevant functions are
not prepared to be called outside of module_exit, you have a huge problem.

bye, Roman

2003-02-23 15:52:56

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Wed, 19 Feb 2003, Werner Almesberger wrote:

> Roman Zippel wrote:
> > Sleeping is not a good general solution, as you have to be very careful
> > not to hold any important locks, otherwise it's easy to abuse.
>
> Sleeping has many problems, but it also has one great advantage:
> it's relatively easy to understand and doesn't need much extra
> code (if any) in the user.

Uninterruptable sleep should generally be avoided at kernel level, either
it should be possible to interrupt it or it should at least include a
timeout, but then you have to deal with failure again.
OTOH if you have callbacks, you can easily implement sleeping yourself.

> Also, I wouldn't expect lock interdependencies to be a major
> problem at the level of something removing all traces of itself
> (e.g. a driver module unregistering itself). Or would you happen
> to know some example where such problems are happening ?

Imagine you want to remove a device structure, if you manage to block
this somehow, you might completely block the creation and removeal of
other devices.

> > Now I need a bigger example to put this into a context, a nice example is
> > scsi_unregister.
>
> Nicely illustrates the problem of the "can look around one corner,
> but not two" property of things like try_module_get, yes.

Thanks, it seems I didn't cause too much confusion yet, so I can go on
with the next part. I think we can agree that the module locking only
protects a special case and as hotpluggable devices become more and more
important, we should rather get it right in the general case.

Anyway, this alone would be not reason enough to change the module
interface, but another module interface would give us more flexibility and
reduce the locking complexity. To make it easy for Rusty I take an example
he is familiar with - netfilter.
Let's assume we had an interface which allows exit to fail, how would that
help? First, struct nf_hook_ops gets a reference count (struct module does
the job), now with a small change to nf_unregister_hook:

void nf_unregister_hook(struct nf_hook_ops *reg)
{
br_write_lock_bh(BR_NETPROTO_LOCK);
- list_del(&reg->list);
+ list_del_init(&reg->list);
br_write_unlock_bh(BR_NETPROTO_LOCK);
}

we can do the following from module_exit:

int foo_exit()
{
nf_unregister_hook(&foo_ops);
return module_refcount(THIS_MODULE) ? -EBUSY : 0;
}

to get a reference to the module this would be enough:

static inline void __module_get(struct module *module)
{
local_inc(&module->ref[smp_processor_id()].count);
}

releasing the reference is as simple:

static inline void __module_put(struct module *module)
{
local_dec(&module->ref[smp_processor_id()].count);
}

This probably needs a bit of explanation:
1. more flexibility: The driver has better the knowledge about how the
module can be stopped and especially only the driver knows which events
should prevent the shutdown of the module.
There is an important difference between stopping the module and actually
removing the module. Any reference to the module must prevent the module
removal, but not everything needs to stop the module from shutting down.
For example an incoming network packet doesn't need to prevent the
shutdown, so we can just remove the hook above independent of the
reference count and just wait for the reference to become zero.
On the other hand this means the module could be in a disabled state, but
this can rather be compared with lazy unmount (via "umount -l") - after
the last user is gone the module can be removed without problems, but the
module could also be reenabled.

2. reduced locking complexity: Above demonstrates how it could look like,
if we can get rid of the live state (at least outside of the generic
module code). I explained the theory behind this before and posted the
link already a few times:
http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2
This live state isn't needed most of the time and if some module should
really need this, it can easily implement that itself.

To help understanding this it maybe helps to clear up a common
misconception: modules are no special objects and they are not different
to other objects which can be removed while the kernel is running. Even
Rusty confuses this issue (http://lwn.net/Articles/15404/). The question
should rather be: how can I safely access an object which I don't own?
There are exactly two rules:
- The user has to lock out the owner.
- When the user wants to access the object outside the lock, he has to
tell that the owner, usually done via a reference count.
One of these conditions must be true (no exception), otherwise we have a
problem.
Applied to netfilter this means as long as we hold the netproto brlock we
can safely access any netfilter hook and only if we need a reference to
the hook outside of this lock, we had to call try_module_get(). This also
means that we don't have to try to get a reference for every packet,
instead e.g. it would be enough to do this once per connection.
(Why now an atomic counter wasn't sufficient for this and why Rusty had to
mess up the module code for this, will probably stay one of the big
mysteries...)

bye, Roman

2003-02-23 23:24:23

by Kevin O'Connor

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

On Tue, Feb 18, 2003 at 02:22:57PM -0300, Werner Almesberger wrote:
> Next round: possible remedies and their side-effects. As
> usual, if you disagree with something, please holler.
>
> If yes, let's look at possible (and not overly insane) solutions,
> using remove_proc_entry as a case study:

Hi Werner,

Thanks for putting together this list; I've been following this thread for
a while and it seems that the discussion always deteriorates into
semantics. Anyway, I think there is an eighth possible solution to the
list you proposed.

Just to be specific, the requirements for the proc entry stuff are:

a) a mechanism needs to be defined to indicate that the entry is no longer
needed (something that starts the delete process -- ie, remove_proc_entry),

b) the code must conform to a system that ensures de->data will not be used
after the "release" code is executed, and

c) the "release" code must be eventually executed.

Assuming these requirements are really requirements (your options 1 and 4
don't seem to meet these) then an "eighth" solution is:

8) Have the unregister code (remove_proc_entry) set an external flag (eg,
de->data_is_there), and update all users of de->data to check the flag
before following the pointer.

Option 8 may not qualify as "sane", but I think it is important to add it
because it is what the module code is currently using. Thus, one need not
look at the module stuff as a "special case", but as a general (if
complicated) resource management solution.

Finally, one could probably apply any of the "options" for any dynamically
allocated memory. It is interesting that Linus seems to prefer option 2/7
(from the recent kobject work and CodingStyle).

If I've missed something, please let me know.
-Kevin

> 1) still don't kfree, and leave it to the user to somehow
> minimize the damage. (Good luck :-)
>
> 2) add a callback that is invoked when the proc entry gets
> deleted. (This callback may be called before remove_proc_entry
> completes.) Problem: unload/return race for modules.
>
> 3) change remove_proc_entry or add remove_proc_entry_wait that
> works like remove_proc_entry, but blocks until the entry is
> deleted. Problem: may sleep "forever".
>
> 4) make remove_proc_entry return an indication of whether the
> entry was successfully removed or not. Problem: if it
> wasn't, what can we do then ?
>
> 5) like above, but don't remove the entry if we can't do it
> immediately. Problem: there's no notification for when we
> should try again, so we'd have to poll.
>
> 6) export the lookup mechanism, and let the caller poll for
> removal. Problem: races with creation of a new entry with
> the same name.
>
> 7) transfer ownership of de->data to procfs, and set some
> (possibly configurable) destruction policy (e.g. always
> kfree, or such). Similar to 2), but less flexible.

--
------------------------------------------------------------------------
| Kevin O'Connor "BTW, IMHO we need a FAQ for |
| [email protected] 'IMHO', 'FAQ', 'BTW', etc. !" |
------------------------------------------------------------------------

2003-02-24 12:04:38

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Sun, 23 Feb 2003, Kevin O'Connor wrote:

> 8) Have the unregister code (remove_proc_entry) set an external flag (eg,
> de->data_is_there), and update all users of de->data to check the flag
> before following the pointer.
>
> Option 8 may not qualify as "sane", but I think it is important to add it
> because it is what the module code is currently using. Thus, one need not
> look at the module stuff as a "special case", but as a general (if
> complicated) resource management solution.

Yes, it's another possible solution, but it has the same problem as the
current module locking - increased locking complexity.
Such flag actually exists already ("deleted"), but no user can use it
currently, because the read/write functions don't have the proc entry
argument. Even if they could use it, switching this flag isn't enough
remove_proc_entry also had to synchronize with active users, so users had
to take some lock just to read the data, where a simple reference was
sufficient before.

bye, Roman

2003-02-26 23:16:51

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Roman Zippel wrote:
> Anyway, this alone would be not reason enough to change the module
> interface, but another module interface would give us more flexibility and
> reduce the locking complexity.

Wait, wait ! :-) There's one step you've left out: what we actually
expect the module interface to do. We have:

- what it currently does, or what it did in the past
- what users think it does
- what users want it to do
- what we think the users should want
- what we think is a comfortable compromise

With "users", I mean primarily the guy who invokes "rmmod", or such.

Anyway, I'm afraid I can't offer much wisdom from experience for this
part, for I'm not much of a module user myself. I'll have more to say
on service interfaces, though.

Sorry for slowing down, but I'm currently quite busy absorbing all
the cool stuff that's recently been happening with UML. (So, blame
Jeff ;-))

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-27 12:24:37

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Wed, 26 Feb 2003, Werner Almesberger wrote:

> Roman Zippel wrote:
> > Anyway, this alone would be not reason enough to change the module
> > interface, but another module interface would give us more flexibility and
> > reduce the locking complexity.
>
> Wait, wait ! :-) There's one step you've left out: what we actually
> expect the module interface to do. We have:

There are several module interfaces:
- module user interface
- module load interface
- module driver interface

These are quite independent and so far we were talking about the last one,
so I'm a bit confused about your request to talk about the first.
<rant>
BTW Why Rusty had to completely break the first two interfaces to
"improve" the last one, is probably another mystery, I'll never
understand.
</rant>

bye, Roman

2003-02-27 13:10:35

by Werner Almesberger

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Roman Zippel wrote:
> There are several module interfaces:
> - module user interface
> - module load interface
> - module driver interface

Hmm, the "load interface" would be the system calls, while
the "driver interface" would be initialization and cleanup
functions in the module ?

We've already established that most things called from the
latter isn't actually module specific.

> These are quite independent and so far we were talking about the last one,
> so I'm a bit confused about your request to talk about the first.

I'm not so sure about them being independent. E.g. if we
just had a blocking single-phase cleanup, users would always
see success, but resources may be tied up indefinitely, which
would break uses like

rmmod foo
insmod foo.o

On the other hand, with a non-blocking two-phase cleanup, users
would still always see success, but only "anonymous" resources
(memory, etc.) would be tied up.

Last but not least, a non-blocking single-phase cleanup would
expose all failures to the user, and require some retry strategy.

Furthermore, use counts can have several meanings:
- indicate when it's convenient for the module to be removed
- indicate when it's possible for the module to be removed
- indicate what consequences the user may experience if
trying to remove now (e.g. blocking)

The "old" module count was a bit of a mixture of the first two.
The "new" count is clearer.

Oh, lest I be misunderstood: I'm not saying that we should
redesign everything, and screw compatibility. I just want to
bring the hidden assumptions that have piled up over the life
of the module system out in the open.

Then, of course, there are still plenty of opportunities to
plot any massive breakage ;-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-02-27 14:23:22

by Roman Zippel

[permalink] [raw]
Subject: Re: [RFC] Is an alternative module interface needed/possible?

Hi,

On Thu, 27 Feb 2003, Werner Almesberger wrote:

> > There are several module interfaces:
> > - module user interface
> > - module load interface
> > - module driver interface
>
> Hmm, the "load interface" would be the system calls, while
> the "driver interface" would be initialization and cleanup
> functions in the module ?
>
> We've already established that most things called from the
> latter isn't actually module specific.

But the module driver interface has an impact on the other driver
interfaces and that's what I tried to describe in the previous mail.

> > These are quite independent and so far we were talking about the last one,
> > so I'm a bit confused about your request to talk about the first.
>
> I'm not so sure about them being independent. E.g. if we
> just had a blocking single-phase cleanup, users would always
> see success, but resources may be tied up indefinitely, which
> would break uses like
>
> rmmod foo
> insmod foo.o

rmmod could already fail before, because the module is busy, so I'm not
sure, what should break here.

> On the other hand, with a non-blocking two-phase cleanup, users
> would still always see success, but only "anonymous" resources
> (memory, etc.) would be tied up.
>
> Last but not least, a non-blocking single-phase cleanup would
> expose all failures to the user, and require some retry strategy.

Now you're skipping ahead. You are already looking at the possible
consequences for the module user interface, before we actually made clear
in which ways the module driver interface could be changed.
There are of course dependencies between the interfaces, but you can
change a lot in one before you have to modify another.

> Furthermore, use counts can have several meanings:
> - indicate when it's convenient for the module to be removed
> - indicate when it's possible for the module to be removed
> - indicate what consequences the user may experience if
> trying to remove now (e.g. blocking)
>
> The "old" module count was a bit of a mixture of the first two.
> The "new" count is clearer.

There is/was no count for the first?!
Which "new" count?

bye, Roman