This is a new approach to the "share sysctl tables" RFC series I
posted earlier this month.
In previous patches I proposed deriving 'struct net*' from the parent
ctl_entry's ->extra1 field, but that met with opposition because it mixed
in information from the dentry cache/fs layers.
In this version, the ctl_table_header is extended to hold a cookie that
is set at creation time and passed to the handlers. By default, every
netns-specific ctl_table_header stores the 'struct net*' in its
cookie.
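A minimal userspace sketch of the idea, assuming simplified stand-ins for struct ctl_table and struct ctl_table_header (the real kernel structures have many more fields, and the handler type here drops the buffer, length and position arguments). One table is shared by two headers; each header carries its own cookie, so the same handler reaches different per-namespace data.

```c
#include <assert.h>
#include <stddef.h>

struct ctl_table;

/* Simplified, hypothetical handler type: only the cookie argument matters here. */
typedef int proc_handler_cookie(struct ctl_table *table, int write,
                                void *cookie);

struct ctl_table {
    const char *procname;
    void *data;
    proc_handler_cookie *proc_handler;
};

struct ctl_table_header {
    struct ctl_table *table;     /* may be shared by many headers */
    void *ctl_header_cookie;     /* e.g. the owning 'struct net *' */
};

/* Hypothetical per-namespace state. */
struct net {
    int somaxconn;
};

/* Reads the value belonging to the namespace passed in as the cookie. */
static int read_somaxconn(struct ctl_table *table, int write, void *cookie)
{
    struct net *net = cookie;

    (void)table;
    (void)write;
    return net->somaxconn;
}

/* Forwards the header's cookie to the handler named by the table entry. */
static int call_handler(struct ctl_table_header *head, int write)
{
    struct ctl_table *t = head->table;

    assert(t->proc_handler != NULL);
    return t->proc_handler(t, write, head->ctl_header_cookie);
}
```

Only the headers are per namespace; the table itself is allocated once, which is what removes the kmemdup() copies in the patches below.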
I could continue the patch series and share other ctl_tables between
network namespaces in the same manner, but I stopped here to avoid
spending time on a solution you might reject for reasons I can't see
yet.
If you like this, I'll post a full patch series:
* change proc_handler to accept a cookie
* change all proc_handler functions in the kernel to accept a cookie
* apply sysctl table sharing to other tables. Candidates would be:
nf_conntrack_acct_init_sysctl, nf_conntrack_standalone_init_sysctl,
unix_sysctl_register, but there may be others I'm not seeing now.
This series is against Linus's 2.6.38-rc6 (plus a few other patches).
fs/proc/proc_sysctl.c | 11 +++++++-
include/linux/sysctl.h | 8 +++++-
include/net/ipv6.h | 6 +---
include/net/net_namespace.h | 26 ++++++++++++++++++
kernel/sysctl.c | 12 +++++---
net/core/sysctl_net_core.c | 28 ++-----------------
net/ipv4/ip_fragment.c | 34 ++++-------------------
net/ipv4/route.c | 36 +++++--------------------
net/ipv4/sysctl_net_ipv4.c | 53 ++++++-------------------------------
net/ipv6/icmp.c | 17 +----------
net/ipv6/reassembly.c | 34 ++++-------------------
net/ipv6/route.c | 54 ++++++++++---------------------------
net/ipv6/sysctl_net_ipv6.c | 61 +++++-------------------------------------
net/sysctl_net.c | 37 ++++++++++++++++++++++++--
14 files changed, 143 insertions(+), 274 deletions(-)
* [PATCH 1/9] sysctl: add ctl_header_cookie
* [PATCH 2/9] sysctl: use ctl_header_cookie in proc_handler
* [PATCH 3/9] sysctl: add netns_proc_dointvec and similar handlers
* [PATCH 4/9] sysctl: ipv4: ipfrag: share ip4_frags_ns_ctl_table between nets
* [PATCH 5/9] sysctl: net: share netns_core_table between nets
* [PATCH 6/9] sysctl: route: share ipv4_route_flush_table between nets
* [PATCH 7/9] sysctl: ipv4: share ipv4_net_table between nets
* [PATCH 8/9] sysctl: ipv6: share ip6_frags_ns_ctl_table between nets
* [PATCH 9/9] sysctl: ipv6: share ip6_ctl_table, ipv6_icmp_table and ipv6_route_table between nets
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/net/net_namespace.h | 26 ++++++++++++++++++++++++++
net/sysctl_net.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 0 deletions(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 1bf812b..0b7d37d 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -276,4 +276,30 @@ extern struct ctl_table_header *register_net_sysctl_rotable(
const struct ctl_path *path, struct ctl_table *table);
extern void unregister_net_sysctl_table(struct ctl_table_header *header);
+/* Similar to the versions without the 'netns_' prefix, except that:
+ * - these handlers receive as cookie a 'struct net*'
+ * - the data field of ctl_table* must be of the form
+ * &init_net.member1.member2..memberN
+ * - these handlers will call their equivalent handler with a
+ * ctl_table with data of the form: net->member1.member2..memberN
+ */
+extern int netns_proc_dostring(struct ctl_table *,
+ int, void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_dointvec(struct ctl_table *, int,
+ void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_dointvec_minmax(struct ctl_table *, int,
+ void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_dointvec_jiffies(struct ctl_table *, int,
+ void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_dointvec_userhz_jiffies(struct ctl_table *, int,
+ void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_dointvec_ms_jiffies(struct ctl_table *, int,
+ void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_doulongvec_minmax(struct ctl_table *, int,
+ void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_doulongvec_ms_jiffies_minmax(struct ctl_table *table, int,
+ void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_do_large_bitmap(struct ctl_table *, int,
+ void __user *, size_t *, loff_t *, void *net);
+
#endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 9dadd17..60b36ad 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -127,3 +127,34 @@ void unregister_net_sysctl_table(struct ctl_table_header *header)
unregister_sysctl_table(header);
}
EXPORT_SYMBOL_GPL(unregister_net_sysctl_table);
+
+
+
+static int netns_proc_wrapper(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos, void *net, proc_handler proc_handler)
+{
+ struct ctl_table tmp = *table;
+ tmp.data += (char *)net - (char *)&init_net;
+ return ((proc_handler_cookie*) proc_handler)(&tmp, write, buffer, lenp, ppos, NULL);
+}
+
+
+#define NETNS_PROC_WRAP(handler_name) \
+ int netns_##handler_name(struct ctl_table *table, int write, \
+ void __user *buffer, size_t *lenp, \
+ loff_t *ppos, void *net) \
+ { \
+ return netns_proc_wrapper(table, write, buffer, lenp, \
+ ppos, net, handler_name); \
+ } \
+ EXPORT_SYMBOL_GPL(netns_##handler_name);
+
+NETNS_PROC_WRAP(proc_dointvec);
+NETNS_PROC_WRAP(proc_dointvec_minmax);
+NETNS_PROC_WRAP(proc_dointvec_jiffies);
+NETNS_PROC_WRAP(proc_dointvec_userhz_jiffies);
+NETNS_PROC_WRAP(proc_dointvec_ms_jiffies);
+NETNS_PROC_WRAP(proc_doulongvec_minmax)
+NETNS_PROC_WRAP(proc_doulongvec_ms_jiffies_minmax);
+NETNS_PROC_WRAP(proc_do_large_bitmap);
--
1.7.4.rc1.7.g2cf08.dirty
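The NETNS_PROC_WRAP macro in the patch above uses token pasting (##) to generate one netns_* wrapper per existing handler, all funnelling into a common helper. A minimal userspace sketch of that pattern, with hypothetical base handlers and a hypothetical integer "offset" standing in for the ->data rebasing done by netns_proc_wrapper:

```c
#include <assert.h>

/* Two hypothetical base handlers sharing one signature. */
static int base_double(int x) { return 2 * x; }
static int base_negate(int x) { return -x; }

/* Common step applied by every generated wrapper before delegating. */
static int wrap_common(int x, int offset, int (*fn)(int))
{
    return fn(x + offset);
}

/* Builds wrapped_<name>() from <name>, as NETNS_PROC_WRAP builds
 * netns_<name>() from <name>. */
#define WRAP(handler_name)                                  \
    static int wrapped_##handler_name(int x, int offset)    \
    {                                                       \
        return wrap_common(x, offset, handler_name);        \
    }

WRAP(base_double)
WRAP(base_negate)
```

Because the macro expands to a complete function definition, it is invoked here without a trailing semicolon; an extra ';' at file scope is an empty declaration that ISO C does not allow (GCC only warns about it with -pedantic).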
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 5 ++++-
kernel/sysctl.c | 12 ++++++++----
net/sysctl_net.c | 6 +++---
3 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 7bb5cb6..43fed29 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1048,6 +1048,9 @@ struct ctl_table_header
struct ctl_table *attached_by;
struct ctl_table *attached_to;
struct ctl_table_header *parent;
+ /* Pointer to data that outlives this ctl_table_header.
+ * The caller is responsible for freeing the cookie. */
+ void *ctl_header_cookie;
};
/* struct ctl_path describes where in the hierarchy a table is added */
@@ -1058,7 +1061,7 @@ struct ctl_path {
void register_sysctl_root(struct ctl_table_root *root);
struct ctl_table_header *__register_sysctl_paths(
struct ctl_table_root *root, struct nsproxy *namespaces,
- const struct ctl_path *path, struct ctl_table *table);
+ const struct ctl_path *path, struct ctl_table *table, void *cookie);
struct ctl_table_header *register_sysctl_table(struct ctl_table * table);
struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 0f1bd83..31fd587 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -199,6 +199,7 @@ static struct ctl_table_header root_table_header = {
.ctl_entry = LIST_HEAD_INIT(sysctl_table_root.default_set.list),
.root = &sysctl_table_root,
.set = &sysctl_table_root.default_set,
+ .ctl_header_cookie = NULL,
};
static struct ctl_table_root sysctl_table_root = {
.root_list = LIST_HEAD_INIT(sysctl_table_root.root_list),
@@ -1774,6 +1775,9 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
* @namespaces: Data to compute which lists of sysctl entries are visible
* @path: The path to the directory the sysctl table is in.
* @table: the top-level table structure
+ * @cookie: Pointer to caller-provided data that must remain valid
+ * until unregister_sysctl_table() is called. This cookie is passed to
+ * the proc_handler.
*
* Register a sysctl table hierarchy. @table should be a filled in ctl_table
* array. A completely 0 filled entry terminates the table.
@@ -1822,9 +1826,8 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
* to the table header on success.
*/
struct ctl_table_header *__register_sysctl_paths(
- struct ctl_table_root *root,
- struct nsproxy *namespaces,
- const struct ctl_path *path, struct ctl_table *table)
+ struct ctl_table_root *root, struct nsproxy *namespaces,
+ const struct ctl_path *path, struct ctl_table *table, void *cookie)
{
struct ctl_table_header *header;
struct ctl_table *new, **prevp;
@@ -1871,6 +1874,7 @@ struct ctl_table_header *__register_sysctl_paths(
header->root = root;
sysctl_set_parent(NULL, header->ctl_table);
header->count = 1;
+ header->ctl_header_cookie = cookie;
#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
if (sysctl_check_table(namespaces, header->ctl_table)) {
kfree(header);
@@ -1911,7 +1915,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table)
{
return __register_sysctl_paths(&sysctl_table_root, current->nsproxy,
- path, table);
+ path, table, NULL);
}
/**
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index ca84212..9dadd17 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -109,8 +109,8 @@ struct ctl_table_header *register_net_sysctl_table(struct net *net,
struct nsproxy namespaces;
namespaces = *current->nsproxy;
namespaces.net_ns = net;
- return __register_sysctl_paths(&net_sysctl_root,
- &namespaces, path, table);
+ return __register_sysctl_paths(&net_sysctl_root, &namespaces, path,
+ table, NULL);
}
EXPORT_SYMBOL_GPL(register_net_sysctl_table);
@@ -118,7 +118,7 @@ struct ctl_table_header *register_net_sysctl_rotable(const
struct ctl_path *path, struct ctl_table *table)
{
return __register_sysctl_paths(&net_sysctl_ro_root,
- &init_nsproxy, path, table);
+ &init_nsproxy, path, table, NULL);
}
EXPORT_SYMBOL_GPL(register_net_sysctl_rotable);
--
1.7.4.rc1.7.g2cf08.dirty
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/core/sysctl_net_core.c | 28 +++-------------------------
1 files changed, 3 insertions(+), 25 deletions(-)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 385b609..e5a1544 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -182,7 +182,7 @@ static struct ctl_table netns_core_table[] = {
.data = &init_net.core.sysctl_somaxconn,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec
},
{ }
};
@@ -195,41 +195,19 @@ __net_initdata struct ctl_path net_core_path[] = {
static __net_init int sysctl_core_net_init(struct net *net)
{
- struct ctl_table *tbl;
-
net->core.sysctl_somaxconn = SOMAXCONN;
- tbl = netns_core_table;
- if (!net_eq(net, &init_net)) {
- tbl = kmemdup(tbl, sizeof(netns_core_table), GFP_KERNEL);
- if (tbl == NULL)
- goto err_dup;
-
- tbl[0].data = &net->core.sysctl_somaxconn;
- }
-
net->core.sysctl_hdr = register_net_sysctl_table(net,
- net_core_path, tbl);
+ net_core_path, netns_core_table);
if (net->core.sysctl_hdr == NULL)
- goto err_reg;
+ return -ENOMEM;
return 0;
-
-err_reg:
- if (tbl != netns_core_table)
- kfree(tbl);
-err_dup:
- return -ENOMEM;
}
static __net_exit void sysctl_core_net_exit(struct net *net)
{
- struct ctl_table *tbl;
-
- tbl = net->core.sysctl_hdr->ctl_table_arg;
unregister_net_sysctl_table(net->core.sysctl_hdr);
- BUG_ON(tbl == netns_core_table);
- kfree(tbl);
}
static __net_initdata struct pernet_operations sysctl_core_ops = {
--
1.7.4.rc1.7.g2cf08.dirty
This patch includes an alternative implementation of the patch from [1]
and will not apply cleanly if that patch has already been applied.
[1] http://thread.gmane.org/gmane.linux.network/187273
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/net/ipv6.h | 6 +---
net/ipv6/icmp.c | 17 +-----------
net/ipv6/route.c | 54 +++++++++++----------------------------
net/ipv6/sysctl_net_ipv6.c | 61 ++++++--------------------------------------
4 files changed, 27 insertions(+), 111 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 96e50e0..1526ed6 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -652,11 +652,9 @@ static inline int snmp6_unregister_dev(struct inet6_dev *idev) { return 0; }
#endif
#ifdef CONFIG_SYSCTL
-extern ctl_table ipv6_route_table_template[];
-extern ctl_table ipv6_icmp_table_template[];
+extern ctl_table ipv6_route_table[];
+extern ctl_table ipv6_icmp_table[];
-extern struct ctl_table *ipv6_icmp_sysctl_init(struct net *net);
-extern struct ctl_table *ipv6_route_sysctl_init(struct net *net);
extern int ipv6_sysctl_register(void);
extern void ipv6_sysctl_unregister(void);
extern int ipv6_static_sysctl_register(void);
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 03e62f9..924cb36 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -938,29 +938,16 @@ int icmpv6_err_convert(u8 type, u8 code, int *err)
EXPORT_SYMBOL(icmpv6_err_convert);
#ifdef CONFIG_SYSCTL
-ctl_table ipv6_icmp_table_template[] = {
+ctl_table ipv6_icmp_table[] = {
{
.procname = "ratelimit",
.data = &init_net.ipv6.sysctl.icmpv6_time,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_ms_jiffies,
+ .proc_handler = (proc_handler *) netns_proc_dointvec_ms_jiffies,
},
{ },
};
-struct ctl_table * __net_init ipv6_icmp_sysctl_init(struct net *net)
-{
- struct ctl_table *table;
-
- table = kmemdup(ipv6_icmp_table_template,
- sizeof(ipv6_icmp_table_template),
- GFP_KERNEL);
-
- if (table)
- table[0].data = &net->ipv6.sysctl.icmpv6_time;
-
- return table;
-}
#endif
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a998db6..29e05ca 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2553,11 +2553,11 @@ static const struct file_operations rt6_stats_seq_fops = {
#ifdef CONFIG_SYSCTL
-static
-int ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write,
- void __user *buffer, size_t *lenp, loff_t *ppos)
+static int netns_ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos, void *cookie)
{
- struct net *net = current->nsproxy->net_ns;
+ struct net *net = (struct net *) cookie;
int delay = net->ipv6.sysctl.flush_delay;
if (write) {
proc_dointvec(ctl, write, buffer, lenp, ppos);
@@ -2567,103 +2567,79 @@ int ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write,
return -EINVAL;
}
-ctl_table ipv6_route_table_template[] = {
+ctl_table ipv6_route_table[] = {
{
.procname = "flush",
.data = &init_net.ipv6.sysctl.flush_delay,
.maxlen = sizeof(int),
.mode = 0200,
- .proc_handler = ipv6_sysctl_rtcache_flush
+ .proc_handler = (proc_handler *) netns_ipv6_sysctl_rtcache_flush
},
{
.procname = "gc_thresh",
.data = &ip6_dst_ops_template.gc_thresh,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = (proc_handler *) netns_proc_dointvec,
},
{
.procname = "max_size",
.data = &init_net.ipv6.sysctl.ip6_rt_max_size,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = (proc_handler *) netns_proc_dointvec,
},
{
.procname = "gc_min_interval",
.data = &init_net.ipv6.sysctl.ip6_rt_gc_min_interval,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_jiffies,
+ .proc_handler = (proc_handler *) netns_proc_dointvec_jiffies,
},
{
.procname = "gc_timeout",
.data = &init_net.ipv6.sysctl.ip6_rt_gc_timeout,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_jiffies,
+ .proc_handler = (proc_handler *) netns_proc_dointvec_jiffies,
},
{
.procname = "gc_interval",
.data = &init_net.ipv6.sysctl.ip6_rt_gc_interval,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_jiffies,
+ .proc_handler = (proc_handler *) netns_proc_dointvec_jiffies,
},
{
.procname = "gc_elasticity",
.data = &init_net.ipv6.sysctl.ip6_rt_gc_elasticity,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = (proc_handler *) netns_proc_dointvec,
},
{
.procname = "mtu_expires",
.data = &init_net.ipv6.sysctl.ip6_rt_mtu_expires,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_jiffies,
+ .proc_handler = (proc_handler *) netns_proc_dointvec_jiffies,
},
{
.procname = "min_adv_mss",
.data = &init_net.ipv6.sysctl.ip6_rt_min_advmss,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = (proc_handler *) netns_proc_dointvec,
},
{
.procname = "gc_min_interval_ms",
.data = &init_net.ipv6.sysctl.ip6_rt_gc_min_interval,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_ms_jiffies,
+ .proc_handler = (proc_handler *) netns_proc_dointvec_ms_jiffies,
},
{ }
};
-
-struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
-{
- struct ctl_table *table;
-
- table = kmemdup(ipv6_route_table_template,
- sizeof(ipv6_route_table_template),
- GFP_KERNEL);
-
- if (table) {
- table[0].data = &net->ipv6.sysctl.flush_delay;
- table[1].data = &net->ipv6.ip6_dst_ops.gc_thresh;
- table[2].data = &net->ipv6.sysctl.ip6_rt_max_size;
- table[3].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
- table[4].data = &net->ipv6.sysctl.ip6_rt_gc_timeout;
- table[5].data = &net->ipv6.sysctl.ip6_rt_gc_interval;
- table[6].data = &net->ipv6.sysctl.ip6_rt_gc_elasticity;
- table[7].data = &net->ipv6.sysctl.ip6_rt_mtu_expires;
- table[8].data = &net->ipv6.sysctl.ip6_rt_min_advmss;
- table[9].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
- }
-
- return table;
-}
#endif
static int __net_init ip6_route_net_init(struct net *net)
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 7cb65ef..cd15483 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -17,25 +17,25 @@
static struct ctl_table empty[1];
-static ctl_table ipv6_table_template[] = {
+static ctl_table ipv6_table[] = {
{
.procname = "route",
.maxlen = 0,
.mode = 0555,
- .child = ipv6_route_table_template
+ .child = ipv6_route_table
},
{
.procname = "icmp",
.maxlen = 0,
.mode = 0555,
- .child = ipv6_icmp_table_template
+ .child = ipv6_icmp_table
},
{
.procname = "bindv6only",
.data = &init_net.ipv6.sysctl.bindv6only,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec
},
{
.procname = "neigh",
@@ -66,62 +66,17 @@ EXPORT_SYMBOL_GPL(net_ipv6_ctl_path);
static int __net_init ipv6_sysctl_net_init(struct net *net)
{
- struct ctl_table *ipv6_table;
- struct ctl_table *ipv6_route_table;
- struct ctl_table *ipv6_icmp_table;
- int err;
-
- err = -ENOMEM;
- ipv6_table = kmemdup(ipv6_table_template, sizeof(ipv6_table_template),
- GFP_KERNEL);
- if (!ipv6_table)
- goto out;
-
- ipv6_route_table = ipv6_route_sysctl_init(net);
- if (!ipv6_route_table)
- goto out_ipv6_table;
- ipv6_table[0].child = ipv6_route_table;
-
- ipv6_icmp_table = ipv6_icmp_sysctl_init(net);
- if (!ipv6_icmp_table)
- goto out_ipv6_route_table;
- ipv6_table[1].child = ipv6_icmp_table;
-
- ipv6_table[2].data = &net->ipv6.sysctl.bindv6only;
-
- net->ipv6.sysctl.table = register_net_sysctl_table(net, net_ipv6_ctl_path,
- ipv6_table);
+ net->ipv6.sysctl.table = register_net_sysctl_table(net,
+ net_ipv6_ctl_path, ipv6_table);
if (!net->ipv6.sysctl.table)
- goto out_ipv6_icmp_table;
-
- err = 0;
-out:
- return err;
+ return -ENOMEM;
-out_ipv6_icmp_table:
- kfree(ipv6_icmp_table);
-out_ipv6_route_table:
- kfree(ipv6_route_table);
-out_ipv6_table:
- kfree(ipv6_table);
- goto out;
+ return 0;
}
static void __net_exit ipv6_sysctl_net_exit(struct net *net)
{
- struct ctl_table *ipv6_table;
- struct ctl_table *ipv6_route_table;
- struct ctl_table *ipv6_icmp_table;
-
- ipv6_table = net->ipv6.sysctl.table->ctl_table_arg;
- ipv6_route_table = ipv6_table[0].child;
- ipv6_icmp_table = ipv6_table[1].child;
-
unregister_net_sysctl_table(net->ipv6.sysctl.table);
-
- kfree(ipv6_table);
- kfree(ipv6_route_table);
- kfree(ipv6_icmp_table);
}
static struct pernet_operations ipv6_sysctl_net_ops = {
--
1.7.4.rc1.7.g2cf08.dirty
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv6/reassembly.c | 34 ++++++----------------------------
1 files changed, 6 insertions(+), 28 deletions(-)
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 07beeb0..868cbd5 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -600,21 +600,21 @@ static struct ctl_table ip6_frags_ns_ctl_table[] = {
.data = &init_net.ipv6.frags.high_thresh,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec,
},
{
.procname = "ip6frag_low_thresh",
.data = &init_net.ipv6.frags.low_thresh,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec,
},
{
.procname = "ip6frag_time",
.data = &init_net.ipv6.frags.timeout,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_jiffies,
+ .proc_handler = (proc_handler *) netns_proc_dointvec_jiffies,
},
{ }
};
@@ -632,42 +632,20 @@ static struct ctl_table ip6_frags_ctl_table[] = {
static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
{
- struct ctl_table *table;
struct ctl_table_header *hdr;
- table = ip6_frags_ns_ctl_table;
- if (!net_eq(net, &init_net)) {
- table = kmemdup(table, sizeof(ip6_frags_ns_ctl_table), GFP_KERNEL);
- if (table == NULL)
- goto err_alloc;
-
- table[0].data = &net->ipv6.frags.high_thresh;
- table[1].data = &net->ipv6.frags.low_thresh;
- table[2].data = &net->ipv6.frags.timeout;
- }
-
- hdr = register_net_sysctl_table(net, net_ipv6_ctl_path, table);
+ hdr = register_net_sysctl_table(net, net_ipv6_ctl_path,
+ ip6_frags_ns_ctl_table);
if (hdr == NULL)
- goto err_reg;
+ return -ENOMEM;
net->ipv6.sysctl.frags_hdr = hdr;
return 0;
-
-err_reg:
- if (!net_eq(net, &init_net))
- kfree(table);
-err_alloc:
- return -ENOMEM;
}
static void __net_exit ip6_frags_ns_sysctl_unregister(struct net *net)
{
- struct ctl_table *table;
-
- table = net->ipv6.sysctl.frags_hdr->ctl_table_arg;
unregister_net_sysctl_table(net->ipv6.sysctl.frags_hdr);
- if (!net_eq(net, &init_net))
- kfree(table);
}
static struct ctl_table_header *ip6_ctl_header;
--
1.7.4.rc1.7.g2cf08.dirty
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv4/route.c | 36 +++++++-----------------------------
1 files changed, 7 insertions(+), 29 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 6ed6603..8fd0208 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3038,19 +3038,18 @@ void ip_rt_multicast_event(struct in_device *in_dev)
#ifdef CONFIG_SYSCTL
static int ipv4_sysctl_rtcache_flush(ctl_table *__ctl, int write,
- void __user *buffer,
- size_t *lenp, loff_t *ppos)
+ void __user *buffer,
+ size_t *lenp, loff_t *ppos, void *cookie)
{
if (write) {
int flush_delay;
ctl_table ctl;
- struct net *net;
+ struct net *net = (struct net *) cookie;
memcpy(&ctl, __ctl, sizeof(ctl));
ctl.data = &flush_delay;
proc_dointvec(&ctl, write, buffer, lenp, ppos);
- net = (struct net *)__ctl->extra1;
rt_cache_flush(net, flush_delay);
return 0;
}
@@ -3191,7 +3190,7 @@ static struct ctl_table ipv4_route_flush_table[] = {
.procname = "flush",
.maxlen = sizeof(int),
.mode = 0200,
- .proc_handler = ipv4_sysctl_rtcache_flush,
+ .proc_handler = (proc_handler *) ipv4_sysctl_rtcache_flush,
},
{ },
};
@@ -3205,37 +3204,16 @@ static __net_initdata struct ctl_path ipv4_route_path[] = {
static __net_init int sysctl_route_net_init(struct net *net)
{
- struct ctl_table *tbl;
-
- tbl = ipv4_route_flush_table;
- if (!net_eq(net, &init_net)) {
- tbl = kmemdup(tbl, sizeof(ipv4_route_flush_table), GFP_KERNEL);
- if (tbl == NULL)
- goto err_dup;
- }
- tbl[0].extra1 = net;
-
- net->ipv4.route_hdr =
- register_net_sysctl_table(net, ipv4_route_path, tbl);
+ net->ipv4.route_hdr = register_net_sysctl_table(net,
+ ipv4_route_path, ipv4_route_flush_table);
if (net->ipv4.route_hdr == NULL)
- goto err_reg;
+ return -ENOMEM;
return 0;
-
-err_reg:
- if (tbl != ipv4_route_flush_table)
- kfree(tbl);
-err_dup:
- return -ENOMEM;
}
static __net_exit void sysctl_route_net_exit(struct net *net)
{
- struct ctl_table *tbl;
-
- tbl = net->ipv4.route_hdr->ctl_table_arg;
unregister_net_sysctl_table(net->ipv4.route_hdr);
- BUG_ON(tbl == ipv4_route_flush_table);
- kfree(tbl);
}
static __net_initdata struct pernet_operations sysctl_route_ops = {
--
1.7.4.rc1.7.g2cf08.dirty
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv4/sysctl_net_ipv4.c | 53 +++++++------------------------------------
1 files changed, 9 insertions(+), 44 deletions(-)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 1a45665..6fd3279 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -636,49 +636,49 @@ static struct ctl_table ipv4_net_table[] = {
.data = &init_net.ipv4.sysctl_icmp_echo_ignore_all,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec
},
{
.procname = "icmp_echo_ignore_broadcasts",
.data = &init_net.ipv4.sysctl_icmp_echo_ignore_broadcasts,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec
},
{
.procname = "icmp_ignore_bogus_error_responses",
.data = &init_net.ipv4.sysctl_icmp_ignore_bogus_error_responses,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec
},
{
.procname = "icmp_errors_use_inbound_ifaddr",
.data = &init_net.ipv4.sysctl_icmp_errors_use_inbound_ifaddr,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec
},
{
.procname = "icmp_ratelimit",
.data = &init_net.ipv4.sysctl_icmp_ratelimit,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_ms_jiffies,
+ .proc_handler = (proc_handler *) netns_proc_dointvec_ms_jiffies,
},
{
.procname = "icmp_ratemask",
.data = &init_net.ipv4.sysctl_icmp_ratemask,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec
},
{
.procname = "rt_cache_rebuild_count",
.data = &init_net.ipv4.sysctl_rt_cache_rebuild_count,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec
},
{ }
};
@@ -692,53 +692,18 @@ EXPORT_SYMBOL_GPL(net_ipv4_ctl_path);
static __net_init int ipv4_sysctl_init_net(struct net *net)
{
- struct ctl_table *table;
-
- table = ipv4_net_table;
- if (!net_eq(net, &init_net)) {
- table = kmemdup(table, sizeof(ipv4_net_table), GFP_KERNEL);
- if (table == NULL)
- goto err_alloc;
-
- table[0].data =
- &net->ipv4.sysctl_icmp_echo_ignore_all;
- table[1].data =
- &net->ipv4.sysctl_icmp_echo_ignore_broadcasts;
- table[2].data =
- &net->ipv4.sysctl_icmp_ignore_bogus_error_responses;
- table[3].data =
- &net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr;
- table[4].data =
- &net->ipv4.sysctl_icmp_ratelimit;
- table[5].data =
- &net->ipv4.sysctl_icmp_ratemask;
- table[6].data =
- &net->ipv4.sysctl_rt_cache_rebuild_count;
- }
-
net->ipv4.sysctl_rt_cache_rebuild_count = 4;
net->ipv4.ipv4_hdr = register_net_sysctl_table(net,
- net_ipv4_ctl_path, table);
+ net_ipv4_ctl_path, ipv4_net_table);
if (net->ipv4.ipv4_hdr == NULL)
- goto err_reg;
-
+ return -ENOMEM;
return 0;
-
-err_reg:
- if (!net_eq(net, &init_net))
- kfree(table);
-err_alloc:
- return -ENOMEM;
}
static __net_exit void ipv4_sysctl_exit_net(struct net *net)
{
- struct ctl_table *table;
-
- table = net->ipv4.ipv4_hdr->ctl_table_arg;
unregister_net_sysctl_table(net->ipv4.ipv4_hdr);
- kfree(table);
}
static __net_initdata struct pernet_operations ipv4_sysctl_ops = {
--
1.7.4.rc1.7.g2cf08.dirty
The only reason we were creating a copy of this table was to set
->data to point to data from within the newly created net. The
netns_proc_* handlers do this dynamically.
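A userspace sketch of that dynamic rebasing, assuming a trimmed-down struct net and a hypothetical rebase_int() helper. Every shared entry points into init_net; at call time the field's offset inside init_net is applied to the target namespace. This is equivalent to the `tmp.data += (char *)net - (char *)&init_net` adjustment in netns_proc_wrapper, but it only subtracts pointers within the same object, which keeps the arithmetic well defined in ISO C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down per-namespace state. */
struct frags { int high_thresh; int low_thresh; int timeout; };
struct net { struct frags frags; };

static struct net init_net = { { 256, 192, 30 } };

/* Hypothetical shared table entry: ->data always points into init_net. */
struct entry { const char *procname; void *data; };

static const struct entry frag_table[] = {
    { "ipfrag_high_thresh", &init_net.frags.high_thresh },
    { "ipfrag_low_thresh",  &init_net.frags.low_thresh },
    { "ipfrag_time",        &init_net.frags.timeout },
};

/* Keeps the field's offset within struct net and swaps the base object.
 * Assumes ->data points into init_net (the rule stated in the netns_proc_*
 * header comment); the assert is only a cheap sanity check. */
static int *rebase_int(const struct entry *e, struct net *net)
{
    size_t off = (size_t)((char *)e->data - (char *)&init_net);

    assert(off + sizeof(int) <= sizeof(struct net));
    return (int *)((char *)net + off);
}
```

An entry whose ->data points anywhere other than into init_net (for example into a separate template object) would be rebased to a meaningless address, so every table converted to the netns_proc_* handlers has to follow that rule.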
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv4/ip_fragment.c | 34 ++++++----------------------------
net/sysctl_net.c | 2 +-
2 files changed, 7 insertions(+), 29 deletions(-)
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index a1151b8..ffca3cc 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -677,21 +677,21 @@ static struct ctl_table ip4_frags_ns_ctl_table[] = {
.data = &init_net.ipv4.frags.high_thresh,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec
},
{
.procname = "ipfrag_low_thresh",
.data = &init_net.ipv4.frags.low_thresh,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = (proc_handler *) netns_proc_dointvec
},
{
.procname = "ipfrag_time",
.data = &init_net.ipv4.frags.timeout,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_jiffies,
+ .proc_handler = (proc_handler *) netns_proc_dointvec_jiffies,
},
{ }
};
@@ -717,41 +717,19 @@ static struct ctl_table ip4_frags_ctl_table[] = {
static int __net_init ip4_frags_ns_ctl_register(struct net *net)
{
- struct ctl_table *table;
struct ctl_table_header *hdr;
-
- table = ip4_frags_ns_ctl_table;
- if (!net_eq(net, &init_net)) {
- table = kmemdup(table, sizeof(ip4_frags_ns_ctl_table), GFP_KERNEL);
- if (table == NULL)
- goto err_alloc;
-
- table[0].data = &net->ipv4.frags.high_thresh;
- table[1].data = &net->ipv4.frags.low_thresh;
- table[2].data = &net->ipv4.frags.timeout;
- }
-
- hdr = register_net_sysctl_table(net, net_ipv4_ctl_path, table);
+ hdr = register_net_sysctl_table(net, net_ipv4_ctl_path,
+ ip4_frags_ns_ctl_table);
if (hdr == NULL)
- goto err_reg;
+ return -ENOMEM;
net->ipv4.frags_hdr = hdr;
return 0;
-
-err_reg:
- if (!net_eq(net, &init_net))
- kfree(table);
-err_alloc:
- return -ENOMEM;
}
static void __net_exit ip4_frags_ns_ctl_unregister(struct net *net)
{
- struct ctl_table *table;
-
- table = net->ipv4.frags_hdr->ctl_table_arg;
unregister_net_sysctl_table(net->ipv4.frags_hdr);
- kfree(table);
}
static void ip4_frags_ctl_register(void)
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 60b36ad..d80e9c4 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -110,7 +110,7 @@ struct ctl_table_header *register_net_sysctl_table(struct net *net,
namespaces = *current->nsproxy;
namespaces.net_ns = net;
return __register_sysctl_paths(&net_sysctl_root, &namespaces, path,
- table, NULL);
+ table, net);
}
EXPORT_SYMBOL_GPL(register_net_sysctl_table);
--
1.7.4.rc1.7.g2cf08.dirty
TODO: if this patch series gets positive feedback, this patch will be
extended with a kernel-wide change that adds a 'cookie' argument to
every proc_handler.
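For now the patch casts existing 5-argument handlers to the 6-argument proc_handler_cookie type and calls them through it. In ISO C, calling a function through an incompatible function-pointer type is undefined behaviour; it only works on ABIs where a callee silently ignores surplus arguments. A minimal sketch of what the kernel-wide change would look like, assuming a simplified, hypothetical handler type: every handler takes the cookie, and those that do not need it simply ignore it, so no casts are required.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical handler type: the real proc_handler also takes
 * a user buffer, a length pointer and a file position. */
typedef int handler_t(int *data, int write, int value, void *cookie);

/* A legacy-style handler with no use for the cookie. */
static int do_int(int *data, int write, int value, void *cookie)
{
    (void)cookie;              /* accepted, deliberately unused */
    if (write)
        *data = value;
    return *data;
}

/* Hypothetical per-namespace state. */
struct ns { int flush_delay; };

/* A cookie-aware handler: the cookie selects the namespace. */
static int do_ns_flush(int *data, int write, int value, void *cookie)
{
    struct ns *ns = cookie;

    (void)data;
    if (write)
        ns->flush_delay = value;
    return ns->flush_delay;
}

/* Every call site passes the cookie through the one, correctly typed pointer. */
static int dispatch(handler_t *h, int *data, int write, int value, void *cookie)
{
    assert(h != NULL);
    return h(data, write, value, cookie);
}
```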
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 11 ++++++++++-
include/linux/sysctl.h | 3 +++
2 files changed, 13 insertions(+), 1 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 09a1f92..85b6b75 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -135,6 +135,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
struct inode *inode = filp->f_path.dentry->d_inode;
struct ctl_table_header *head = grab_header(inode);
struct ctl_table *table = PROC_I(inode)->sysctl_entry;
+ proc_handler_cookie *phc = (proc_handler_cookie *) table->proc_handler;
ssize_t error;
size_t res;
@@ -156,7 +157,15 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
/* careful: calling conventions are nasty here */
res = count;
- error = table->proc_handler(table, write, buf, &res, ppos);
+ /*XXX Most handlers only use the first 5 arguments (without
+ *XXX @cookie). Changing all handlers is too much work
+ *XXX for what is only an RFC patch at the moment.
+ *XXX
+ *XXX This is just a HACK for now: I did it this way to avoid
+ *XXX spending time changing all the handlers. In the final version
+ *XXX I'll change all the handlers if there is no other solution.
+ */
+ error = phc(table, write, buf, &res, ppos, head->ctl_header_cookie);
if (!error)
error = res;
out:
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 43fed29..3d21832 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -963,6 +963,9 @@ typedef struct ctl_table ctl_table;
typedef int proc_handler (struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos);
+typedef int proc_handler_cookie(struct ctl_table *ctl, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos, void *ctl_header_cookie);
extern int proc_dostring(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
--
1.7.4.rc1.7.g2cf08.dirty
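The cast in proc_sys_call_handler() calls 5-argument handlers through a 6-argument function pointer type. C leaves that undefined, which is why the final version would convert every handler. The sketch below models the intended end state under a few stated assumptions. First, the handler signature is simplified to pass an int* instead of the user buffer/length/offset triple. Second, ->data is assumed to hold an offset into struct net, so that one shared table can serve every namespace. This is one plausible encoding and not necessarily the one used by the netns_proc_dointvec handlers in this series. model_netns_dointvec(), model_call_handler() and the struct net fields are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-namespace settings; NOT the kernel's struct net. */
struct net { int sysctl_a; int sysctl_b; };

struct ctl_table;
/* Simplified cookie-taking handler type (int* stands in for buf/lenp/ppos). */
typedef int proc_handler_cookie(struct ctl_table *ctl, int write,
				int *value, void *ctl_header_cookie);

struct ctl_table {
	const char *procname;
	void *data;                        /* assumed: offset into struct net */
	proc_handler_cookie *proc_handler;
};

struct ctl_table_header {
	struct ctl_table *ctl_table;
	void *ctl_header_cookie;           /* the registering struct net */
};

/* Hypothetical netns-aware handler: locate the int inside the namespace
 * given by the cookie, then read or write it. */
static int model_netns_dointvec(struct ctl_table *ctl, int write,
				int *value, void *cookie)
{
	struct net *net = cookie;
	int *field;

	assert(net != NULL);
	field = (int *)((char *)net + (uintptr_t)ctl->data);
	if (write)
		*field = *value;
	else
		*value = *field;
	return 0;
}

/* Dispatch as in proc_sys_call_handler(): pass the header's cookie through,
 * with a properly typed call instead of a cast. */
static int model_call_handler(struct ctl_table_header *head,
			      struct ctl_table *table, int write, int *value)
{
	return table->proc_handler(table, write, value, head->ctl_header_cookie);
}
```

With one table and two headers, a write through the first header changes only the first namespace's value.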
From: Lucian Adrian Grijincu <[email protected]>
Date: Fri, 25 Feb 2011 20:52:32 +0200
> This is a new approach to the "share sysctl tables" RFC series I
> posted earlier this month.
I do not disagree conceptually with these changes from a networking
perspective, but I am not a sysctl layer expert so I don't know if the
generic sysctl bits are a good idea or not.
David Miller <[email protected]> writes:
> From: Lucian Adrian Grijincu <[email protected]>
> Date: Fri, 25 Feb 2011 20:52:32 +0200
>
>> This is a new approach to the "share sysctl tables" RFC series I
>> posted earlier this month.
>
> I do not disagree conceptually with these changes from a networking
> perspective, but I am not a sysctl layer expert so I don't know if the
> generic sysctl bits are a good idea or not.
I may be missing something in these patches. I haven't had time to look
at this most recent batch carefully. But from a 10,000 foot perspective I
have a problem with them. With a handful of network devices the size of
the data structures is negligible.
Where problems show up is when you have a lot of sysctl entries for
devices and at that point we have much larger problems using the
sysctl data structures. Today add/remove are O(number of previous entries)
and I think even readdir suffers from non-scalable data structures.
There are other related issues: the sysctl data structures are not
optimized for use in /proc, and sysctl uses some usable but one-off,
lock-like mechanisms.
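To make the scalability point concrete, suppose every registration checks for duplicates by walking all previously registered entries, as described above. Then registering n entries costs 0 + 1 + ... + (n-1) = n(n-1)/2 comparisons in total. The toy registry below is a hypothetical stand-in, not the sysctl implementation; it only counts comparisons.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 64

/* Hypothetical flat registry, a stand-in for a list of registered headers. */
struct registry {
	const char *names[MAX_ENTRIES];
	size_t count;
	unsigned long comparisons;  /* cost counter, for illustration only */
};

/* Returns 0 on success, -1 on a duplicate or a full registry.
 * The duplicate check walks every existing entry: O(count) per add. */
static int registry_add(struct registry *r, const char *name)
{
	assert(r != NULL && name != NULL);
	for (size_t i = 0; i < r->count; i++) {
		r->comparisons++;
		if (strcmp(r->names[i], name) == 0)
			return -1;
	}
	if (r->count == MAX_ENTRIES)
		return -1;
	r->names[r->count++] = name;
	return 0;
}
```

Registering ten distinct names performs 45 comparisons, and the total grows quadratically with the number of entries.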
Changing things to make the sysctl users more dependent on the current
implementation details of the sysctl data structures seems the exact
opposite of the direction we need to go to make the sysctl data structures
scale.
So until I can see a reason why we should save a few bytes at the cost
of greater future maintenance costs I'm not in favor of this patch set.
Eric
On Thu, Mar 3, 2011 at 10:33 AM, Eric W. Biederman
<[email protected]> wrote:
> I may be missing something in these patches. I haven't had time to look
> at this most recent batch carefully. But from a 10,000 foot perspective I
> have a problem with them. With a handful of network devices the size of
> the data structures is negligible.
>
> So until I can see a reason why we should save a few bytes at the cost
> of greater future maintenance costs I'm not in favor of this patch set.
Sorry, I'm moving between countries and I don't have as much time as
I'd like to.
This patch series adds the "cookie" field and uses it in a few places.
I need this for the next step, but I wanted some feedback regarding
the cookie approach (sane? applicable if the 'dynamic header' feature
is accepted?).
Afterwards I want to add a "dynamic ctl_header" which will implement a
few ops (something along the lines of 'find_in_table' and 'scan' from
proc_sysctl.c).
At 'scan' time the "dynamic header" will create inodes for the
directories underneath with:
ctl_table='shared ctl table for /proc/sys/net/ipv4/conf' (or
ipv6/addrconf or neigh)
ctl_table_header=a device specific (not dynamic) table header with
->cookie pointing to a struct {char*dev_name; struct net*net;}
proc_handlers will use the name (or even a pointer to the device or
whatever speeds up the implementation) and the net to find out the
real ->data similar to the netns_proc_handlers from this patch series.
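Below is a minimal sketch of the cookie described above. It assumes a struct holding the device name and the owning namespace, plus a shared handler that resolves the device at call time. The device list, struct dev_conf with its forwarding field, and the model_* functions are hypothetical stand-ins. The linear name lookup is only for brevity; a real implementation might store a device pointer instead, as the mail suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the kernel's net_device or struct net. */
struct dev_conf { int forwarding; };
struct net_device { const char *name; struct dev_conf conf; };
struct net { struct net_device *devs; size_t ndevs; };

/* Cookie layout proposed in the mail: device name plus owning namespace. */
struct dev_cookie { const char *dev_name; struct net *net; };

static struct net_device *model_dev_get_by_name(struct net *net,
						const char *name)
{
	for (size_t i = 0; i < net->ndevs; i++)
		if (strcmp(net->devs[i].name, name) == 0)
			return &net->devs[i];
	return NULL;
}

/* Shared handler: one ctl_table for all devices; the real ->data is
 * resolved from the cookie on every call. */
static int model_devconf_handler(int write, int *value, void *cookie)
{
	struct dev_cookie *c = cookie;
	struct net_device *dev;

	assert(c != NULL && c->net != NULL);
	dev = model_dev_get_by_name(c->net, c->dev_name);
	if (!dev)
		return -1;  /* device is gone; a kernel handler would return an -errno */
	if (write)
		dev->conf.forwarding = *value;
	else
		*value = dev->conf.forwarding;
	return 0;
}
```

Because device names are unique within a namespace, each header's cookie identifies exactly one device, and the shared table never needs per-device copies.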
Adding an interface will not need to scan through the list of existing
ctl headers to see if any duplicates exist because there cannot be two
interfaces with the same name.
I promise to get back with patches for this implementation as soon as I can.
PS: sorry if my mumbling does not make much sense; hopefully the code will
make things clear.
--
.
..: Lucian