Short description: sysctl is slow (bad algorithm); this patch series
makes it faster (without using more memory but with some limitations).
$ time modprobe dummy numdummies=N

Without this patch series :(
- ipv4 only
  - N=1000 time= 0m 06s
  - N=2000 time= 0m 30s
  - N=4000 time= 2m 35s
- ipv4 and ipv6
  - N=1000 time= 0m 24s
  - N=2000 time= 2m 14s
  - N=4000 time=10m 16s
  - N=5000 time=16m 03s

With this patch series :)
- ipv4 only
  - N=1000 time=0.33s
  - N=2000 time=1.25s
  - N=4000 time=5.31s
- ipv4 and ipv6
  - N=1000 time=0.41s
  - N=2000 time=1.62s
  - N=4000 time=7.64s
  - N=5000 time=12.35s
  - N=8000 time=36.95s
Tests were run on 2.6.39-rc5 with the same .config, with and without
ipv6 and with and without this patch series, using the modprobe
command shown above.
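To make the numbers above easier to compare, here is a small userspace sketch that turns the reported times into speedup factors. It is only arithmetic on the figures quoted in this mail; the helper names (to_seconds, speedup, print_speedups) are made up for illustration and are not part of the series. With these figures the ratio goes from about 18x (ipv4 only, N=1000) to about 80x (ipv4 and ipv6, N=4000).

```c
#include <stdio.h>

/* Convert the "Xm YYs" notation used in the table to seconds. */
double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* How many times faster the patched kernel finished the same modprobe. */
double speedup(double before_s, double after_s)
{
	return before_s / after_s;
}

/* Print the speedup for every row measured both with and without the series. */
void print_speedups(void)
{
	printf("ipv4 only,     N=1000: %.1fx\n", speedup(to_seconds(0, 6), 0.33));
	printf("ipv4 only,     N=2000: %.1fx\n", speedup(to_seconds(0, 30), 1.25));
	printf("ipv4 only,     N=4000: %.1fx\n", speedup(to_seconds(2, 35), 5.31));
	printf("ipv4 and ipv6, N=1000: %.1fx\n", speedup(to_seconds(0, 24), 0.41));
	printf("ipv4 and ipv6, N=2000: %.1fx\n", speedup(to_seconds(2, 14), 1.62));
	printf("ipv4 and ipv6, N=4000: %.1fx\n", speedup(to_seconds(10, 16), 7.64));
	printf("ipv4 and ipv6, N=5000: %.1fx\n", speedup(to_seconds(16, 3), 12.35));
}
```

Calling print_speedups() from a main() prints one line per benchmark row; N=8000 is left out because it was only measured with the series applied.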
Note about time stats: adding a netdevice registers
- two sysctl tables for ipv4:
    /proc/sys/net/ipv4/conf/DEVICE
    /proc/sys/net/ipv4/neigh/DEVICE
- two sysctl tables for ipv6:
    /proc/sys/net/ipv6/conf/DEVICE
    /proc/sys/net/ipv6/neigh/DEVICE
Some companies (e.g. IXIACOM, which sponsored this work) have use cases
where they need to create 10^3 to 10^5 (virtual) network devices. They
can't use the current sysctl implementation because of the time it
takes to register so many sysctl tables.
The first patches remove the .child field of ctl_table. This is a
requirement for the new algorithm. These patches are scattered all
over the tree :(
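As a rough illustration of what the .child removal patches do, here is a userspace sketch (not kernel code) of the conversion used throughout the series: a chain of one-entry directory tables linked through .child becomes a flat path array plus the leaf table. The *_mock types and the walk_children/walk_path helpers are stand-ins invented for this example; only the dev/cdrom layout is taken from the patches, and the "autoclose" entry is a placeholder leaf.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins for the kernel types; NOT the real ctl_table / ctl_path. */
struct ctl_table_mock {
	const char *procname;
	const struct ctl_table_mock *child;	/* old style: dir -> subtable */
};

struct ctl_path_mock {
	const char *procname;
};

/* Leaf table holding the actual entries (only the name matters here). */
const struct ctl_table_mock cdrom_table_mock[] = {
	{ .procname = "autoclose" },
	{ NULL, NULL }
};

/* Old style: one table per directory level, chained through .child. */
const struct ctl_table_mock cdrom_dir_mock[] = {
	{ .procname = "cdrom", .child = cdrom_table_mock },
	{ NULL, NULL }
};
const struct ctl_table_mock dev_root_mock[] = {
	{ .procname = "dev", .child = cdrom_dir_mock },
	{ NULL, NULL }
};

/* New style: the directories are just a NULL-terminated list of names. */
const struct ctl_path_mock cdrom_path_mock[] = {
	{ .procname = "dev" },
	{ .procname = "cdrom" },
	{ NULL }
};

/* Append "/name" to buf, truncating if buf is too small. */
static void append_component(char *buf, size_t len, const char *name)
{
	size_t used = strlen(buf);

	if (used < len)
		snprintf(buf + used, len - used, "/%s", name);
}

/* Old style: follow .child links down to the leaf table and return it. */
const struct ctl_table_mock *walk_children(const struct ctl_table_mock *t,
					   char *buf, size_t len)
{
	buf[0] = '\0';
	while (t->procname && t->child) {
		append_component(buf, len, t->procname);
		t = t->child;
	}
	return t;
}

/* New style: the directory part comes straight from the path array. */
void walk_path(const struct ctl_path_mock *p, char *buf, size_t len)
{
	buf[0] = '\0';
	for (; p->procname; p++)
		append_component(buf, len, p->procname);
}
```

Both walks yield the same directory ("/dev/cdrom") and the same leaf table, which is why each conversion in the series is expected to keep the visible /proc/sys layout unchanged.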
Some patches change architectures I don't know how to compile for, or
drivers for which I don't have the hardware (in the latter case the
patches were at least compile-tested :).
People interested in the core sysctl changes/networking should read:
[PATCH 60/69] sysctl: faster tree-based sysctl implementation
which introduces the new algorithm (the commit message and comments
have more details), and the next few patches, which add some further
(simple and effective) optimisations for networking and for other
subsystems.
The last patch tries to replace a rwsem with rcu+spinlock. I'm not
sure about it because I haven't worked with RCU before. If you find
some big ugly monstrosity there, please don't disregard the rest of
the series. :)
Cc: "Eric W . Biederman" <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Octavian Purdila <[email protected]>
Cc: "David S . Miller" <[email protected]>
Lucian Adrian Grijincu (69):
sysctl: remove .child from dev/parport/default
sysctl: parport: reorder .child assignments to simplify review
sysctl: remove .child from dev/parport/PORT/devices/DEVICE
sysctl: remove .child from dev/parport/PORT/
sysctl: remove .child from dev/parport/PORT/devices/
sysctl: remove .child from kernel/vsyscall64 (x86)
sysctl: remove .child from abi/vsyscall32 (x86)
sysctl: remove .child from crypto/fips_enabled
sysctl: remove .child from dev/cdrom/
sysctl: remove .child from dev/hpet/
sysctl: remove .child from dev/ipmi/
sysctl: remove .child from dev/rtc/
sysctl: remove .child from dev/mac_hid/
sysctl: remove .child from dev/raid/
sysctl: remove .child from xpc/
sysctl: remove .child from xpc/hb
sysctl: remove .child from kernel/sclp (s390)
sysctl: remove .child from dev/scsi
sysctl: remove .child from kernel/pty
sysctl: remove .child from coda/
sysctl: remove .child from fscache/
sysctl: remove .child from fs/nfs/ nlm_table table
sysctl: remove .child from fs/nfs/ nfs_cb_table
sysctl: remove .child from fs/ntfs-debug
sysctl: remove .child from fs/ocfs2/nm/
sysctl: remove .child from fs/quota/
sysctl: remove .child from fs/xfs/
sysctl: remove .child from kernel/ (ipc)
sysctl: remove .child from fs/mqueue
sysctl: sched: add sd_table_template
sysctl: remove .child from kernel/sched_domain/cpuX/domainY/
sysctl: remove .child from kernel/ (utsname)
sysctl: remove .child from sunrpc/
sysctl: remove .child from sunrpc/svc_rdma
sysctl: remove .child from sunrpc/ (xprtrdma)
sysctl: remove .child from sunrpc/ (xprtsock)
sysctl: remove .child from bus/isa/ (arm)
sysctl: remove .child from reboot/warm (arm)
sysctl: remove .child from lasat/ (mips)
sysctl: remove .child from appldata/ (s390)
sysctl: remove .child from s390dbf/
sysctl: remove .child from vm/ (s390)
sysctl: remove .child from kernel/perfmon/ (ia64)
sysctl: remove .child from kernel/ (ia64/kdump)
sysctl: remove .child from kernel/powersave-nap (powerpc)
sysctl: remove .child from pm/ (frv)
sysctl: remove .child from frv/
sysctl: remove .child from sh64/unaligned_fixup/
sysctl: delete unused register_sysctl_table function
sysctl: remove .child from ax25 table
sysctl: remove .child from net/ipv4/route and net/ipv4/neigh tables
sysctl: remove .child from net/ipv4/neigh table
sysctl: remove .child from net/ipv6/route, net/ipv6/icmp, net/ipv6 tables
sysctl: remove .child from net/llc tables
sysctl: no-child: manually register kernel/random
sysctl: no-child: manually register kernel/keys
sysctl: no-child: manually register fs/inotify
sysctl: no-child: manually register fs/epoll
sysctl: no-child: manually register root tables
sysctl: faster tree-based sysctl implementation
sysctl: single subheader path: optimisation for paths used only once
sysctl: single subheader path: net/ipv4/conf/DEVICE-NAME/
sysctl: single subheader path: net/{ipv4|ipv6}/neigh/DEV/
sysctl: single subheader path: net/ipv6/conf/DEVICE-NAME/
sysctl: single subheader path: dev/parport/PORT/devices/DEVICE/
sysctl: single subheader path: net/ax25/DEVICE
sysctl: single subheader path: kernel/sched_domain/CPU/DOMAIN/
sysctl: single subheader path: net/decnet/conf/DEVNAME
RFC: sysctl: convert read-write lock to RCU
arch/arm/kernel/isa.c | 31 +-
arch/arm/mach-bcmring/arch.c | 25 +-
arch/frv/kernel/pm.c | 10 +-
arch/frv/kernel/sysctl.c | 12 +-
arch/ia64/kernel/crash.c | 13 +-
arch/ia64/kernel/perfmon.c | 23 +-
arch/mips/lasat/sysctl.c | 13 +-
arch/powerpc/kernel/idle.c | 13 +-
arch/s390/appldata/appldata_base.c | 42 +-
arch/s390/kernel/debug.c | 13 +-
arch/s390/mm/cmm.c | 11 +-
arch/sh/kernel/traps_64.c | 21 +-
arch/x86/kernel/vsyscall_64.c | 25 +-
arch/x86/vdso/vdso32-setup.c | 14 +-
crypto/proc.c | 12 +-
drivers/cdrom/cdrom.c | 22 +-
drivers/char/hpet.c | 38 +--
drivers/char/ipmi/ipmi_poweroff.c | 16 +-
drivers/char/random.c | 27 ++-
drivers/char/rtc.c | 24 +-
drivers/macintosh/mac_hid.c | 26 +-
drivers/md/md.c | 22 +-
drivers/misc/sgi-xp/xpc_main.c | 81 ++--
drivers/parport/procfs.c | 231 +++++-------
drivers/s390/char/sclp_async.c | 13 +-
drivers/scsi/scsi_sysctl.c | 28 +-
drivers/tty/pty.c | 23 +-
fs/coda/sysctl.c | 12 +-
fs/eventpoll.c | 22 +-
fs/fscache/main.c | 15 +-
fs/lockd/svc.c | 22 +-
fs/nfs/sysctl.c | 22 +-
fs/notify/inotify/inotify_user.c | 22 +-
fs/ntfs/sysctl.c | 15 +-
fs/ocfs2/stackglue.c | 36 +--
fs/proc/inode.c | 2 +-
fs/proc/proc_sysctl.c | 201 ++++++----
fs/quota/dquot.c | 21 +-
fs/xfs/linux-2.6/xfs_sysctl.c | 22 +-
include/linux/inotify.h | 2 -
include/linux/key.h | 4 +-
include/linux/poll.h | 2 -
include/linux/sysctl.h | 189 ++++++----
include/net/ax25.h | 10 +-
include/net/net_namespace.h | 2 +-
include/net/netns/ipv6.h | 4 +-
init/main.c | 2 +
ipc/ipc_sysctl.c | 12 +-
ipc/mq_sysctl.c | 24 +-
kernel/sched.c | 398 +++++++++++++------
kernel/sysctl.c | 798 +++++++++++++++++++++---------------
kernel/sysctl_check.c | 253 ++++++------
kernel/utsname_sysctl.c | 14 +-
net/ax25/af_ax25.c | 23 +-
net/ax25/ax25_dev.c | 10 +-
net/ax25/sysctl_net_ax25.c | 82 ++---
net/core/neighbour.c | 8 +-
net/decnet/dn_dev.c | 8 +-
net/ipv4/devinet.c | 8 +-
net/ipv4/route.c | 15 +-
net/ipv6/addrconf.c | 8 +-
net/ipv6/sysctl_net_ipv6.c | 119 +++---
net/llc/sysctl_net_llc.c | 55 ++--
net/sunrpc/sysctl.c | 19 +-
net/sunrpc/xprtrdma/svc_rdma.c | 26 +-
net/sunrpc/xprtrdma/transport.c | 14 +-
net/sunrpc/xprtsock.c | 16 +-
net/sysctl_net.c | 63 ++--
security/keys/key.c | 1 +
security/keys/sysctl.c | 18 +-
70 files changed, 1778 insertions(+), 1670 deletions(-)
--
1.7.5.134.g1c08b
First patch in a series that will end with a rewrite of sysctl. The
new implementation needs to get rid of the .child field of ctl_table.
Same functionality, but a little more clarity.
MAINTAINERS says parport is "Orphan" and I don't have a parallel
port. I minimally tested this patch, but I don't know whom to ask
for an ACK.
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 96 +++++++++++++++++++---------------------------
1 files changed, 40 insertions(+), 56 deletions(-)
diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 3f56bc0..89b8b71 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -419,56 +419,6 @@ parport_device_sysctl_template = {
}
};
-struct parport_default_sysctl_table
-{
- struct ctl_table_header *sysctl_header;
- ctl_table vars[3];
- ctl_table default_dir[2];
- ctl_table parport_dir[2];
- ctl_table dev_dir[2];
-};
-
-static struct parport_default_sysctl_table
-parport_default_sysctl_table = {
- .sysctl_header = NULL,
- {
- {
- .procname = "timeslice",
- .data = &parport_default_timeslice,
- .maxlen = sizeof(parport_default_timeslice),
- .mode = 0644,
- .proc_handler = proc_doulongvec_ms_jiffies_minmax,
- .extra1 = (void*) &parport_min_timeslice_value,
- .extra2 = (void*) &parport_max_timeslice_value
- },
- {
- .procname = "spintime",
- .data = &parport_default_spintime,
- .maxlen = sizeof(parport_default_spintime),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = (void*) &parport_min_spintime_value,
- .extra2 = (void*) &parport_max_spintime_value
- },
- {}
- },
- {
- {
- .procname = "default",
- .mode = 0555,
- .child = parport_default_sysctl_table.vars
- },
- {}
- },
- {
- PARPORT_PARPORT_DIR(parport_default_sysctl_table.default_dir),
- {}
- },
- {
- PARPORT_DEV_DIR(parport_default_sysctl_table.parport_dir),
- {}
- }
-};
int parport_proc_register(struct parport *port)
@@ -558,19 +508,53 @@ int parport_device_proc_unregister(struct pardevice *device)
return 0;
}
+
+static struct ctl_table_header *parport_default_sysctl_header;
+
+static struct ctl_table parport_default_sysctl_table[] = {
+ {
+ .procname = "timeslice",
+ .data = &parport_default_timeslice,
+ .maxlen = sizeof(parport_default_timeslice),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_ms_jiffies_minmax,
+ .extra1 = (void*) &parport_min_timeslice_value,
+ .extra2 = (void*) &parport_max_timeslice_value
+ },
+ {
+ .procname = "spintime",
+ .data = &parport_default_spintime,
+ .maxlen = sizeof(parport_default_spintime),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = (void*) &parport_min_spintime_value,
+ .extra2 = (void*) &parport_max_spintime_value
+ },
+ { },
+};
+
+static const __initdata struct ctl_path parport_default_path[] = {
+ { .procname = "dev" },
+ { .procname = "parport" },
+ { .procname = "default" },
+ { },
+};
+
static int __init parport_default_proc_register(void)
{
- parport_default_sysctl_table.sysctl_header =
- register_sysctl_table(parport_default_sysctl_table.dev_dir);
+ parport_default_sysctl_header =
+ register_sysctl_paths(parport_default_path,
+ parport_default_sysctl_table);
+ /* XXX: if this fails then we can't access the sysctl tables for
+ * /proc/sys/dev/parport/default/. Should the module fail to load? */
return 0;
}
static void __exit parport_default_proc_unregister(void)
{
- if (parport_default_sysctl_table.sysctl_header) {
- unregister_sysctl_table(parport_default_sysctl_table.
- sysctl_header);
- parport_default_sysctl_table.sysctl_header = NULL;
+ if (parport_default_sysctl_header) {
+ unregister_sysctl_table(parport_default_sysctl_header);
+ parport_default_sysctl_header = NULL;
}
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 89b8b71..edeb012 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -437,16 +437,17 @@ int parport_proc_register(struct parport *port)
t->vars[i].extra1 = port;
t->vars[0].data = &port->spintime;
- t->vars[5].child = t->device_dir;
for (i = 0; i < 5; i++)
t->vars[6 + i].extra2 = &port->probe_info[i];
t->port_dir[0].procname = port->name;
- t->port_dir[0].child = t->vars;
- t->parport_dir[0].child = t->port_dir;
t->dev_dir[0].child = t->parport_dir;
+ t->parport_dir[0].child = t->port_dir;
+ t->port_dir[0].child = t->vars;
+ t->vars[5].child = t->device_dir;
+ /* vars[5] = PARPORT_DEVICES_ROOT_DIR => .procname = 'devices' */
t->sysctl_header = register_sysctl_table(t->dev_dir);
if (t->sysctl_header == NULL) {
@@ -478,14 +479,15 @@ int parport_device_proc_register(struct pardevice *device)
return -ENOMEM;
memcpy(t, &parport_device_sysctl_template, sizeof(*t));
+ t->port_dir[0].procname = port->name;
+ t->device_dir[0].procname = device->name;
+
t->dev_dir[0].child = t->parport_dir;
t->parport_dir[0].child = t->port_dir;
- t->port_dir[0].procname = port->name;
t->port_dir[0].child = t->devices_root_dir;
t->devices_root_dir[0].child = t->device_dir;
-
- t->device_dir[0].procname = device->name;
t->device_dir[0].child = t->vars;
+
t->vars[0].data = &device->timeslice;
t->sysctl_header = register_sysctl_table(t->dev_dir);
--
1.7.5.134.g1c08b
MAINTAINERS says parport is "Orphan" and I don't have a parallel
port => I cannot test that this patch works.
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 56 ++++++++++------------------------------------
1 files changed, 12 insertions(+), 44 deletions(-)
diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index edeb012..350233e 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -370,17 +370,11 @@ struct parport_device_sysctl_table
{
struct ctl_table_header *sysctl_header;
ctl_table vars[2];
- ctl_table device_dir[2];
- ctl_table devices_root_dir[2];
- ctl_table port_dir[2];
- ctl_table parport_dir[2];
- ctl_table dev_dir[2];
};
static const struct parport_device_sysctl_table
parport_device_sysctl_template = {
- .sysctl_header = NULL,
- {
+ .vars = {
{
.procname = "timeslice",
.data = NULL,
@@ -391,32 +385,6 @@ parport_device_sysctl_template = {
.extra2 = (void*) &parport_max_timeslice_value
},
},
- {
- {
- .procname = NULL,
- .data = NULL,
- .maxlen = 0,
- .mode = 0555,
- .child = NULL
- },
- {}
- },
- {
- PARPORT_DEVICES_ROOT_DIR,
- {}
- },
- {
- PARPORT_PORT_DIR(NULL),
- {}
- },
- {
- PARPORT_PARPORT_DIR(NULL),
- {}
- },
- {
- PARPORT_DEV_DIR(NULL),
- {}
- }
};
@@ -473,24 +441,24 @@ int parport_device_proc_register(struct pardevice *device)
{
struct parport_device_sysctl_table *t;
struct parport * port = device->port;
-
+ struct ctl_path parport_devices_port_path[] = {
+ { .procname = "dev" },
+ { .procname = "parport" },
+ { .procname = port->name },
+ { .procname = "devices" },
+ { .procname = device->name },
+ { },
+ };
+
t = kmalloc(sizeof(*t), GFP_KERNEL);
if (t == NULL)
return -ENOMEM;
memcpy(t, &parport_device_sysctl_template, sizeof(*t));
- t->port_dir[0].procname = port->name;
- t->device_dir[0].procname = device->name;
-
- t->dev_dir[0].child = t->parport_dir;
- t->parport_dir[0].child = t->port_dir;
- t->port_dir[0].child = t->devices_root_dir;
- t->devices_root_dir[0].child = t->device_dir;
- t->device_dir[0].child = t->vars;
-
t->vars[0].data = &device->timeslice;
- t->sysctl_header = register_sysctl_table(t->dev_dir);
+ t->sysctl_header = register_sysctl_paths(parport_devices_port_path,
+ t->vars);
if (t->sysctl_header == NULL) {
kfree(t);
t = NULL;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 48 ++++++++++++++-------------------------------
1 files changed, 15 insertions(+), 33 deletions(-)
diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 350233e..e55b9b6 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -233,13 +233,6 @@ static int do_hardware_modes (ctl_table *table, int write,
return copy_to_user(result, buffer, len) ? -EFAULT : 0;
}
-#define PARPORT_PORT_DIR(CHILD) { .procname = NULL, .mode = 0555, .child = CHILD }
-#define PARPORT_PARPORT_DIR(CHILD) { .procname = "parport", \
- .mode = 0555, .child = CHILD }
-#define PARPORT_DEV_DIR(CHILD) { .procname = "dev", .mode = 0555, .child = CHILD }
-#define PARPORT_DEVICES_ROOT_DIR { .procname = "devices", \
- .mode = 0555, .child = NULL }
-
static const unsigned long parport_min_timeslice_value =
PARPORT_MIN_TIMESLICE_VALUE;
@@ -257,14 +250,10 @@ struct parport_sysctl_table {
struct ctl_table_header *sysctl_header;
ctl_table vars[12];
ctl_table device_dir[2];
- ctl_table port_dir[2];
- ctl_table parport_dir[2];
- ctl_table dev_dir[2];
};
static const struct parport_sysctl_table parport_sysctl_template = {
- .sysctl_header = NULL,
- {
+ .vars = {
{
.procname = "spintime",
.data = NULL,
@@ -302,7 +291,11 @@ static const struct parport_sysctl_table parport_sysctl_template = {
.mode = 0444,
.proc_handler = do_hardware_modes
},
- PARPORT_DEVICES_ROOT_DIR,
+ {
+ .procname = "devices",
+ .mode = 0555,
+ .child = NULL, /* child will point to .device_dir */
+ },
#ifdef CONFIG_PARPORT_1284
{
.procname = "autoprobe",
@@ -342,7 +335,7 @@ static const struct parport_sysctl_table parport_sysctl_template = {
#endif /* IEEE 1284 support */
{}
},
- {
+ .device_dir = {
{
.procname = "active",
.data = NULL,
@@ -352,18 +345,6 @@ static const struct parport_sysctl_table parport_sysctl_template = {
},
{}
},
- {
- PARPORT_PORT_DIR(NULL),
- {}
- },
- {
- PARPORT_PARPORT_DIR(NULL),
- {}
- },
- {
- PARPORT_DEV_DIR(NULL),
- {}
- }
};
struct parport_device_sysctl_table
@@ -391,6 +372,12 @@ parport_device_sysctl_template = {
int parport_proc_register(struct parport *port)
{
+ struct ctl_path parport_port_path[] = {
+ { .procname = "dev" },
+ { .procname = "parport" },
+ { .procname = port->name },
+ { },
+ };
struct parport_sysctl_table *t;
int i;
@@ -409,15 +396,10 @@ int parport_proc_register(struct parport *port)
for (i = 0; i < 5; i++)
t->vars[6 + i].extra2 = &port->probe_info[i];
- t->port_dir[0].procname = port->name;
-
- t->dev_dir[0].child = t->parport_dir;
- t->parport_dir[0].child = t->port_dir;
- t->port_dir[0].child = t->vars;
t->vars[5].child = t->device_dir;
- /* vars[5] = PARPORT_DEVICES_ROOT_DIR => .procname = 'devices' */
+ /* vars[5].procname is the 'devices' dir entry */
- t->sysctl_header = register_sysctl_table(t->dev_dir);
+ t->sysctl_header = register_sysctl_paths(parport_port_path, t->vars);
if (t->sysctl_header == NULL) {
kfree(t);
t = NULL;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 42 ++++++++++++++++++++++++++++--------------
1 files changed, 28 insertions(+), 14 deletions(-)
diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index e55b9b6..3bb5bed 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -248,6 +248,7 @@ PARPORT_MAX_SPINTIME_VALUE;
struct parport_sysctl_table {
struct ctl_table_header *sysctl_header;
+ struct ctl_table_header *devices_sysctl_header;
ctl_table vars[12];
ctl_table device_dir[2];
};
@@ -291,11 +292,6 @@ static const struct parport_sysctl_table parport_sysctl_template = {
.mode = 0444,
.proc_handler = do_hardware_modes
},
- {
- .procname = "devices",
- .mode = 0555,
- .child = NULL, /* child will point to .device_dir */
- },
#ifdef CONFIG_PARPORT_1284
{
.procname = "autoprobe",
@@ -378,6 +374,14 @@ int parport_proc_register(struct parport *port)
{ .procname = port->name },
{ },
};
+ struct ctl_path parport_port_devices_path[] = {
+ { .procname = "dev" },
+ { .procname = "parport" },
+ { .procname = port->name },
+ { .procname = "devices" },
+ { },
+ };
+
struct parport_sysctl_table *t;
int i;
@@ -392,20 +396,29 @@ int parport_proc_register(struct parport *port)
t->vars[i].extra1 = port;
t->vars[0].data = &port->spintime;
-
- for (i = 0; i < 5; i++)
- t->vars[6 + i].extra2 = &port->probe_info[i];
- t->vars[5].child = t->device_dir;
- /* vars[5].procname is the 'devices' dir entry */
+#ifdef CONFIG_PARPORT_1284
+ for (i = 0; i < 5; i++)
+ t->vars[5 + i].extra2 = &port->probe_info[i];
+#endif /* CONFIG_PARPORT_1284 */
t->sysctl_header = register_sysctl_paths(parport_port_path, t->vars);
- if (t->sysctl_header == NULL) {
- kfree(t);
- t = NULL;
- }
+ if (t->sysctl_header == NULL)
+ goto fail_register_port;
+
+ t->devices_sysctl_header = register_sysctl_paths(parport_port_devices_path,
+ t->device_dir);
+ if (t->devices_sysctl_header == NULL)
+ goto fail_register_devices;
port->sysctl_table = t;
return 0;
+
+fail_register_devices:
+ unregister_sysctl_table(t->sysctl_header);
+fail_register_port:
+ kfree(t);
+
+ return -ENOMEM;
}
int parport_proc_unregister(struct parport *port)
@@ -413,6 +426,7 @@ int parport_proc_unregister(struct parport *port)
if (port->sysctl_table) {
struct parport_sysctl_table *t = port->sysctl_table;
port->sysctl_table = NULL;
+ unregister_sysctl_table(t->devices_sysctl_header);
unregister_sysctl_table(t->sysctl_header);
kfree(t);
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/x86/kernel/vsyscall_64.c | 25 ++++++++++++++-----------
1 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index dcbb28c..7d8b83d 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -234,18 +234,21 @@ static long __vsyscall(3) venosys_1(void)
}
#ifdef CONFIG_SYSCTL
-static ctl_table kernel_table2[] = {
- { .procname = "vsyscall64",
- .data = &vsyscall_gtod_data.sysctl_enabled, .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec },
- {}
+static ctl_table vsyscall64_table[] = {
+ {
+ .procname = "vsyscall64",
+ .data = &vsyscall_gtod_data.sysctl_enabled,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ { }
};
-static ctl_table kernel_root_table2[] = {
- { .procname = "kernel", .mode = 0555,
- .child = kernel_table2 },
- {}
+
+static struct ctl_path kernel_root_path[] = {
+ { .procname = "kernel" },
+ { }
};
#endif
@@ -303,7 +306,7 @@ static int __init vsyscall_init(void)
BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
#ifdef CONFIG_SYSCTL
- register_sysctl_table(kernel_root_table2);
+ register_sysctl_paths(kernel_root_path, vsyscall64_table);
#endif
on_each_cpu(cpu_vsyscall_init, NULL, 1);
/* notifier priority > KVM */
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/x86/vdso/vdso32-setup.c | 14 +++++---------
1 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index 468d591..e6ef3b4 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -380,7 +380,7 @@ subsys_initcall(sysenter_setup);
/* Register vsyscall32 into the ABI table */
#include <linux/sysctl.h>
-static ctl_table abi_table2[] = {
+static ctl_table abi_table[] = {
{
.procname = "vsyscall32",
.data = &sysctl_vsyscall32,
@@ -391,18 +391,14 @@ static ctl_table abi_table2[] = {
{}
};
-static ctl_table abi_root_table2[] = {
- {
- .procname = "abi",
- .mode = 0555,
- .child = abi_table2
- },
- {}
+static const struct ctl_path abi_root_path[] = {
+ { .procname = "abi" },
+ { }
};
static __init int ia32_binfmt_init(void)
{
- register_sysctl_table(abi_root_table2);
+ register_sysctl_paths(abi_root_path, abi_table);
return 0;
}
__initcall(ia32_binfmt_init);
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
crypto/proc.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/crypto/proc.c b/crypto/proc.c
index 58fef67..2ef248b 100644
--- a/crypto/proc.c
+++ b/crypto/proc.c
@@ -34,20 +34,16 @@ static struct ctl_table crypto_sysctl_table[] = {
{}
};
-static struct ctl_table crypto_dir_table[] = {
- {
- .procname = "crypto",
- .mode = 0555,
- .child = crypto_sysctl_table
- },
- {}
+static const struct ctl_path crypto_root_path[] = {
+ { .procname = "crypto" },
+ { }
};
static struct ctl_table_header *crypto_sysctls;
static void crypto_proc_fips_init(void)
{
- crypto_sysctls = register_sysctl_table(crypto_dir_table);
+ crypto_sysctls = register_sysctl_paths(crypto_root_path, crypto_sysctl_table);
}
static void crypto_proc_fips_exit(void)
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/cdrom/cdrom.c | 22 ++++------------------
1 files changed, 4 insertions(+), 18 deletions(-)
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 514dd8e..9560789 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -3654,26 +3654,12 @@ static ctl_table cdrom_table[] = {
{ }
};
-static ctl_table cdrom_cdrom_table[] = {
- {
- .procname = "cdrom",
- .maxlen = 0,
- .mode = 0555,
- .child = cdrom_table,
- },
+static const struct ctl_path cdrom_root_path[] = {
+ { .procname = "dev" },
+ { .procname = "cdrom" },
{ }
};
-/* Make sure that /proc/sys/dev is there */
-static ctl_table cdrom_root_table[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = cdrom_cdrom_table,
- },
- { }
-};
static struct ctl_table_header *cdrom_sysctl_header;
static void cdrom_sysctl_register(void)
@@ -3683,7 +3669,7 @@ static void cdrom_sysctl_register(void)
if (initialized == 1)
return;
- cdrom_sysctl_header = register_sysctl_table(cdrom_root_table);
+ cdrom_sysctl_header = register_sysctl_paths(cdrom_root_path, cdrom_table);
/* set the defaults */
cdrom_sysctl_settings.autoclose = autoclose;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/char/hpet.c | 38 ++++++++++++--------------------------
1 files changed, 12 insertions(+), 26 deletions(-)
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 7066e80..303de7e 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -721,33 +721,19 @@ static int hpet_is_known(struct hpet_data *hdp)
static ctl_table hpet_table[] = {
{
- .procname = "max-user-freq",
- .data = &hpet_max_freq,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
- {}
+ .procname = "max-user-freq",
+ .data = &hpet_max_freq,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ { }
};
-static ctl_table hpet_root[] = {
- {
- .procname = "hpet",
- .maxlen = 0,
- .mode = 0555,
- .child = hpet_table,
- },
- {}
-};
-
-static ctl_table dev_root[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = hpet_root,
- },
- {}
+static const struct ctl_path hpet_path[] = {
+ { .procname = "dev" },
+ { .procname = "hpet" },
+ { }
};
static struct ctl_table_header *sysctl_header;
@@ -1053,7 +1039,7 @@ static int __init hpet_init(void)
if (result < 0)
return -ENODEV;
- sysctl_header = register_sysctl_table(dev_root);
+ sysctl_header = register_sysctl_paths(hpet_path, hpet_table);
result = acpi_bus_register_driver(&hpet_acpi_driver);
if (result < 0) {
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/char/ipmi/ipmi_poweroff.c | 16 ++++------------
1 files changed, 4 insertions(+), 12 deletions(-)
diff --git a/drivers/char/ipmi/ipmi_poweroff.c b/drivers/char/ipmi/ipmi_poweroff.c
index 2efa176..ac71d69 100644
--- a/drivers/char/ipmi/ipmi_poweroff.c
+++ b/drivers/char/ipmi/ipmi_poweroff.c
@@ -668,17 +668,9 @@ static ctl_table ipmi_table[] = {
{ }
};
-static ctl_table ipmi_dir_table[] = {
- { .procname = "ipmi",
- .mode = 0555,
- .child = ipmi_table },
- { }
-};
-
-static ctl_table ipmi_root_table[] = {
- { .procname = "dev",
- .mode = 0555,
- .child = ipmi_dir_table },
+static const struct ctl_path ipmi_path[] = {
+ { .procname = "dev" },
+ { .procname = "ipmi" },
{ }
};
@@ -699,7 +691,7 @@ static int __init ipmi_poweroff_init(void)
printk(KERN_INFO PFX "Power cycle is enabled.\n");
#ifdef CONFIG_PROC_FS
- ipmi_table_header = register_sysctl_table(ipmi_root_table);
+ ipmi_table_header = register_sysctl_paths(ipmi_path, ipmi_table);
if (!ipmi_table_header) {
printk(KERN_ERR PFX "Unable to register powercycle sysctl\n");
rv = -ENOMEM;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/char/rtc.c | 24 ++++++------------------
1 files changed, 6 insertions(+), 18 deletions(-)
diff --git a/drivers/char/rtc.c b/drivers/char/rtc.c
index dfa8b30..cc752f5 100644
--- a/drivers/char/rtc.c
+++ b/drivers/char/rtc.c
@@ -291,21 +291,9 @@ static ctl_table rtc_table[] = {
{ }
};
-static ctl_table rtc_root[] = {
- {
- .procname = "rtc",
- .mode = 0555,
- .child = rtc_table,
- },
- { }
-};
-
-static ctl_table dev_root[] = {
- {
- .procname = "dev",
- .mode = 0555,
- .child = rtc_root,
- },
+static const __initdata struct ctl_path rtc_path[] = {
+ { .procname = "dev" },
+ { .procname = "rtc" },
{ }
};
@@ -313,13 +301,13 @@ static struct ctl_table_header *sysctl_header;
static int __init init_sysctl(void)
{
- sysctl_header = register_sysctl_table(dev_root);
- return 0;
+ sysctl_header = register_sysctl_paths(rtc_path, rtc_table);
+ return 0;
}
static void __exit cleanup_sysctl(void)
{
- unregister_sysctl_table(sysctl_header);
+ unregister_sysctl_table(sysctl_header);
}
/*
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/macintosh/mac_hid.c | 26 +++++---------------------
1 files changed, 5 insertions(+), 21 deletions(-)
diff --git a/drivers/macintosh/mac_hid.c b/drivers/macintosh/mac_hid.c
index 6a82388..5eec7b7 100644
--- a/drivers/macintosh/mac_hid.c
+++ b/drivers/macintosh/mac_hid.c
@@ -214,7 +214,7 @@ static int mac_hid_toggle_emumouse(ctl_table *table, int write,
}
/* file(s) in /proc/sys/dev/mac_hid */
-static ctl_table mac_hid_files[] = {
+static ctl_table mac_hid_table[] = {
{
.procname = "mouse_button_emulation",
.data = &mouse_emulate_buttons,
@@ -239,25 +239,9 @@ static ctl_table mac_hid_files[] = {
{ }
};
-/* dir in /proc/sys/dev */
-static ctl_table mac_hid_dir[] = {
- {
- .procname = "mac_hid",
- .maxlen = 0,
- .mode = 0555,
- .child = mac_hid_files,
- },
- { }
-};
-
-/* /proc/sys/dev itself, in case that is not there yet */
-static ctl_table mac_hid_root_dir[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = mac_hid_dir,
- },
+static const __initdata struct ctl_path mac_hid_path[] = {
+ { .procname = "dev" },
+ { .procname = "mac_hid" },
{ }
};
@@ -265,7 +249,7 @@ static struct ctl_table_header *mac_hid_sysctl_header;
static int __init mac_hid_init(void)
{
- mac_hid_sysctl_header = register_sysctl_table(mac_hid_root_dir);
+ mac_hid_sysctl_header = register_sysctl_paths(mac_hid_path, mac_hid_table);
if (!mac_hid_sysctl_header)
return -ENOMEM;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/md/md.c | 22 ++++------------------
1 files changed, 4 insertions(+), 18 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7d6f7f1..3b54374 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -125,26 +125,12 @@ static ctl_table raid_table[] = {
{ }
};
-static ctl_table raid_dir_table[] = {
- {
- .procname = "raid",
- .maxlen = 0,
- .mode = S_IRUGO|S_IXUGO,
- .child = raid_table,
- },
+static const __initdata struct ctl_path raid_path[] = {
+ { .procname = "dev" },
+ { .procname = "raid" },
{ }
};
-static ctl_table raid_root_table[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = raid_dir_table,
- },
- { }
-};
-
static const struct block_device_operations md_fops;
static int start_readonly;
@@ -7380,7 +7366,7 @@ static int __init md_init(void)
md_probe, NULL, NULL);
register_reboot_notifier(&md_notifier);
- raid_table_header = register_sysctl_table(raid_root_table);
+ raid_table_header = register_sysctl_paths(raid_path, raid_table);
md_geninit();
return 0;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/misc/sgi-xp/xpc_main.c | 12 +++++-------
1 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/misc/sgi-xp/xpc_main.c b/drivers/misc/sgi-xp/xpc_main.c
index 8d082b4..642efb1 100644
--- a/drivers/misc/sgi-xp/xpc_main.c
+++ b/drivers/misc/sgi-xp/xpc_main.c
@@ -122,13 +122,11 @@ static ctl_table xpc_sys_xpc_dir[] = {
.extra2 = &xpc_disengage_max_timelimit},
{}
};
-static ctl_table xpc_sys_dir[] = {
- {
- .procname = "xpc",
- .mode = 0555,
- .child = xpc_sys_xpc_dir},
- {}
+static const __initdata struct ctl_path xpc_path[] = {
+ { .procname = "xpc" },
+ { }
};
+
static struct ctl_table_header *xpc_sysctl;
/* non-zero if any remote partition disengage was timed out */
@@ -1236,7 +1234,7 @@ xpc_init(void)
goto out_1;
}
- xpc_sysctl = register_sysctl_table(xpc_sys_dir);
+ xpc_sysctl = register_sysctl_paths(xpc_path, xpc_sys_xpc_dir);
/*
* Fill the partition reserved page with the information needed by
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/misc/sgi-xp/xpc_main.c | 69 +++++++++++++++++++++++-----------------
1 files changed, 40 insertions(+), 29 deletions(-)
diff --git a/drivers/misc/sgi-xp/xpc_main.c b/drivers/misc/sgi-xp/xpc_main.c
index 642efb1..414d68b 100644
--- a/drivers/misc/sgi-xp/xpc_main.c
+++ b/drivers/misc/sgi-xp/xpc_main.c
@@ -88,46 +88,52 @@ int xpc_disengage_timelimit = XPC_DISENGAGE_DEFAULT_TIMELIMIT;
static int xpc_disengage_min_timelimit; /* = 0 */
static int xpc_disengage_max_timelimit = 120;
-static ctl_table xpc_sys_xpc_hb_dir[] = {
+static ctl_table xpc_hb_table[] = {
{
- .procname = "hb_interval",
- .data = &xpc_hb_interval,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &xpc_hb_min_interval,
- .extra2 = &xpc_hb_max_interval},
+ .procname = "hb_interval",
+ .data = &xpc_hb_interval,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &xpc_hb_min_interval,
+ .extra2 = &xpc_hb_max_interval
+ },
{
- .procname = "hb_check_interval",
- .data = &xpc_hb_check_interval,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &xpc_hb_check_min_interval,
- .extra2 = &xpc_hb_check_max_interval},
- {}
+ .procname = "hb_check_interval",
+ .data = &xpc_hb_check_interval,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &xpc_hb_check_min_interval,
+ .extra2 = &xpc_hb_check_max_interval
+ },
+ { }
};
-static ctl_table xpc_sys_xpc_dir[] = {
- {
- .procname = "hb",
- .mode = 0555,
- .child = xpc_sys_xpc_hb_dir},
+static ctl_table xpc_table[] = {
{
- .procname = "disengage_timelimit",
- .data = &xpc_disengage_timelimit,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &xpc_disengage_min_timelimit,
- .extra2 = &xpc_disengage_max_timelimit},
- {}
+ .procname = "disengage_timelimit",
+ .data = &xpc_disengage_timelimit,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &xpc_disengage_min_timelimit,
+ .extra2 = &xpc_disengage_max_timelimit
+ },
+ { }
};
static const __initdata struct ctl_path xpc_path[] = {
{ .procname = "xpc" },
{ }
};
+static const __initdata struct ctl_path xpc_hb_path[] = {
+ { .procname = "xpc" },
+ { .procname = "hb" },
+ { }
+};
+
static struct ctl_table_header *xpc_sysctl;
+static struct ctl_table_header *xpc_hb_sysctl;
/* non-zero if any remote partition disengage was timed out */
int xpc_disengage_timedout;
@@ -1040,6 +1046,8 @@ xpc_do_exit(enum xp_retval reason)
/* clear the interface to XPC's functions */
xpc_clear_interface();
+ if (xpc_hb_sysctl)
+ unregister_sysctl_table(xpc_hb_sysctl);
if (xpc_sysctl)
unregister_sysctl_table(xpc_sysctl);
@@ -1235,6 +1243,7 @@ xpc_init(void)
}
- xpc_sysctl = register_sysctl_paths(xpc_path, xpc_sys_xpc_dir);
+ xpc_sysctl = register_sysctl_paths(xpc_path, xpc_table);
+ xpc_hb_sysctl = register_sysctl_paths(xpc_hb_path, xpc_hb_table);
/*
* Fill the partition reserved page with the information needed by
@@ -1299,6 +1308,8 @@ out_3:
(void)unregister_die_notifier(&xpc_die_notifier);
(void)unregister_reboot_notifier(&xpc_reboot_notifier);
out_2:
+ if (xpc_hb_sysctl)
+ unregister_sysctl_table(xpc_hb_sysctl);
if (xpc_sysctl)
unregister_sysctl_table(xpc_sysctl);
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/s390/char/sclp_async.c | 13 ++++---------
1 files changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/s390/char/sclp_async.c b/drivers/s390/char/sclp_async.c
index 7ad30e7..43f8b1e 100644
--- a/drivers/s390/char/sclp_async.c
+++ b/drivers/s390/char/sclp_async.c
@@ -106,14 +106,9 @@ static struct ctl_table callhome_table[] = {
{}
};
-static struct ctl_table kern_dir_table[] = {
- {
- .procname = "kernel",
- .maxlen = 0,
- .mode = 0555,
- .child = callhome_table,
- },
- {}
+static const __initdata struct ctl_path kern_path[] = {
+ { .procname = "kernel" },
+ { }
};
/*
@@ -175,7 +170,7 @@ static int __init sclp_async_init(void)
if (!(sclp_async_register.sclp_receive_mask & EVTYP_ASYNC_MASK))
goto out_sclp;
rc = -ENOMEM;
- callhome_sysctl_header = register_sysctl_table(kern_dir_table);
+ callhome_sysctl_header = register_sysctl_paths(kern_path, callhome_table);
if (!callhome_sysctl_header)
goto out_sclp;
request = kzalloc(sizeof(struct sclp_req), GFP_KERNEL);
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/scsi/scsi_sysctl.c | 28 +++++++++++-----------------
1 files changed, 11 insertions(+), 17 deletions(-)
diff --git a/drivers/scsi/scsi_sysctl.c b/drivers/scsi/scsi_sysctl.c
index 2b6b93f..a28707f 100644
--- a/drivers/scsi/scsi_sysctl.c
+++ b/drivers/scsi/scsi_sysctl.c
@@ -13,25 +13,19 @@
static ctl_table scsi_table[] = {
- { .procname = "logging_level",
- .data = &scsi_logging_level,
- .maxlen = sizeof(scsi_logging_level),
- .mode = 0644,
- .proc_handler = proc_dointvec },
+ {
+ .procname = "logging_level",
+ .data = &scsi_logging_level,
+ .maxlen = sizeof(scsi_logging_level),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
{ }
};
-static ctl_table scsi_dir_table[] = {
- { .procname = "scsi",
- .mode = 0555,
- .child = scsi_table },
- { }
-};
-
-static ctl_table scsi_root_table[] = {
- { .procname = "dev",
- .mode = 0555,
- .child = scsi_dir_table },
+static const __initdata struct ctl_path scsi_path[] = {
+ { .procname = "dev" },
+ { .procname = "scsi" },
{ }
};
@@ -39,7 +33,7 @@ static struct ctl_table_header *scsi_table_header;
int __init scsi_init_sysctl(void)
{
- scsi_table_header = register_sysctl_table(scsi_root_table);
+ scsi_table_header = register_sysctl_paths(scsi_path, scsi_table);
if (!scsi_table_header)
return -ENOMEM;
return 0;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/tty/pty.c | 23 +++++------------------
1 files changed, 5 insertions(+), 18 deletions(-)
diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 2107747..2a40b34 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -469,25 +469,12 @@ static struct ctl_table pty_table[] = {
{}
};
-static struct ctl_table pty_kern_table[] = {
- {
- .procname = "pty",
- .mode = 0555,
- .child = pty_table,
- },
- {}
+static const __initdata struct ctl_path pty_path[] = {
+ { .procname = "kernel" },
+ { .procname = "pty" },
+ { }
};
-static struct ctl_table pty_root_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = pty_kern_table,
- },
- {}
-};
-
-
static int pty_unix98_ioctl(struct tty_struct *tty,
unsigned int cmd, unsigned long arg)
{
@@ -750,7 +737,7 @@ static void __init unix98_pty_init(void)
if (tty_register_driver(pts_driver))
panic("Couldn't register Unix98 pts driver");
- register_sysctl_table(pty_root_table);
+ register_sysctl_paths(pty_path, pty_table);
/* Now create the /dev/ptmx special device */
tty_default_fops(&ptmx_fops);
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/coda/sysctl.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/fs/coda/sysctl.c b/fs/coda/sysctl.c
index af56ad5..8c328c9 100644
--- a/fs/coda/sysctl.c
+++ b/fs/coda/sysctl.c
@@ -39,19 +39,15 @@ static ctl_table coda_table[] = {
{}
};
-static ctl_table fs_table[] = {
- {
- .procname = "coda",
- .mode = 0555,
- .child = coda_table
- },
- {}
+static const __initdata struct ctl_path coda_path[] = {
+ { .procname = "coda" },
+ { }
};
void coda_sysctl_init(void)
{
if ( !fs_table_header )
- fs_table_header = register_sysctl_table(fs_table);
+ fs_table_header = register_sysctl_paths(coda_path, coda_table);
}
void coda_sysctl_clean(void)
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/fscache/main.c | 15 ++++++---------
1 files changed, 6 insertions(+), 9 deletions(-)
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index f9d8567..7f9c055 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -67,7 +67,7 @@ static int fscache_max_active_sysctl(struct ctl_table *table, int write,
return ret;
}
-ctl_table fscache_sysctls[] = {
+static ctl_table fscache_table[] = {
{
.procname = "object_max_active",
.data = &fscache_object_max_active,
@@ -87,14 +87,11 @@ ctl_table fscache_sysctls[] = {
{}
};
-ctl_table fscache_sysctls_root[] = {
- {
- .procname = "fscache",
- .mode = 0555,
- .child = fscache_sysctls,
- },
- {}
+static const __initdata struct ctl_path fscache_path[] = {
+ { .procname = "fscache" },
+ { }
};
+
#endif
/*
@@ -135,7 +132,7 @@ static int __init fscache_init(void)
#ifdef CONFIG_SYSCTL
ret = -ENOMEM;
- fscache_sysctl_header = register_sysctl_table(fscache_sysctls_root);
+ fscache_sysctl_header = register_sysctl_paths(fscache_path, fscache_table);
if (!fscache_sysctl_header)
goto error_sysctl;
#endif
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/lockd/svc.c | 22 +++++-----------------
1 files changed, 5 insertions(+), 17 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index abfff9d..6ab5932 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -355,7 +355,7 @@ EXPORT_SYMBOL_GPL(lockd_down);
* Sysctl parameters (same as module parameters, different interface).
*/
-static ctl_table nlm_sysctls[] = {
+static ctl_table nlm_table[] = {
{
.procname = "nlm_grace_period",
.data = &nlm_grace_period,
@@ -409,21 +409,9 @@ static ctl_table nlm_sysctls[] = {
{ }
};
-static ctl_table nlm_sysctl_dir[] = {
- {
- .procname = "nfs",
- .mode = 0555,
- .child = nlm_sysctls,
- },
- { }
-};
-
-static ctl_table nlm_sysctl_root[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = nlm_sysctl_dir,
- },
+static const __initdata struct ctl_path nlm_path[] = {
+ { .procname = "fs" },
+ { .procname = "nfs" },
{ }
};
@@ -504,7 +492,7 @@ module_param(nlm_max_connections, uint, 0644);
static int __init init_nlm(void)
{
#ifdef CONFIG_SYSCTL
- nlm_sysctl_table = register_sysctl_table(nlm_sysctl_root);
+ nlm_sysctl_table = register_sysctl_paths(nlm_path, nlm_table);
return nlm_sysctl_table ? 0 : -ENOMEM;
#else
return 0;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/nfs/sysctl.c | 22 +++++-----------------
1 files changed, 5 insertions(+), 17 deletions(-)
diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index 978aaeb..046fe19 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -21,7 +21,7 @@ static const int nfs_set_port_max = 65535;
#endif
static struct ctl_table_header *nfs_callback_sysctl_table;
-static ctl_table nfs_cb_sysctls[] = {
+static ctl_table nfs_cb_table[] = {
#ifdef CONFIG_NFS_V4
{
.procname = "nfs_callback_tcpport",
@@ -59,27 +59,15 @@ static ctl_table nfs_cb_sysctls[] = {
{ }
};
-static ctl_table nfs_cb_sysctl_dir[] = {
- {
- .procname = "nfs",
- .mode = 0555,
- .child = nfs_cb_sysctls,
- },
- { }
-};
-
-static ctl_table nfs_cb_sysctl_root[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = nfs_cb_sysctl_dir,
- },
+static const __initdata struct ctl_path nfs_cb_path[] = {
+ { .procname = "fs" },
+ { .procname = "nfs" },
{ }
};
int nfs_register_sysctl(void)
{
- nfs_callback_sysctl_table = register_sysctl_table(nfs_cb_sysctl_root);
+ nfs_callback_sysctl_table = register_sysctl_paths(nfs_cb_path, nfs_cb_table);
if (nfs_callback_sysctl_table == NULL)
return -ENOMEM;
return 0;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/ntfs/sysctl.c | 15 +++++----------
1 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/fs/ntfs/sysctl.c b/fs/ntfs/sysctl.c
index 79a8918..da1293d 100644
--- a/fs/ntfs/sysctl.c
+++ b/fs/ntfs/sysctl.c
@@ -34,7 +34,7 @@
#include "debug.h"
/* Definition of the ntfs sysctl. */
-static ctl_table ntfs_sysctls[] = {
+static ctl_table ntfs_table[] = {
{
.procname = "ntfs-debug",
.data = &debug_msgs, /* Data pointer and size. */
@@ -45,14 +45,9 @@ static ctl_table ntfs_sysctls[] = {
{}
};
-/* Define the parent directory /proc/sys/fs. */
-static ctl_table sysctls_root[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = ntfs_sysctls
- },
- {}
+static const __initdata struct ctl_path ntfs_path[] = {
+ { .procname = "fs" },
+ { }
};
/* Storage for the sysctls header. */
@@ -68,7 +63,7 @@ int ntfs_sysctl(int add)
{
if (add) {
BUG_ON(sysctls_root_table);
- sysctls_root_table = register_sysctl_table(sysctls_root);
+ sysctls_root_table = register_sysctl_paths(ntfs_path, ntfs_table);
if (!sysctls_root_table)
return -ENOMEM;
} else {
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/ocfs2/stackglue.c | 36 +++++-------------------------------
1 files changed, 5 insertions(+), 31 deletions(-)
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 39abf89..3cb738a 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -654,36 +654,10 @@ static ctl_table ocfs2_nm_table[] = {
{ }
};
-static ctl_table ocfs2_mod_table[] = {
- {
- .procname = "nm",
- .data = NULL,
- .maxlen = 0,
- .mode = 0555,
- .child = ocfs2_nm_table
- },
- { }
-};
-
-static ctl_table ocfs2_kern_table[] = {
- {
- .procname = "ocfs2",
- .data = NULL,
- .maxlen = 0,
- .mode = 0555,
- .child = ocfs2_mod_table
- },
- { }
-};
-
-static ctl_table ocfs2_root_table[] = {
- {
- .procname = "fs",
- .data = NULL,
- .maxlen = 0,
- .mode = 0555,
- .child = ocfs2_kern_table
- },
+static const __initdata struct ctl_path ocfs2_nm_path[] = {
+ { .procname = "fs" },
+ { .procname = "ocfs2" },
+ { .procname = "nm" },
{ }
};
@@ -698,7 +672,7 @@ static int __init ocfs2_stack_glue_init(void)
{
strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB);
- ocfs2_table_header = register_sysctl_table(ocfs2_root_table);
+ ocfs2_table_header = register_sysctl_paths(ocfs2_nm_path, ocfs2_nm_table);
if (!ocfs2_table_header) {
printk(KERN_ERR
"ocfs2 stack glue: unable to register sysctl\n");
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/quota/dquot.c | 21 +++++----------------
1 files changed, 5 insertions(+), 16 deletions(-)
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index d3c032f..7837944 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2591,22 +2591,11 @@ static ctl_table fs_dqstats_table[] = {
{ },
};
-static ctl_table fs_table[] = {
- {
- .procname = "quota",
- .mode = 0555,
- .child = fs_dqstats_table,
- },
- { },
-};
-static ctl_table sys_table[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = fs_table,
- },
- { },
+static const __initdata struct ctl_path quota_path[] = {
+ { .procname = "fs" },
+ { .procname = "quota" },
+ { }
};
static int __init dquot_init(void)
@@ -2616,7 +2605,7 @@ static int __init dquot_init(void)
printk(KERN_NOTICE "VFS: Disk quotas %s\n", __DQUOT_VERSION__);
- register_sysctl_table(sys_table);
+ register_sysctl_paths(quota_path, fs_dqstats_table);
dquot_cachep = kmem_cache_create("dquot",
sizeof(struct dquot), sizeof(unsigned long) * 4,
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/xfs/linux-2.6/xfs_sysctl.c | 22 +++++-----------------
1 files changed, 5 insertions(+), 17 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_sysctl.c b/fs/xfs/linux-2.6/xfs_sysctl.c
index ee2d2ad..95f803c 100644
--- a/fs/xfs/linux-2.6/xfs_sysctl.c
+++ b/fs/xfs/linux-2.6/xfs_sysctl.c
@@ -218,28 +218,16 @@ static ctl_table xfs_table[] = {
{}
};
-static ctl_table xfs_dir_table[] = {
- {
- .procname = "xfs",
- .mode = 0555,
- .child = xfs_table
- },
- {}
-};
-
-static ctl_table xfs_root_table[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = xfs_dir_table
- },
- {}
+static const __initdata struct ctl_path xfs_path[] = {
+ { .procname = "fs" },
+ { .procname = "xfs" },
+ { }
};
int
xfs_sysctl_register(void)
{
- xfs_table_header = register_sysctl_table(xfs_root_table);
+ xfs_table_header = register_sysctl_paths(xfs_path, xfs_table);
if (!xfs_table_header)
return -ENOMEM;
return 0;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
ipc/ipc_sysctl.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 56410fa..9e408a6 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -194,18 +194,14 @@ static struct ctl_table ipc_kern_table[] = {
{}
};
-static struct ctl_table ipc_root_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = ipc_kern_table,
- },
- {}
+static const __initdata struct ctl_path ipc_path[] = {
+ { .procname = "kernel" },
+ { }
};
static int __init ipc_sysctl_init(void)
{
- register_sysctl_table(ipc_root_table);
+ register_sysctl_paths(ipc_path, ipc_kern_table);
return 0;
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
ipc/mq_sysctl.c | 24 ++++++------------------
1 files changed, 6 insertions(+), 18 deletions(-)
diff --git a/ipc/mq_sysctl.c b/ipc/mq_sysctl.c
index 0c09366..007164b 100644
--- a/ipc/mq_sysctl.c
+++ b/ipc/mq_sysctl.c
@@ -62,7 +62,7 @@ static int msg_max_limit_max = MAX_MSGMAX;
static int msg_maxsize_limit_min = MIN_MSGSIZEMAX;
static int msg_maxsize_limit_max = MAX_MSGSIZEMAX;
-static ctl_table mq_sysctls[] = {
+static ctl_table mq_table[] = {
{
.procname = "queues_max",
.data = &init_ipc_ns.mq_queues_max,
@@ -91,25 +91,13 @@ static ctl_table mq_sysctls[] = {
{}
};
-static ctl_table mq_sysctl_dir[] = {
- {
- .procname = "mqueue",
- .mode = 0555,
- .child = mq_sysctls,
- },
- {}
-};
-
-static ctl_table mq_sysctl_root[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = mq_sysctl_dir,
- },
- {}
+static const struct ctl_path mq_path[] = {
+ { .procname = "fs" },
+ { .procname = "mqueue" },
+ { }
};
struct ctl_table_header *mq_register_sysctl_table(void)
{
- return register_sysctl_table(mq_sysctl_root);
+ return register_sysctl_paths(mq_path, mq_table);
}
--
1.7.5.134.g1c08b
This is just a cleanup patch; it does not change any functionality.
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
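The template-and-copy pattern this cleanup moves to can be sketched in
userspace as follows. This is only an illustration, not the kernel code:
struct demo_ctl_table, struct demo_domain, demo_template and
demo_alloc_table() are hypothetical stand-ins, and malloc() plus memcpy()
take the place of kmemdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reduced, hypothetical stand-in for the kernel's struct ctl_table. */
struct demo_ctl_table {
	const char *procname;
	void *data;
	int maxlen;
	int mode;
};

/* Hypothetical per-object state, playing the role of struct sched_domain. */
struct demo_domain {
	unsigned long min_interval;
	int busy_idx;
};

/* Everything except .data is identical for every object, so it lives here. */
static const struct demo_ctl_table demo_template[] = {
	{ .procname = "min_interval", .maxlen = sizeof(unsigned long), .mode = 0644 },
	{ .procname = "busy_idx",     .maxlen = sizeof(int),           .mode = 0644 },
	{ .procname = NULL }	/* terminator */
};

/* Copy the template, then patch in the per-object .data pointers. */
static struct demo_ctl_table *demo_alloc_table(struct demo_domain *d)
{
	struct demo_ctl_table *t;

	assert(d != NULL);
	t = malloc(sizeof(demo_template));
	if (t == NULL)
		return NULL;
	memcpy(t, demo_template, sizeof(demo_template));
	t[0].data = &d->min_interval;
	t[1].data = &d->busy_idx;
	return t;
}
```

The terminator entry is copied along with the rest, so the caller does not
have to count entries by hand; the cost is that the index used for each
.data assignment must stay in sync with the template order.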
kernel/sched.c | 144 ++++++++++++++++++++++++++++++++++++++++----------------
1 files changed, 103 insertions(+), 41 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 312f8b9..23a980c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6092,6 +6092,95 @@ static void migrate_tasks(unsigned int dead_cpu)
#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL)
+
+static struct ctl_table sd_table_template[] = {
+ {
+ .procname = "min_interval",
+ /* .data = &sd->min_interval, */
+ .maxlen = sizeof(long),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ {
+ .procname = "max_interval",
+ /* .data = &sd->max_interval, */
+ .maxlen = sizeof(long),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ {
+ .procname = "busy_idx",
+ /* .data = &sd->busy_idx, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "idle_idx",
+ /* .data = &sd->idle_idx, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "newidle_idx",
+ /* .data = &sd->newidle_idx, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "wake_idx",
+ /* .data = &sd->wake_idx, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "forkexec_idx",
+ /* .data = &sd->forkexec_idx, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "busy_factor",
+ /* .data = &sd->busy_factor, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "imbalance_pct",
+ /* .data = &sd->imbalance_pct, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "cache_nice_tries",
+ /* .data = &sd->cache_nice_tries, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "flags",
+ /* .data = &sd->flags, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "name",
+ /* .data = sd->name, */
+ .maxlen = CORENAME_MAX_SIZE,
+ .mode = 0444,
+ .proc_handler = proc_dostring,
+ },
+ { }
+};
+
static struct ctl_table sd_ctl_dir[] = {
{
.procname = "sched_domain",
@@ -6138,52 +6227,25 @@ static void sd_free_ctl_entry(struct ctl_table **tablep)
*tablep = NULL;
}
-static void
-set_table_entry(struct ctl_table *entry,
- const char *procname, void *data, int maxlen,
- mode_t mode, proc_handler *proc_handler)
-{
- entry->procname = procname;
- entry->data = data;
- entry->maxlen = maxlen;
- entry->mode = mode;
- entry->proc_handler = proc_handler;
-}
-
static struct ctl_table *
sd_alloc_ctl_domain_table(struct sched_domain *sd)
{
- struct ctl_table *table = sd_alloc_ctl_entry(13);
-
+ struct ctl_table *table = kmemdup(&sd_table_template,
+ sizeof(sd_table_template), GFP_KERNEL);
if (table == NULL)
return NULL;
-
- set_table_entry(&table[0], "min_interval", &sd->min_interval,
- sizeof(long), 0644, proc_doulongvec_minmax);
- set_table_entry(&table[1], "max_interval", &sd->max_interval,
- sizeof(long), 0644, proc_doulongvec_minmax);
- set_table_entry(&table[2], "busy_idx", &sd->busy_idx,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[3], "idle_idx", &sd->idle_idx,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[4], "newidle_idx", &sd->newidle_idx,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[5], "wake_idx", &sd->wake_idx,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[6], "forkexec_idx", &sd->forkexec_idx,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[7], "busy_factor", &sd->busy_factor,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[9], "cache_nice_tries",
- &sd->cache_nice_tries,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[10], "flags", &sd->flags,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[11], "name", sd->name,
- CORENAME_MAX_SIZE, 0444, proc_dostring);
- /* &table[12] is terminator */
+ table[ 0].data = &sd->min_interval;
+ table[ 1].data = &sd->max_interval;
+ table[ 2].data = &sd->busy_idx;
+ table[ 3].data = &sd->idle_idx;
+ table[ 4].data = &sd->newidle_idx;
+ table[ 5].data = &sd->wake_idx;
+ table[ 6].data = &sd->forkexec_idx;
+ table[ 7].data = &sd->busy_factor;
+ table[ 8].data = &sd->imbalance_pct;
+ table[ 9].data = &sd->cache_nice_tries;
+ table[10].data = &sd->flags;
+ table[11].data = sd->name;
return table;
}
--
1.7.5.134.g1c08b
Note: this patch still registers empty kernel/sched_domain/cpuX/
directories for cpus that have no domains.
That was the behaviour before this patch, so I kept it in the new
implementation in case something depends on it. If the empty directories
are not necessary, this can be removed to simplify the code.
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
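How a single path array can serve both the cpuX directories and the
cpuX/domainY directories can be sketched in userspace as follows. This is
only an illustration under stated assumptions: struct demo_ctl_path,
demo_join_path() and demo_sched_domain_dir() are hypothetical names, and
the sketch prints the path instead of registering anything. Leaving the
domain slot NULL makes it act as the end-of-array sentinel, which is how
the shorter cpuX path is obtained.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NAME_LEN 32
#define DEMO_PATH_CPU 2	/* index of the "cpuX" slot */
#define DEMO_PATH_DOM 3	/* index of the "domainY" slot */

/* Hypothetical stand-in for struct ctl_path: one component per entry. */
struct demo_ctl_path {
	const char *procname;
};

/* Join the components of a NULL-terminated path into "a/b/c". */
static void demo_join_path(const struct demo_ctl_path *path,
			   char *out, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return;
	out[0] = '\0';
	for (; path->procname != NULL; path++) {
		int n = snprintf(out + used, len - used, "%s%s",
				 used ? "/" : "", path->procname);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
}

/* domain < 0 selects the bare cpuX directory. */
static void demo_sched_domain_dir(int cpu, int domain, char *out, size_t len)
{
	char cpu_name[DEMO_NAME_LEN];
	char dom_name[DEMO_NAME_LEN];
	struct demo_ctl_path path[] = {
		{ .procname = "kernel" },
		{ .procname = "sched_domain" },
		{ .procname = NULL },	/* cpuX, filled in below */
		{ .procname = NULL },	/* domainY, or sentinel if unset */
		{ .procname = NULL },	/* terminator */
	};

	snprintf(cpu_name, sizeof(cpu_name), "cpu%d", cpu);
	path[DEMO_PATH_CPU].procname = cpu_name;
	if (domain >= 0) {
		snprintf(dom_name, sizeof(dom_name), "domain%d", domain);
		path[DEMO_PATH_DOM].procname = dom_name;
	}
	demo_join_path(path, out, len);
}
```

Calling demo_sched_domain_dir(3, -1, ...) yields the cpu directory alone,
while demo_sched_domain_dir(0, 1, ...) yields the nested domain directory.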
kernel/sched.c | 266 ++++++++++++++++++++++++++++++++++++++------------------
1 files changed, 180 insertions(+), 86 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 23a980c..6e39b7c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6181,52 +6181,6 @@ static struct ctl_table sd_table_template[] = {
{ }
};
-static struct ctl_table sd_ctl_dir[] = {
- {
- .procname = "sched_domain",
- .mode = 0555,
- },
- {}
-};
-
-static struct ctl_table sd_ctl_root[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = sd_ctl_dir,
- },
- {}
-};
-
-static struct ctl_table *sd_alloc_ctl_entry(int n)
-{
- struct ctl_table *entry =
- kcalloc(n, sizeof(struct ctl_table), GFP_KERNEL);
-
- return entry;
-}
-
-static void sd_free_ctl_entry(struct ctl_table **tablep)
-{
- struct ctl_table *entry;
-
- /*
- * In the intermediate directories, both the child directory and
- * procname are dynamically allocated and could fail but the mode
- * will always be set. In the lowest directory the names are
- * static strings and all have proc handlers.
- */
- for (entry = *tablep; entry->mode; entry++) {
- if (entry->child)
- sd_free_ctl_entry(&entry->child);
- if (entry->proc_handler == NULL)
- kfree(entry->procname);
- }
-
- kfree(*tablep);
- *tablep = NULL;
-}
-
static struct ctl_table *
sd_alloc_ctl_domain_table(struct sched_domain *sd)
{
@@ -6250,64 +6204,204 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd)
return table;
}
-static ctl_table *sd_alloc_ctl_cpu_table(int cpu)
+/*
+ * Compute the maximum number of domains on any single cpu and the
+ * total number of domains across all cpus.
+ */
+static void count_sd_domains(int *p_max, int *p_total)
{
- struct ctl_table *entry, *table;
- struct sched_domain *sd;
- int domain_num = 0, i;
- char buf[32];
+ int cpu;
+ int max = 0;
+ int total = 0;
- for_each_domain(cpu, sd)
- domain_num++;
- entry = table = sd_alloc_ctl_entry(domain_num + 1);
- if (table == NULL)
- return NULL;
+ for_each_possible_cpu(cpu) {
+ struct sched_domain *sd;
+ int domain_num = 0;
- i = 0;
- for_each_domain(cpu, sd) {
- snprintf(buf, 32, "domain%d", i);
- entry->procname = kstrdup(buf, GFP_KERNEL);
- entry->mode = 0555;
- entry->child = sd_alloc_ctl_domain_table(sd);
- entry++;
- i++;
+ for_each_domain(cpu, sd)
+ domain_num++;
+
+ if (domain_num > max)
+ max = domain_num;
+ total += domain_num;
}
- return table;
+ *p_max = max;
+ *p_total = total;
}
-static struct ctl_table_header *sd_sysctl_header;
+
+/* enough space to hold a string "cpu%d" or "domain%d" */
+#define SD_NAME_LEN 32
+typedef char sd_name_buf[SD_NAME_LEN];
+
+static sd_name_buf *sd_cpu_names, *sd_domain_names;
+static int sd_domain_headers_num, sd_cpudir_headers_num;
+static struct ctl_table_header **sd_domain_headers, **sd_cpudir_headers;
+
static void register_sched_domain_sysctl(void)
{
- int i, cpu_num = num_possible_cpus();
- struct ctl_table *entry = sd_alloc_ctl_entry(cpu_num + 1);
- char buf[32];
+ int cpu, i;
+ int cpu_num, max_domain_num;
+
+ /* positions 2 and 3 in the array below */
+#define SD_PATH_CPU 2
+#define SD_PATH_DOM 3
+ struct ctl_path sd_path[] = {
+ { .procname = "kernel" },
+ { .procname = "sched_domain" },
+ { /* 'cpu0' */ },
+ { /* 'domain0' */ },
+ { },
+ };
- WARN_ON(sd_ctl_dir[0].child);
- sd_ctl_dir[0].child = entry;
+ sd_cpudir_headers_num = cpu_num = num_possible_cpus();
+ count_sd_domains(&max_domain_num, &sd_domain_headers_num);
- if (entry == NULL)
- return;
+ /*
+ * Allocate space for:
+ * - all cpu names (cpu0, cpu1,...) and all domain names (domain0,...)
+ * - the array of headers for cpu dirs kernel/sched_domain/cpuX/
+ * - the array of headers for domain dirs kernel/sched_domain/cpuX/domainY
+ *
+ * We only register the empty kernel/sched_domain/cpuX/ dirs
+ * to not break the ABI: if there were no domains defined, we
+ * would still have empty cpuX dir entries in
+ * kernel/sched_domain/.
+ *
+ * If this is not considered useful or part of the ABI, then
+ * we can drop the empty cpu dir entries.
+ */
+ sd_cpu_names = kmalloc(sizeof(sd_name_buf) * cpu_num, GFP_KERNEL);
+ if (sd_cpu_names == NULL)
+ goto fail_alloc_sd_cpu_names;
- for_each_possible_cpu(i) {
- snprintf(buf, 32, "cpu%d", i);
- entry->procname = kstrdup(buf, GFP_KERNEL);
- entry->mode = 0555;
- entry->child = sd_alloc_ctl_cpu_table(i);
- entry++;
+ sd_domain_names = kmalloc(sizeof(sd_name_buf) * max_domain_num, GFP_KERNEL);
+ if (sd_domain_names == NULL)
+ goto fail_alloc_sd_domain_names;
+
+ sd_cpudir_headers = kmalloc(sizeof(*sd_cpudir_headers) *
+ sd_cpudir_headers_num, GFP_KERNEL);
+ if (sd_cpudir_headers == NULL)
+ goto fail_alloc_sd_cpudir_headers;
+
+ sd_domain_headers = kmalloc(sizeof(*sd_domain_headers) *
+ sd_domain_headers_num, GFP_KERNEL);
+ if (sd_domain_headers == NULL)
+ goto fail_alloc_sd_domain_headers;
+
+ for_each_possible_cpu(cpu)
+ snprintf((char*)&sd_cpu_names[cpu], SD_NAME_LEN, "cpu%d", cpu);
+ for (i = 0; i < max_domain_num; i++)
+ snprintf((char*)&sd_domain_names[i], SD_NAME_LEN, "domain%d", i);
+
+ i = 0;
+ for_each_possible_cpu(cpu) {
+ struct ctl_table *empty = kzalloc(sizeof(*empty), GFP_KERNEL);
+ if (empty == NULL)
+ goto unregister_sd_cpudir_headers;
+ sd_path[SD_PATH_CPU].procname = sd_cpu_names[cpu];
+ sd_path[SD_PATH_DOM].procname = NULL; /* end of array sentinel */
+ sd_cpudir_headers[i] = register_sysctl_paths(sd_path, empty);
+ if (sd_cpudir_headers[i] == NULL) {
+ kfree(empty);
+ goto unregister_sd_cpudir_headers;
+ }
+ i++;
+ }
+
+ i = 0;
+ for_each_possible_cpu(cpu) {
+ struct sched_domain *sd;
+ int domain = 0;
+ for_each_domain(cpu, sd) {
+ struct ctl_table *table = sd_alloc_ctl_domain_table(sd);
+ if (table == NULL)
+ goto unregister_sd_domain_headers;
+ sd_path[SD_PATH_CPU].procname = sd_cpu_names[cpu];
+ sd_path[SD_PATH_DOM].procname = sd_domain_names[domain];
+ sd_domain_headers[i] = register_sysctl_paths(sd_path, table);
+ if (sd_domain_headers[i] == NULL) {
+ kfree(table);
+ goto unregister_sd_domain_headers;
+ }
+ i++;
+ domain++;
+ }
}
- WARN_ON(sd_sysctl_header);
- sd_sysctl_header = register_sysctl_table(sd_ctl_root);
+ return;
+
+unregister_sd_domain_headers:
+ i--; /* slot 'i' was being filled in when the failure occurred */
+ for(; i >= 0; i--) {
+ struct ctl_table *table = sd_domain_headers[i]->ctl_table_arg;
+ unregister_sysctl_table(sd_domain_headers[i]);
+ kfree(table);
+ }
+ i = sd_cpudir_headers_num;
+unregister_sd_cpudir_headers:
+ i--;
+ for(; i >= 0; i--) {
+ struct ctl_table *table = sd_cpudir_headers[i]->ctl_table_arg;
+ unregister_sysctl_table(sd_cpudir_headers[i]);
+ kfree(table);
+ }
+
+ kfree(sd_domain_headers);
+fail_alloc_sd_domain_headers:
+ kfree(sd_cpudir_headers);
+fail_alloc_sd_cpudir_headers:
+ kfree(sd_domain_names);
+fail_alloc_sd_domain_names:
+ kfree(sd_cpu_names);
+fail_alloc_sd_cpu_names:
+ sd_domain_headers = NULL;
+ sd_cpudir_headers = NULL;
+ sd_domain_names = NULL;
+ sd_cpu_names = NULL;
+ sd_domain_headers_num = 0;
+ sd_cpudir_headers_num = 0;
}
/* may be called multiple times per register */
static void unregister_sched_domain_sysctl(void)
{
- if (sd_sysctl_header)
- unregister_sysctl_table(sd_sysctl_header);
- sd_sysctl_header = NULL;
- if (sd_ctl_dir[0].child)
- sd_free_ctl_entry(&sd_ctl_dir[0].child);
+ int i;
+
+ /* because this function may be called multiple times (not
+ * concurrently) for a single register_sched_domain_sysctl call,
+ * we skip unregistering if it was already done by a previous
+ * call. This is also why we make sure to NULLify all
+ * pointers: make sure nothing is double-freed. */
+ if (sd_domain_headers == NULL)
+ return;
+
+ /* unregister in the reverse order of registering, or we'll
+ * get a harmless warning saying that the parent of a header
+ * was registered before all its children. */
+ for(i = sd_domain_headers_num - 1; i >= 0; i--) {
+ struct ctl_table *table = sd_domain_headers[i]->ctl_table_arg;
+ unregister_sysctl_table(sd_domain_headers[i]);
+ kfree(table);
+ }
+
+ for(i = sd_cpudir_headers_num - 1; i >= 0; i--) {
+ struct ctl_table *table = sd_cpudir_headers[i]->ctl_table_arg;
+ unregister_sysctl_table(sd_cpudir_headers[i]);
+ kfree(table);
+ }
+
+ kfree(sd_domain_headers);
+ kfree(sd_cpudir_headers);
+ kfree(sd_domain_names);
+ kfree(sd_cpu_names);
+
+ sd_domain_headers = NULL;
+ sd_cpudir_headers = NULL;
+ sd_domain_names = NULL;
+ sd_cpu_names = NULL;
+ sd_cpudir_headers_num = 0;
+ sd_domain_headers_num = 0;
}
#else
static void register_sched_domain_sysctl(void)
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/utsname_sysctl.c | 14 +++++---------
1 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/kernel/utsname_sysctl.c b/kernel/utsname_sysctl.c
index a2cd77e..7606026 100644
--- a/kernel/utsname_sysctl.c
+++ b/kernel/utsname_sysctl.c
@@ -57,7 +57,7 @@ static int proc_do_uts_string(ctl_table *table, int write,
#define proc_do_uts_string NULL
#endif
-static struct ctl_table uts_kern_table[] = {
+static struct ctl_table uts_table[] = {
{
.procname = "ostype",
.data = init_uts_ns.name.sysname,
@@ -96,18 +96,14 @@ static struct ctl_table uts_kern_table[] = {
{}
};
-static struct ctl_table uts_root_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = uts_kern_table,
- },
- {}
+static const __initdata struct ctl_path uts_path[] = {
+ { .procname = "kernel" },
+ { },
};
static int __init utsname_sysctl_init(void)
{
- register_sysctl_table(uts_root_table);
+ register_sysctl_paths(uts_path, uts_table);
return 0;
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/sunrpc/sysctl.c | 19 +++++++------------
1 files changed, 7 insertions(+), 12 deletions(-)
diff --git a/net/sunrpc/sysctl.c b/net/sunrpc/sysctl.c
index e65dcc6..7450ab2 100644
--- a/net/sunrpc/sysctl.c
+++ b/net/sunrpc/sysctl.c
@@ -38,13 +38,17 @@ EXPORT_SYMBOL_GPL(nlm_debug);
#ifdef RPC_DEBUG
static struct ctl_table_header *sunrpc_table_header;
-static ctl_table sunrpc_table[];
+static ctl_table sunrpc_table[];
+static const struct ctl_path sunrpc_path[] = {
+ { .procname = "sunrpc" },
+ { }
+};
void
rpc_register_sysctl(void)
{
if (!sunrpc_table_header)
- sunrpc_table_header = register_sysctl_table(sunrpc_table);
+ sunrpc_table_header = register_sysctl_paths(sunrpc_path, sunrpc_table);
}
void
@@ -133,7 +137,7 @@ done:
}
-static ctl_table debug_table[] = {
+static ctl_table sunrpc_table[] = {
{
.procname = "rpc_debug",
.data = &rpc_debug,
@@ -171,13 +175,4 @@ static ctl_table debug_table[] = {
{ }
};
-static ctl_table sunrpc_table[] = {
- {
- .procname = "sunrpc",
- .mode = 0555,
- .child = debug_table
- },
- { }
-};
-
#endif
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/sunrpc/xprtrdma/svc_rdma.c | 26 +++++++-------------------
1 files changed, 7 insertions(+), 19 deletions(-)
diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
index 09af4fa..d7c0a70 100644
--- a/net/sunrpc/xprtrdma/svc_rdma.c
+++ b/net/sunrpc/xprtrdma/svc_rdma.c
@@ -118,7 +118,7 @@ static int read_reset_stat(ctl_table *table, int write,
}
static struct ctl_table_header *svcrdma_table_header;
-static ctl_table svcrdma_parm_table[] = {
+static ctl_table svcrdma_table[] = {
{
.procname = "max_requests",
.data = &svcrdma_max_requests,
@@ -213,22 +213,10 @@ static ctl_table svcrdma_parm_table[] = {
{ },
};
-static ctl_table svcrdma_table[] = {
- {
- .procname = "svc_rdma",
- .mode = 0555,
- .child = svcrdma_parm_table
- },
- { },
-};
-
-static ctl_table svcrdma_root_table[] = {
- {
- .procname = "sunrpc",
- .mode = 0555,
- .child = svcrdma_table
- },
- { },
+static const struct ctl_path svcrdma_path[] = {
+ { .procname = "sunrpc" },
+ { .procname = "svc_rdma" },
+ { }
};
void svc_rdma_cleanup(void)
@@ -258,8 +246,8 @@ int svc_rdma_init(void)
return -ENOMEM;
if (!svcrdma_table_header)
- svcrdma_table_header =
- register_sysctl_table(svcrdma_root_table);
+ svcrdma_table_header = register_sysctl_paths(
+ svcrdma_path, svcrdma_table);
/* Create the temporary map cache */
svc_rdma_map_cachep = kmem_cache_create("svc_rdma_map_cache",
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/sunrpc/xprtrdma/transport.c | 14 +++++---------
1 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 0867070..9736c93 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -85,7 +85,7 @@ static unsigned int max_memreg = RPCRDMA_LAST - 1;
static struct ctl_table_header *sunrpc_table_header;
-static ctl_table xr_tunables_table[] = {
+static ctl_table rdma_table[] = {
{
.procname = "rdma_slot_table_entries",
.data = &xprt_rdma_slot_table_entries,
@@ -137,13 +137,9 @@ static ctl_table xr_tunables_table[] = {
{ },
};
-static ctl_table sunrpc_table[] = {
- {
- .procname = "sunrpc",
- .mode = 0555,
- .child = xr_tunables_table
- },
- { },
+static const struct ctl_path sunrpc_path[] = {
+ { .procname = "sunrpc" },
+ { }
};
#endif
@@ -771,7 +767,7 @@ static int __init xprt_rdma_init(void)
#ifdef RPC_DEBUG
if (!sunrpc_table_header)
- sunrpc_table_header = register_sysctl_table(sunrpc_table);
+ sunrpc_table_header = register_sysctl_paths(sunrpc_path, rdma_table);
#endif
return 0;
}
--
1.7.5.134.g1c08b
---
net/sunrpc/xprtsock.c | 16 ++++++----------
1 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index bf005d3..610a2fe 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -59,7 +59,7 @@ static unsigned int xs_tcp_fin_timeout __read_mostly = XS_TCP_LINGER_TO;
/*
* We can register our own files under /proc/sys/sunrpc by
- * calling register_sysctl_table() again. The files in that
+ * calling register_sysctl_paths() again. The files in that
* directory become the union of all files registered there.
*
* We simply need to make sure that we don't collide with
@@ -79,7 +79,7 @@ static struct ctl_table_header *sunrpc_table_header;
* FIXME: changing the UDP slot table size should also resize the UDP
* socket buffers for existing UDP transports
*/
-static ctl_table xs_tunables_table[] = {
+static ctl_table xprtsock_table[] = {
{
.procname = "udp_slot_table_entries",
.data = &xprt_udp_slot_table_entries,
@@ -126,13 +126,9 @@ static ctl_table xs_tunables_table[] = {
{ },
};
-static ctl_table sunrpc_table[] = {
- {
- .procname = "sunrpc",
- .mode = 0555,
- .child = xs_tunables_table
- },
- { },
+static const struct ctl_path sunrpc_path[] = {
+ { .procname = "sunrpc" },
+ { }
};
#endif
@@ -2470,7 +2466,7 @@ int init_socket_xprt(void)
{
#ifdef RPC_DEBUG
if (!sunrpc_table_header)
- sunrpc_table_header = register_sysctl_table(sunrpc_table);
+ sunrpc_table_header = register_sysctl_paths(sunrpc_path, xprtsock_table);
#endif
xprt_register_transport(&xs_udp_transport);
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/arm/kernel/isa.c | 31 ++++++++++++-------------------
1 files changed, 12 insertions(+), 19 deletions(-)
diff --git a/arch/arm/kernel/isa.c b/arch/arm/kernel/isa.c
index 3464859..0236609 100644
--- a/arch/arm/kernel/isa.c
+++ b/arch/arm/kernel/isa.c
@@ -20,44 +20,37 @@
static unsigned int isa_membase, isa_portbase, isa_portshift;
-static ctl_table ctl_isa_vars[4] = {
+static ctl_table isa_table[] = {
{
.procname = "membase",
.data = &isa_membase,
.maxlen = sizeof(isa_membase),
.mode = 0444,
.proc_handler = proc_dointvec,
- }, {
+ },
+ {
.procname = "portbase",
.data = &isa_portbase,
.maxlen = sizeof(isa_portbase),
.mode = 0444,
.proc_handler = proc_dointvec,
- }, {
+ },
+ {
.procname = "portshift",
.data = &isa_portshift,
.maxlen = sizeof(isa_portshift),
.mode = 0444,
.proc_handler = proc_dointvec,
- }, {}
+ },
+ { }
};
static struct ctl_table_header *isa_sysctl_header;
-static ctl_table ctl_isa[2] = {
- {
- .procname = "isa",
- .mode = 0555,
- .child = ctl_isa_vars,
- }, {}
-};
-
-static ctl_table ctl_bus[2] = {
- {
- .procname = "bus",
- .mode = 0555,
- .child = ctl_isa,
- }, {}
+static const __initdata struct ctl_path isa_path[] = {
+ { .procname = "bus" },
+ { .procname = "isa" },
+ { }
};
void __init
@@ -66,5 +59,5 @@ register_isa_ports(unsigned int membase, unsigned int portbase, unsigned int por
isa_membase = membase;
isa_portbase = portbase;
isa_portshift = portshift;
- isa_sysctl_header = register_sysctl_table(ctl_bus);
+ isa_sysctl_header = register_sysctl_paths(isa_path, isa_table);
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/arm/mach-bcmring/arch.c | 25 ++++++++++++-------------
1 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/arch/arm/mach-bcmring/arch.c b/arch/arm/mach-bcmring/arch.c
index 73eb066..33c10fd 100644
--- a/arch/arm/mach-bcmring/arch.c
+++ b/arch/arm/mach-bcmring/arch.c
@@ -55,20 +55,18 @@ static struct ctl_table_header *bcmring_sysctl_header;
static struct ctl_table bcmring_sysctl_warm_reboot[] = {
{
- .procname = "warm",
- .data = &bcmring_arch_warm_reboot,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec},
- {}
+ .procname = "warm",
+ .data = &bcmring_arch_warm_reboot,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ { }
};
-static struct ctl_table bcmring_sysctl_reboot[] = {
- {
- .procname = "reboot",
- .mode = 0555,
- .child = bcmring_sysctl_warm_reboot},
- {}
+static const __initdata struct ctl_path bcmring_sysctl_path[] = {
+ { .procname = "reboot" },
+ { }
};
static struct resource nand_resource[] = {
@@ -117,7 +115,8 @@ static struct platform_device *devices[] __initdata = {
static void __init bcmring_init_machine(void)
{
- bcmring_sysctl_header = register_sysctl_table(bcmring_sysctl_reboot);
+ bcmring_sysctl_header = register_sysctl_paths(bcmring_sysctl_path,
+ bcmring_sysctl_warm_reboot);
/* Enable spread spectrum */
chipcHw_enableSpreadSpectrum();
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/mips/lasat/sysctl.c | 13 ++++---------
1 files changed, 4 insertions(+), 9 deletions(-)
diff --git a/arch/mips/lasat/sysctl.c b/arch/mips/lasat/sysctl.c
index d87ffd0..a6191f0 100644
--- a/arch/mips/lasat/sysctl.c
+++ b/arch/mips/lasat/sysctl.c
@@ -262,21 +262,16 @@ static ctl_table lasat_table[] = {
{}
};
-static ctl_table lasat_root_table[] = {
- {
- .procname = "lasat",
- .mode = 0555,
- .child = lasat_table
- },
- {}
+static const __initdata struct ctl_path lasat_path[] = {
+ { .procname = "lasat" },
+ { }
};
static int __init lasat_register_sysctl(void)
{
struct ctl_table_header *lasat_table_header;
- lasat_table_header =
- register_sysctl_table(lasat_root_table);
+ lasat_table_header = register_sysctl_paths(lasat_path, lasat_table);
if (!lasat_table_header) {
printk(KERN_ERR "Unable to register LASAT sysctl\n");
return -ENOMEM;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/s390/appldata/appldata_base.c | 42 ++++++++++++++++++------------------
1 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/arch/s390/appldata/appldata_base.c b/arch/s390/appldata/appldata_base.c
index 5c91995..0f336a8 100644
--- a/arch/s390/appldata/appldata_base.c
+++ b/arch/s390/appldata/appldata_base.c
@@ -49,7 +49,6 @@ static struct platform_device *appldata_pdev;
/*
* /proc entries (sysctl)
*/
-static const char appldata_proc_name[APPLDATA_PROC_NAME_LENGTH] = "appldata";
static int appldata_timer_handler(ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos);
static int appldata_interval_handler(ctl_table *ctl, int write,
@@ -71,14 +70,9 @@ static struct ctl_table appldata_table[] = {
{ },
};
-static struct ctl_table appldata_dir_table[] = {
- {
- .procname = appldata_proc_name,
- .maxlen = 0,
- .mode = S_IRUGO | S_IXUGO,
- .child = appldata_table,
- },
- { },
+static const struct ctl_path appldata_path[] = {
+ { .procname = "appldata" },
+ { }
};
/*
@@ -424,6 +418,18 @@ out:
/************************* module-ops management *****************************/
+
+static const struct ctl_table appldata_ops_template[2] = {
+ {
+ .procname = NULL, /* ops->name */
+ .data = NULL, /* ops */
+ .maxlen = 0,
+ .mode = S_IRUGO | S_IWUSR,
+ .proc_handler = appldata_generic_handler,
+ },
+ { }
+};
+
/*
* appldata_register_ops()
*
@@ -434,7 +440,8 @@ int appldata_register_ops(struct appldata_ops *ops)
if (ops->size > APPLDATA_MAX_REC_SIZE)
return -EINVAL;
- ops->ctl_table = kzalloc(4 * sizeof(struct ctl_table), GFP_KERNEL);
+ ops->ctl_table = kmemdup(&appldata_ops_template,
+ sizeof(appldata_ops_template), GFP_KERNEL);
if (!ops->ctl_table)
return -ENOMEM;
@@ -442,17 +449,10 @@ int appldata_register_ops(struct appldata_ops *ops)
list_add(&ops->list, &appldata_ops_list);
mutex_unlock(&appldata_ops_mutex);
- ops->ctl_table[0].procname = appldata_proc_name;
- ops->ctl_table[0].maxlen = 0;
- ops->ctl_table[0].mode = S_IRUGO | S_IXUGO;
- ops->ctl_table[0].child = &ops->ctl_table[2];
-
- ops->ctl_table[2].procname = ops->name;
- ops->ctl_table[2].mode = S_IRUGO | S_IWUSR;
- ops->ctl_table[2].proc_handler = appldata_generic_handler;
- ops->ctl_table[2].data = ops;
+ ops->ctl_table[0].procname = ops->name;
+ ops->ctl_table[0].data = ops;
- ops->sysctl_header = register_sysctl_table(ops->ctl_table);
+ ops->sysctl_header = register_sysctl_paths(appldata_path, ops->ctl_table);
if (!ops->sysctl_header)
goto out;
return 0;
@@ -649,7 +649,7 @@ static int __init appldata_init(void)
/* Register cpu hotplug notifier */
register_hotcpu_notifier(&appldata_nb);
- appldata_sysctl_header = register_sysctl_table(appldata_dir_table);
+ appldata_sysctl_header = register_sysctl_paths(appldata_path, appldata_table);
return 0;
out_device:
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/s390/kernel/debug.c | 13 ++++---------
1 files changed, 4 insertions(+), 9 deletions(-)
diff --git a/arch/s390/kernel/debug.c b/arch/s390/kernel/debug.c
index 5ad6bc0..384f67b 100644
--- a/arch/s390/kernel/debug.c
+++ b/arch/s390/kernel/debug.c
@@ -902,7 +902,7 @@ static struct ctl_table s390dbf_table[] = {
.mode = S_IRUGO | S_IWUSR,
.proc_handler = proc_dointvec,
},
- {
+ {
.procname = "debug_active",
.data = &debug_active,
.maxlen = sizeof(int),
@@ -912,13 +912,8 @@ static struct ctl_table s390dbf_table[] = {
{ }
};
-static struct ctl_table s390dbf_dir_table[] = {
- {
- .procname = "s390dbf",
- .maxlen = 0,
- .mode = S_IRUGO | S_IXUGO,
- .child = s390dbf_table,
- },
+static const __initdata struct ctl_path s390dbf_path[] = {
+ { .procname = "s390dbf" },
{ }
};
@@ -1071,7 +1066,7 @@ __init debug_init(void)
{
int rc = 0;
- s390dbf_sysctl_header = register_sysctl_table(s390dbf_dir_table);
+ s390dbf_sysctl_header = register_sysctl_paths(s390dbf_path, s390dbf_table);
mutex_lock(&debug_mutex);
debug_debugfs_root_entry = debugfs_create_dir(DEBUG_DIR_ROOT,NULL);
initialized = 1;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/s390/mm/cmm.c | 11 +++--------
1 files changed, 3 insertions(+), 8 deletions(-)
diff --git a/arch/s390/mm/cmm.c b/arch/s390/mm/cmm.c
index c66ffd8..0ef5bbf 100644
--- a/arch/s390/mm/cmm.c
+++ b/arch/s390/mm/cmm.c
@@ -348,13 +348,8 @@ static struct ctl_table cmm_table[] = {
{ }
};
-static struct ctl_table cmm_dir_table[] = {
- {
- .procname = "vm",
- .maxlen = 0,
- .mode = 0555,
- .child = cmm_table,
- },
+static const __initdata struct ctl_path cmm_path[] = {
+ { .procname = "vm" },
{ }
};
@@ -434,7 +429,7 @@ static int __init cmm_init(void)
{
int rc = -ENOMEM;
- cmm_sysctl_header = register_sysctl_table(cmm_dir_table);
+ cmm_sysctl_header = register_sysctl_paths(cmm_path, cmm_table);
if (!cmm_sysctl_header)
goto out_sysctl;
#ifdef CONFIG_CMM_IUCV
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/ia64/kernel/perfmon.c | 23 +++++++----------------
1 files changed, 7 insertions(+), 16 deletions(-)
diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 89accc6..96743dd 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -552,22 +552,13 @@ static ctl_table pfm_ctl_table[]={
},
{}
};
-static ctl_table pfm_sysctl_dir[] = {
- {
- .procname = "perfmon",
- .mode = 0555,
- .child = pfm_ctl_table,
- },
- {}
-};
-static ctl_table pfm_sysctl_root[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = pfm_sysctl_dir,
- },
- {}
+
+static const __initdata struct ctl_path pfm_path[] = {
+ { .procname = "kernel" },
+ { .procname = "perfmon" },
+ { }
};
+
static struct ctl_table_header *pfm_sysctl_header;
static int pfm_context_unload(pfm_context_t *ctx, void *arg, int count, struct pt_regs *regs);
@@ -6687,7 +6678,7 @@ pfm_init(void)
/*
* create /proc/sys/kernel/perfmon (for debugging purposes)
*/
- pfm_sysctl_header = register_sysctl_table(pfm_sysctl_root);
+ pfm_sysctl_header = register_sysctl_paths(pfm_path, pfm_ctl_table);
/*
* initialize all our spinlocks
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/ia64/kernel/crash.c | 13 +++++--------
1 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/ia64/kernel/crash.c b/arch/ia64/kernel/crash.c
index b942f40..e54aea5 100644
--- a/arch/ia64/kernel/crash.c
+++ b/arch/ia64/kernel/crash.c
@@ -255,17 +255,14 @@ static ctl_table kdump_ctl_table[] = {
{ }
};
-static ctl_table sys_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = kdump_ctl_table,
- },
+static const __initdata struct ctl_path kdump_path[] = {
+ { .procname = "kernel" },
{ }
};
+
#endif
-static int
+static __init int
machine_crash_setup(void)
{
/* be notified before default_monarch_init_process */
@@ -277,7 +274,7 @@ machine_crash_setup(void)
if((ret = register_die_notifier(&kdump_init_notifier_nb)) != 0)
return ret;
#ifdef CONFIG_SYSCTL
- register_sysctl_table(sys_table);
+ register_sysctl_paths(kdump_path, kdump_ctl_table);
#endif
return 0;
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/powerpc/kernel/idle.c | 13 +++++--------
1 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 39a2baa..88d03c5 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -118,19 +118,16 @@ static ctl_table powersave_nap_ctl_table[]={
},
{}
};
-static ctl_table powersave_nap_sysctl_root[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = powersave_nap_ctl_table,
- },
- {}
+
+static const __initdata struct ctl_path powersave_nap_path[] = {
+ { .procname = "kernel" },
+ { }
};
static int __init
register_powersave_nap_sysctl(void)
{
- register_sysctl_table(powersave_nap_sysctl_root);
+ register_sysctl_paths(powersave_nap_path, powersave_nap_ctl_table);
return 0;
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/frv/kernel/pm.c | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/arch/frv/kernel/pm.c b/arch/frv/kernel/pm.c
index 5fa3889..bcef945 100644
--- a/arch/frv/kernel/pm.c
+++ b/arch/frv/kernel/pm.c
@@ -329,13 +329,9 @@ static struct ctl_table pm_table[] =
{ }
};
-static struct ctl_table pm_dir_table[] =
+static const __initdata struct ctl_path pm_path[] =
{
- {
- .procname = "pm",
- .mode = 0555,
- .child = pm_table,
- },
+ { .procname = "pm" },
{ }
};
@@ -344,7 +340,7 @@ static struct ctl_table pm_dir_table[] =
*/
static int __init pm_init(void)
{
- register_sysctl_table(pm_dir_table);
+ register_sysctl_paths(pm_path, pm_table);
return 0;
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/frv/kernel/sysctl.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/frv/kernel/sysctl.c b/arch/frv/kernel/sysctl.c
index 6c155d6..e5c20a2 100644
--- a/arch/frv/kernel/sysctl.c
+++ b/arch/frv/kernel/sysctl.c
@@ -199,14 +199,10 @@ static struct ctl_table frv_table[] =
* Use a temporary sysctl number. Horrid, but will be cleaned up in 2.6
* when all the PM interfaces exist nicely.
*/
-static struct ctl_table frv_dir_table[] =
+static const __initdata struct ctl_path frv_path[] =
{
- {
- .procname = "frv",
- .mode = 0555,
- .child = frv_table
- },
- {}
+ { .procname = "frv" },
+ { }
};
/*
@@ -214,7 +210,7 @@ static struct ctl_table frv_dir_table[] =
*/
static int __init frv_sysctl_init(void)
{
- register_sysctl_table(frv_dir_table);
+ register_sysctl_paths(frv_path, frv_table);
return 0;
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/sh/kernel/traps_64.c | 21 +++++----------------
1 files changed, 5 insertions(+), 16 deletions(-)
diff --git a/arch/sh/kernel/traps_64.c b/arch/sh/kernel/traps_64.c
index 6713ca9..8b355b4 100644
--- a/arch/sh/kernel/traps_64.c
+++ b/arch/sh/kernel/traps_64.c
@@ -908,27 +908,16 @@ static ctl_table unaligned_table[] = {
{}
};
-static ctl_table unaligned_root[] = {
- {
- .procname = "unaligned_fixup",
- .mode = 0555,
- .child = unaligned_table
- },
- {}
+static const __initdata struct ctl_path unaligned_path[] = {
+ { .procname = "sh64" },
+ { .procname = "unaligned_fixup" },
+ { }
};
-static ctl_table sh64_root[] = {
- {
- .procname = "sh64",
- .mode = 0555,
- .child = unaligned_root
- },
- {}
-};
static struct ctl_table_header *sysctl_header;
static int __init init_sysctl(void)
{
- sysctl_header = register_sysctl_table(sh64_root);
+ sysctl_header = register_sysctl_paths(unaligned_path, unaligned_table);
return 0;
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 3 +--
kernel/sysctl.c | 26 ++------------------------
2 files changed, 3 insertions(+), 26 deletions(-)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 11684d9..470e06a 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -985,7 +985,7 @@ extern int proc_do_large_bitmap(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
/*
- * Register a set of sysctl names by calling register_sysctl_table
+ * Register a set of sysctl names by calling __register_sysctl_paths
* with an initialised array of struct ctl_table's. An entry with
* NULL procname terminates the table. table->de will be
* set up by the registration and need not be initialised in advance.
@@ -1065,7 +1065,6 @@ void register_sysctl_root(struct ctl_table_root *root);
struct ctl_table_header *__register_sysctl_paths(
struct ctl_table_root *root, struct nsproxy *namespaces,
const struct ctl_path *path, struct ctl_table *table);
-struct ctl_table_header *register_sysctl_table(struct ctl_table * table);
struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c0bb324..b813724 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1905,7 +1905,7 @@ struct ctl_table_header *__register_sysctl_paths(
}
/**
- * register_sysctl_table_path - register a sysctl table hierarchy
+ * register_sysctl_paths - register a sysctl table hierarchy
* @path: The path to the directory the sysctl table is in.
* @table: the top-level table structure
*
@@ -1922,24 +1922,8 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
}
/**
- * register_sysctl_table - register a sysctl table hierarchy
- * @table: the top-level table structure
- *
- * Register a sysctl table hierarchy. @table should be a filled in ctl_table
- * array. A completely 0 filled entry terminates the table.
- *
- * See register_sysctl_paths for more details.
- */
-struct ctl_table_header *register_sysctl_table(struct ctl_table *table)
-{
- static const struct ctl_path null_path[] = { {} };
-
- return register_sysctl_paths(null_path, table);
-}
-
-/**
* unregister_sysctl_table - unregister a sysctl table hierarchy
- * @header: the header returned from register_sysctl_table
+ * @header: the header returned from __register_sysctl_paths
*
* Unregisters the sysctl table and all children. proc entries may not
* actually be removed until they are no longer used by anyone.
@@ -1987,11 +1971,6 @@ void setup_sysctl_set(struct ctl_table_set *p,
}
#else /* !CONFIG_SYSCTL */
-struct ctl_table_header *register_sysctl_table(struct ctl_table * table)
-{
- return NULL;
-}
-
struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table)
{
@@ -2977,6 +2956,5 @@ EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
EXPORT_SYMBOL(proc_dostring);
EXPORT_SYMBOL(proc_doulongvec_minmax);
EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
-EXPORT_SYMBOL(register_sysctl_table);
EXPORT_SYMBOL(register_sysctl_paths);
EXPORT_SYMBOL(unregister_sysctl_table);
--
1.7.5.134.g1c08b
Only compile tested!
I'm sorry, but I could not manage to add an ax25 interface.
Some notable changes: before this patch, each time a device went
up or down we would unregister everything under /proc/sys/net/ax25/
and then reregister an updated table with all devices in it (BTW, the
table was allocated with GFP_ATOMIC!).
Now each device registers its own table (e.g.
/proc/sys/net/ax25/ax0/) when it goes up and unregisters it when it
goes down. I'm assuming ax25 devices cannot be renamed, but if that's
possible, this can be fixed by making a private copy of the device
name for sysctl, and unregistering/reregistering the table on device
rename (see net/ipv4/devinet.c).
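A rough userspace sketch of the private-copy idea, in case renaming
turns out to be possible. Everything here (mock_dev, mock_sysctl and
the mock_* functions) is hypothetical and only models who owns the
name string; it is not the ax25 or sysctl API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a net_device: the name can change. */
struct mock_dev {
	char name[16];
};

/* Hypothetical stand-in for the per-device sysctl state. */
struct mock_sysctl {
	char *name_copy;	/* private copy, owned by the sysctl side */
};

/* "Register": take a private copy so a later rename of the device
 * cannot change the string the registered path points at. */
static int mock_sysctl_register(struct mock_sysctl *s,
				const struct mock_dev *dev)
{
	size_t len = strlen(dev->name) + 1;

	assert(s->name_copy == NULL);	/* no double registration */
	s->name_copy = malloc(len);
	if (!s->name_copy)
		return -1;
	memcpy(s->name_copy, dev->name, len);
	return 0;
}

/* "Unregister": drop the private copy. */
static void mock_sysctl_unregister(struct mock_sysctl *s)
{
	free(s->name_copy);
	s->name_copy = NULL;
}

/* On rename: unregister under the old name, rename the device,
 * then register again under the new name. */
static int mock_rename(struct mock_sysctl *s, struct mock_dev *dev,
		       const char *new_name)
{
	mock_sysctl_unregister(s);
	strncpy(dev->name, new_name, sizeof(dev->name) - 1);
	dev->name[sizeof(dev->name) - 1] = '\0';
	return mock_sysctl_register(s, dev);
}
```

In this model the registered string is never the device's own name
buffer, so renaming the device cannot change an already registered
path behind sysctl's back.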
Also added an empty /proc/sys/net/ax25/ root directory. Without it,
the first device added would have been the first to create the
/proc/sys/net/ax25/ sysctl path and all other devices would have
attached to it. If the first device were removed before the other
ones, we would have gotten a harmless warning from sysctl telling us
we're unregistering the parent before the children.
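The ordering problem described above can be shown with a toy
userspace model. This is only a sketch: mock_dir and the mock_* and
scenario_* functions are made up for illustration and do not mirror
the real sysctl data structures. The first header to create a
directory "owns" it, and later headers attach to it.

```c
#include <assert.h>

/* Hypothetical model of one sysctl directory. */
struct mock_dir {
	int owner;	/* id of the header that created it, -1 if none */
	int attached;	/* headers attached below the owner */
	int warnings;	/* times the owner left before its children */
};

static void mock_register(struct mock_dir *d, int id)
{
	if (d->owner < 0)
		d->owner = id;		/* first one in creates the path */
	else
		d->attached++;		/* everyone else attaches to it */
}

static void mock_unregister(struct mock_dir *d, int id)
{
	if (d->owner == id) {
		if (d->attached > 0)
			d->warnings++;	/* parent gone before children */
		d->owner = -1;
	} else {
		assert(d->attached > 0);
		d->attached--;
	}
}

/* No root header: device 1 creates the path and is removed first. */
static int scenario_without_root(void)
{
	struct mock_dir d = { -1, 0, 0 };

	mock_register(&d, 1);
	mock_register(&d, 2);
	mock_unregister(&d, 1);
	mock_unregister(&d, 2);
	return d.warnings;
}

/* Empty root header (id 0) registered first and unregistered last. */
static int scenario_with_root(void)
{
	struct mock_dir d = { -1, 0, 0 };

	mock_register(&d, 0);
	mock_register(&d, 1);
	mock_register(&d, 2);
	mock_unregister(&d, 1);
	mock_unregister(&d, 2);
	mock_unregister(&d, 0);
	return d.warnings;
}
```

In this model scenario_without_root() counts one warning and
scenario_with_root() counts none, which is the reason for registering
the empty root in ax25_init() and unregistering it in ax25_exit().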
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/net/ax25.h | 10 +++---
net/ax25/af_ax25.c | 23 ++++++++++++-
net/ax25/ax25_dev.c | 10 +-----
net/ax25/sysctl_net_ax25.c | 76 ++++++++++++++-----------------------------
4 files changed, 53 insertions(+), 66 deletions(-)
diff --git a/include/net/ax25.h b/include/net/ax25.h
index 206d222..79c2d2d 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -215,7 +215,7 @@ typedef struct ax25_dev {
struct ax25_dev *next;
struct net_device *dev;
struct net_device *forward;
- struct ctl_table *systable;
+ struct ctl_table_header *ax25_sysheader;
int values[AX25_MAX_VALUES];
#if defined(CONFIG_AX25_DAMA_SLAVE) || defined(CONFIG_AX25_DAMA_MASTER)
ax25_dama_info dama;
@@ -441,11 +441,11 @@ extern void ax25_uid_free(void);
/* sysctl_net_ax25.c */
#ifdef CONFIG_SYSCTL
-extern void ax25_register_sysctl(void);
-extern void ax25_unregister_sysctl(void);
+extern void ax25_register_sysctl(struct ax25_dev *dev);
+extern void ax25_unregister_sysctl(struct ax25_dev *dev);
#else
-static inline void ax25_register_sysctl(void) {};
-static inline void ax25_unregister_sysctl(void) {};
+static inline void ax25_register_sysctl(struct ax25_dev *dev) {};
+static inline void ax25_unregister_sysctl(struct ax25_dev *dev) {};
#endif /* CONFIG_SYSCTL */
#endif
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 6da5dae..965662d 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1989,6 +1989,18 @@ static struct notifier_block ax25_dev_notifier = {
.notifier_call =ax25_device_event,
};
+
+#ifdef CONFIG_SYSCTL
+static const __initdata struct ctl_path ax25_path[] = {
+ { .procname = "net" },
+ { .procname = "ax25" },
+ { }
+};
+static struct ctl_table empty;
+static struct ctl_table_header *ax25_root_header;
+#endif /* CONFIG_SYSCTL */
+
+
static int __init ax25_init(void)
{
int rc = proto_register(&ax25_proto, 0);
@@ -1999,7 +2011,11 @@ static int __init ax25_init(void)
sock_register(&ax25_family_ops);
dev_add_pack(&ax25_packet_type);
register_netdevice_notifier(&ax25_dev_notifier);
- ax25_register_sysctl();
+
+ /* XXX: no error checking done in initializer */
+ #ifdef CONFIG_SYSCTL
+ ax25_root_header = register_sysctl_paths(ax25_path, &empty);
+ #endif
proc_net_fops_create(&init_net, "ax25_route", S_IRUGO, &ax25_route_fops);
proc_net_fops_create(&init_net, "ax25", S_IRUGO, &ax25_info_fops);
@@ -2024,7 +2040,10 @@ static void __exit ax25_exit(void)
ax25_uid_free();
ax25_dev_free();
- ax25_unregister_sysctl();
+ #ifdef CONFIG_SYSCTL
+ unregister_sysctl_table(ax25_root_header);
+ #endif
+
unregister_netdevice_notifier(&ax25_dev_notifier);
dev_remove_pack(&ax25_packet_type);
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index c1cb982..6ff1853 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -60,8 +60,6 @@ void ax25_dev_device_up(struct net_device *dev)
return;
}
- ax25_unregister_sysctl();
-
dev->ax25_ptr = ax25_dev;
ax25_dev->dev = dev;
dev_hold(dev);
@@ -91,7 +89,7 @@ void ax25_dev_device_up(struct net_device *dev)
ax25_dev_list = ax25_dev;
spin_unlock_bh(&ax25_dev_lock);
- ax25_register_sysctl();
+ ax25_register_sysctl(ax25_dev);
}
void ax25_dev_device_down(struct net_device *dev)
@@ -101,7 +99,7 @@ void ax25_dev_device_down(struct net_device *dev)
if ((ax25_dev = ax25_dev_ax25dev(dev)) == NULL)
return;
- ax25_unregister_sysctl();
+ ax25_unregister_sysctl(ax25_dev);
spin_lock_bh(&ax25_dev_lock);
@@ -121,7 +119,6 @@ void ax25_dev_device_down(struct net_device *dev)
spin_unlock_bh(&ax25_dev_lock);
dev_put(dev);
kfree(ax25_dev);
- ax25_register_sysctl();
return;
}
@@ -131,7 +128,6 @@ void ax25_dev_device_down(struct net_device *dev)
spin_unlock_bh(&ax25_dev_lock);
dev_put(dev);
kfree(ax25_dev);
- ax25_register_sysctl();
return;
}
@@ -139,8 +135,6 @@ void ax25_dev_device_down(struct net_device *dev)
}
spin_unlock_bh(&ax25_dev_lock);
dev->ax25_ptr = NULL;
-
- ax25_register_sysctl();
}
int ax25_fwd_ioctl(unsigned int cmd, struct ax25_fwd_struct *fwd)
diff --git a/net/ax25/sysctl_net_ax25.c b/net/ax25/sysctl_net_ax25.c
index ebe0ef3..b1181bc 100644
--- a/net/ax25/sysctl_net_ax25.c
+++ b/net/ax25/sysctl_net_ax25.c
@@ -29,17 +29,6 @@ static int min_proto[1], max_proto[] = { AX25_PROTO_MAX };
static int min_ds_timeout[1], max_ds_timeout[] = {65535000};
#endif
-static struct ctl_table_header *ax25_table_header;
-
-static ctl_table *ax25_table;
-static int ax25_table_size;
-
-static struct ctl_path ax25_path[] = {
- { .procname = "net", },
- { .procname = "ax25", },
- { }
-};
-
static const ctl_table ax25_param_table[] = {
{
.procname = "ip_default_mode",
@@ -159,52 +148,37 @@ static const ctl_table ax25_param_table[] = {
{ } /* that's all, folks! */
};
-void ax25_register_sysctl(void)
+void ax25_register_sysctl(struct ax25_dev *ax25_dev)
{
- ax25_dev *ax25_dev;
- int n, k;
-
- spin_lock_bh(&ax25_dev_lock);
- for (ax25_table_size = sizeof(ctl_table), ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next)
- ax25_table_size += sizeof(ctl_table);
-
- if ((ax25_table = kzalloc(ax25_table_size, GFP_ATOMIC)) == NULL) {
- spin_unlock_bh(&ax25_dev_lock);
+ struct ctl_table *ax25_table;
+ int i;
+
+ /* Assuming the name does not change while this sysctl
+ * is registered. If ax25 supports device renaming
+ * (SIOCSIFNAME), sysctl will need its own copy of
+ * the name */
+ struct ctl_path ax25_path[] = {
+ { .procname = "net" },
+ { .procname = "ax25" },
+ { .procname = ax25_dev->dev->name },
+ { }
+ };
+
+
+ ax25_table = kmemdup(ax25_param_table, sizeof(ax25_param_table), GFP_KERNEL);
+ if (!ax25_table)
return;
- }
-
- for (n = 0, ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next) {
- struct ctl_table *child = kmemdup(ax25_param_table,
- sizeof(ax25_param_table),
- GFP_ATOMIC);
- if (!child) {
- while (n--)
- kfree(ax25_table[n].child);
- kfree(ax25_table);
- spin_unlock_bh(&ax25_dev_lock);
- return;
- }
- ax25_table[n].child = ax25_dev->systable = child;
- ax25_table[n].procname = ax25_dev->dev->name;
- ax25_table[n].mode = 0555;
-
- for (k = 0; k < AX25_MAX_VALUES; k++)
- child[k].data = &ax25_dev->values[k];
+ for (i = 0; i < AX25_MAX_VALUES; i++)
+ ax25_table[i].data = &ax25_dev->values[i];
- n++;
- }
- spin_unlock_bh(&ax25_dev_lock);
-
- ax25_table_header = register_sysctl_paths(ax25_path, ax25_table);
+ ax25_dev->ax25_sysheader = register_sysctl_paths(ax25_path, ax25_table);
}
-void ax25_unregister_sysctl(void)
+void ax25_unregister_sysctl(struct ax25_dev *ax25_dev)
{
- ctl_table *p;
- unregister_sysctl_table(ax25_table_header);
-
- for (p = ax25_table; p->procname; p++)
- kfree(p->child);
+ struct ctl_table *ax25_table = ax25_dev->ax25_sysheader->ctl_table_arg;
+ unregister_sysctl_table(ax25_dev->ax25_sysheader);
+ ax25_dev->ax25_sysheader = NULL;
kfree(ax25_table);
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv4/route.c | 15 ++++-----------
1 files changed, 4 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c1acf69..6bc621b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3146,18 +3146,10 @@ static ctl_table ipv4_route_table[] = {
static struct ctl_table empty[1];
-static struct ctl_table ipv4_skeleton[] =
-{
- { .procname = "route",
- .mode = 0555, .child = ipv4_route_table},
- { .procname = "neigh",
- .mode = 0555, .child = empty},
- { }
-};
-
-static __net_initdata struct ctl_path ipv4_path[] = {
+static __net_initdata struct ctl_path ipv4_neigh_path[] = {
{ .procname = "net", },
{ .procname = "ipv4", },
+ { .procname = "neigh", },
{ },
};
@@ -3310,6 +3302,7 @@ int __init ip_rt_init(void)
*/
void __init ip_static_sysctl_init(void)
{
- register_sysctl_paths(ipv4_path, ipv4_skeleton);
+ register_sysctl_paths(ipv4_route_path, ipv4_route_table);
+ register_sysctl_paths(ipv4_neigh_path, empty);
}
#endif
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv6/sysctl_net_ipv6.c | 18 +++++++-----------
1 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 6dcf5e7..a0d9916 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -17,16 +17,6 @@
static struct ctl_table empty[1];
-static ctl_table ipv6_static_skeleton[] = {
- {
- .procname = "neigh",
- .maxlen = 0,
- .mode = 0555,
- .child = empty,
- },
- { }
-};
-
static ctl_table ipv6_table_template[] = {
{
.procname = "route",
@@ -160,11 +150,17 @@ void ipv6_sysctl_unregister(void)
unregister_pernet_subsys(&ipv6_sysctl_net_ops);
}
+static const struct ctl_path net_ipv6_neigh_path[] = {
+ { .procname = "net", },
+ { .procname = "ipv6", },
+ { .procname = "neigh", },
+ { },
+};
static struct ctl_table_header *ip6_base;
int ipv6_static_sysctl_register(void)
{
- ip6_base = register_sysctl_paths(net_ipv6_ctl_path, ipv6_static_skeleton);
+ ip6_base = register_sysctl_paths(net_ipv6_neigh_path, empty);
if (ip6_base == NULL)
return -ENOMEM;
return 0;
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/net/netns/ipv6.h | 4 +-
net/ipv6/sysctl_net_ipv6.c | 101 +++++++++++++++++++++++++-------------------
2 files changed, 60 insertions(+), 45 deletions(-)
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 81abfcb..2d9c6f1 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -12,7 +12,9 @@ struct ctl_table_header;
struct netns_sysctl_ipv6 {
#ifdef CONFIG_SYSCTL
- struct ctl_table_header *table;
+ struct ctl_table_header *bindv6only_hdr;
+ struct ctl_table_header *route6_hdr;
+ struct ctl_table_header *icmp6_hdr;
struct ctl_table_header *frags_hdr;
#endif
int bindv6only;
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index a0d9916..1d2d8c7 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -17,19 +17,7 @@
static struct ctl_table empty[1];
-static ctl_table ipv6_table_template[] = {
- {
- .procname = "route",
- .maxlen = 0,
- .mode = 0555,
- .child = ipv6_route_table_template
- },
- {
- .procname = "icmp",
- .maxlen = 0,
- .mode = 0555,
- .child = ipv6_icmp_table_template
- },
+static ctl_table ipv6_bindv6only_template[] = {
{
.procname = "bindv6only",
.data = &init_net.ipv6.sysctl.bindv6only,
@@ -58,64 +46,89 @@ struct ctl_path net_ipv6_ctl_path[] = {
};
EXPORT_SYMBOL_GPL(net_ipv6_ctl_path);
+static const struct ctl_path net_ipv6_route_path[] = {
+ { .procname = "net", },
+ { .procname = "ipv6", },
+ { .procname = "route", },
+ { },
+};
+
+static const struct ctl_path net_ipv6_icmp_path[] = {
+ { .procname = "net", },
+ { .procname = "ipv6", },
+ { .procname = "icmp", },
+ { },
+};
+
static int __net_init ipv6_sysctl_net_init(struct net *net)
{
- struct ctl_table *ipv6_table;
+ struct ctl_table *ipv6_bindv6only_table;
struct ctl_table *ipv6_route_table;
struct ctl_table *ipv6_icmp_table;
- int err;
- err = -ENOMEM;
- ipv6_table = kmemdup(ipv6_table_template, sizeof(ipv6_table_template),
- GFP_KERNEL);
- if (!ipv6_table)
- goto out;
+ ipv6_bindv6only_table = kmemdup(ipv6_bindv6only_template,
+ sizeof(ipv6_bindv6only_template), GFP_KERNEL);
+ if (!ipv6_bindv6only_table)
+ goto fail_alloc_ipv6_bindv6only_table;
+ ipv6_bindv6only_table[0].data = &net->ipv6.sysctl.bindv6only;
ipv6_route_table = ipv6_route_sysctl_init(net);
if (!ipv6_route_table)
- goto out_ipv6_table;
- ipv6_table[0].child = ipv6_route_table;
+ goto fail_alloc_ipv6_route_table;
ipv6_icmp_table = ipv6_icmp_sysctl_init(net);
if (!ipv6_icmp_table)
- goto out_ipv6_route_table;
- ipv6_table[1].child = ipv6_icmp_table;
+ goto fail_alloc_ipv6_icmp_table;
- ipv6_table[2].data = &net->ipv6.sysctl.bindv6only;
- net->ipv6.sysctl.table = register_net_sysctl_table(net, net_ipv6_ctl_path,
- ipv6_table);
- if (!net->ipv6.sysctl.table)
- goto out_ipv6_icmp_table;
+ net->ipv6.sysctl.bindv6only_hdr = register_net_sysctl_table(
+ net, net_ipv6_ctl_path, ipv6_bindv6only_table);
+ if (!net->ipv6.sysctl.bindv6only_hdr)
+ goto fail_reg_bindv6only_hdr;
- err = 0;
-out:
- return err;
+ net->ipv6.sysctl.route6_hdr = register_net_sysctl_table(
+ net, net_ipv6_route_path, ipv6_route_table);
+ if (!net->ipv6.sysctl.route6_hdr)
+ goto fail_reg_route6_hdr;
+
+ net->ipv6.sysctl.icmp6_hdr = register_net_sysctl_table(
+ net, net_ipv6_icmp_path, ipv6_icmp_table);
+ if (!net->ipv6.sysctl.icmp6_hdr)
+ goto fail_reg_icmp6_hdr;
-out_ipv6_icmp_table:
+ return 0;
+
+fail_reg_icmp6_hdr:
+ unregister_net_sysctl_table(net->ipv6.sysctl.route6_hdr);
+fail_reg_route6_hdr:
+ unregister_net_sysctl_table(net->ipv6.sysctl.bindv6only_hdr);
+fail_reg_bindv6only_hdr:
kfree(ipv6_icmp_table);
-out_ipv6_route_table:
+fail_alloc_ipv6_icmp_table:
kfree(ipv6_route_table);
-out_ipv6_table:
- kfree(ipv6_table);
- goto out;
+fail_alloc_ipv6_route_table:
+ kfree(ipv6_bindv6only_table);
+fail_alloc_ipv6_bindv6only_table:
+ return -ENOMEM;
}
static void __net_exit ipv6_sysctl_net_exit(struct net *net)
{
- struct ctl_table *ipv6_table;
+ struct ctl_table *ipv6_bindv6only_table;
struct ctl_table *ipv6_route_table;
struct ctl_table *ipv6_icmp_table;
- ipv6_table = net->ipv6.sysctl.table->ctl_table_arg;
- ipv6_route_table = ipv6_table[0].child;
- ipv6_icmp_table = ipv6_table[1].child;
+ ipv6_bindv6only_table = net->ipv6.sysctl.bindv6only_hdr->ctl_table_arg;
+ ipv6_route_table = net->ipv6.sysctl.route6_hdr->ctl_table_arg;
+ ipv6_icmp_table = net->ipv6.sysctl.icmp6_hdr->ctl_table_arg;
- unregister_net_sysctl_table(net->ipv6.sysctl.table);
+ unregister_net_sysctl_table(net->ipv6.sysctl.icmp6_hdr);
+ unregister_net_sysctl_table(net->ipv6.sysctl.route6_hdr);
+ unregister_net_sysctl_table(net->ipv6.sysctl.bindv6only_hdr);
- kfree(ipv6_table);
- kfree(ipv6_route_table);
kfree(ipv6_icmp_table);
+ kfree(ipv6_route_table);
+ kfree(ipv6_bindv6only_table);
}
static struct pernet_operations ipv6_sysctl_net_ops = {
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/llc/sysctl_net_llc.c | 55 +++++++++++++++++++++++----------------------
1 files changed, 28 insertions(+), 27 deletions(-)
diff --git a/net/llc/sysctl_net_llc.c b/net/llc/sysctl_net_llc.c
index e2ebe35..8977307 100644
--- a/net/llc/sysctl_net_llc.c
+++ b/net/llc/sysctl_net_llc.c
@@ -56,48 +56,49 @@ static struct ctl_table llc_station_table[] = {
{ },
};
-static struct ctl_table llc2_dir_timeout_table[] = {
- {
- .procname = "timeout",
- .mode = 0555,
- .child = llc2_timeout_table,
- },
- { },
-};
-static struct ctl_table llc_table[] = {
- {
- .procname = "llc2",
- .mode = 0555,
- .child = llc2_dir_timeout_table,
- },
- {
- .procname = "station",
- .mode = 0555,
- .child = llc_station_table,
- },
- { },
+static const __initdata struct ctl_path llc2_timeout_path[] = {
+ { .procname = "net", },
+ { .procname = "llc", },
+ { .procname = "llc2", },
+ { .procname = "timeout", },
+ { }
};
-static struct ctl_path llc_path[] = {
+static const __initdata struct ctl_path llc_station_path[] = {
{ .procname = "net", },
{ .procname = "llc", },
+ { .procname = "station", },
{ }
};
-static struct ctl_table_header *llc_table_header;
+static struct ctl_table_header *llc_station_hdr;
+static struct ctl_table_header *llc2_timeout_hdr;
int __init llc_sysctl_init(void)
{
- llc_table_header = register_sysctl_paths(llc_path, llc_table);
+ llc_station_hdr = register_sysctl_paths(llc_station_path, llc_station_table);
+ if (!llc_station_hdr)
+ return -ENOMEM;
- return llc_table_header ? 0 : -ENOMEM;
+ llc2_timeout_hdr = register_sysctl_paths(llc2_timeout_path, llc2_timeout_table);
+ if (!llc2_timeout_hdr) {
+ unregister_sysctl_table(llc_station_hdr);
+ llc_station_hdr = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
}
void llc_sysctl_exit(void)
{
- if (llc_table_header) {
- unregister_sysctl_table(llc_table_header);
- llc_table_header = NULL;
+ if (llc2_timeout_hdr) {
+ unregister_sysctl_table(llc2_timeout_hdr);
+ llc2_timeout_hdr = NULL;
+ }
+ if (llc_station_hdr) {
+ unregister_sysctl_table(llc_station_hdr);
+ llc_station_hdr = NULL;
}
}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/char/random.c | 27 ++++++++++++++++++++++++++-
kernel/sysctl.c | 6 ------
2 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index d4ddeba..8893c4b 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -959,8 +959,15 @@ static void init_std_data(struct entropy_store *r)
mix_pool_bytes(r, utsname(), sizeof(*(utsname())));
}
+static int __init register_random_sysctls(void);
+
static int rand_initialize(void)
{
+ int rc;
+ rc = register_random_sysctls();
+ if (rc)
+ return rc;
+
init_std_data(&input_pool);
init_std_data(&blocking_pool);
init_std_data(&nonblocking_pool);
@@ -1250,7 +1257,7 @@ static int proc_do_uuid(ctl_table *table, int write,
}
static int sysctl_poolsize = INPUT_POOL_WORDS * 32;
-ctl_table random_table[] = {
+static struct ctl_table random_table[] = {
{
.procname = "poolsize",
.data = &sysctl_poolsize,
@@ -1298,6 +1305,24 @@ ctl_table random_table[] = {
},
{ }
};
+
+static const __initdata struct ctl_path random_path[] = {
+ { .procname = "kernel" },
+ { .procname = "random" },
+ { }
+};
+
+static struct ctl_table_header *random_header;
+
+static int __init register_random_sysctls(void)
+{
+ random_header = register_sysctl_paths(random_path, random_table);
+ if (!random_header)
+ return -ENOMEM;
+ return 0;
+}
+#else /* CONFIG_SYSCTL */
+static int __init register_random_sysctls(void) { return 0; }
#endif /* CONFIG_SYSCTL */
/********************************************************************
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b813724..a3f060c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -216,7 +216,6 @@ static struct ctl_table vm_table[];
static struct ctl_table fs_table[];
static struct ctl_table debug_table[];
static struct ctl_table dev_table[];
-extern struct ctl_table random_table[];
#ifdef CONFIG_EPOLL
extern struct ctl_table epoll_table[];
#endif
@@ -611,11 +610,6 @@ static struct ctl_table kern_table[] = {
.proc_handler = proc_dointvec,
},
{
- .procname = "random",
- .mode = 0555,
- .child = random_table,
- },
- {
.procname = "overflowuid",
.data = &overflowuid,
.maxlen = sizeof(int),
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/key.h | 4 +++-
kernel/sysctl.c | 7 -------
security/keys/key.c | 1 +
security/keys/sysctl.c | 18 +++++++++++++++++-
4 files changed, 21 insertions(+), 9 deletions(-)
diff --git a/include/linux/key.h b/include/linux/key.h
index b2bb017..9b3df18 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -281,7 +281,9 @@ static inline key_serial_t key_serial(struct key *key)
rwsem_is_locked(&((struct key *)(KEY))->sem)))
#ifdef CONFIG_SYSCTL
-extern ctl_table key_sysctls[];
+extern int __init key_register_sysctls(void);
+#else
+static inline int key_register_sysctls(void) { return 0; }
#endif
extern void key_replace_session_keyring(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index a3f060c..4e63701 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -905,13 +905,6 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = proc_dostring,
},
-#ifdef CONFIG_KEYS
- {
- .procname = "keys",
- .mode = 0555,
- .child = key_sysctls,
- },
-#endif
#ifdef CONFIG_RCU_TORTURE_TEST
{
.procname = "rcutorture_runnable",
diff --git a/security/keys/key.c b/security/keys/key.c
index f7f9d93..33903c2 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -1099,6 +1099,7 @@ EXPORT_SYMBOL(unregister_key_type);
*/
void __init key_init(void)
{
+ key_register_sysctls();
/* allocate a slab in which we can store keys */
key_jar = kmem_cache_create("key_jar", sizeof(struct key),
0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
diff --git a/security/keys/sysctl.c b/security/keys/sysctl.c
index ee32d18..e079223 100644
--- a/security/keys/sysctl.c
+++ b/security/keys/sysctl.c
@@ -15,7 +15,7 @@
static const int zero, one = 1, max = INT_MAX;
-ctl_table key_sysctls[] = {
+static struct ctl_table key_table[] = {
{
.procname = "maxkeys",
.data = &key_quota_maxkeys,
@@ -63,3 +63,19 @@ ctl_table key_sysctls[] = {
},
{ }
};
+
+static const __initdata struct ctl_path key_path[] = {
+ { .procname = "kernel" },
+ { .procname = "keys" },
+ { }
+};
+
+static struct ctl_table_header *key_header;
+
+int __init key_register_sysctls(void)
+{
+ key_header = register_sysctl_paths(key_path, key_table);
+ if (key_header == NULL)
+ return -ENOMEM;
+ return 0;
+}
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/notify/inotify/inotify_user.c | 22 +++++++++++++++++++---
include/linux/inotify.h | 2 --
kernel/sysctl.c | 7 -------
3 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 8445fbc..ba618c2 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -51,13 +51,13 @@ static int inotify_max_user_watches __read_mostly;
static struct kmem_cache *inotify_inode_mark_cachep __read_mostly;
struct kmem_cache *event_priv_cachep __read_mostly;
-#ifdef CONFIG_SYSCTL
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_MMU)
#include <linux/sysctl.h>
static int zero;
-ctl_table inotify_table[] = {
+static struct ctl_table inotify_table[] = {
{
.procname = "max_user_instances",
.data = &inotify_max_user_instances,
@@ -84,7 +84,22 @@ ctl_table inotify_table[] = {
},
{ }
};
-#endif /* CONFIG_SYSCTL */
+static const __initdata struct ctl_path inotify_path[] = {
+ { .procname = "fs" },
+ { .procname = "inotify" },
+ { }
+};
+static struct ctl_table_header *inotify_header;
+static int __init register_inotify_sysctls(void)
+{
+ inotify_header = register_sysctl_paths(inotify_path, inotify_table);
+ if (inotify_header == NULL)
+ return -ENOMEM;
+ return 0;
+}
+#else /* CONFIG_SYSCTL && CONFIG_MMU */
+static int __init register_inotify_sysctls(void) { return 0; }
+#endif /* CONFIG_SYSCTL && CONFIG_MMU */
static inline __u32 inotify_arg_to_mask(u32 arg)
{
@@ -862,6 +877,7 @@ static int __init inotify_user_setup(void)
inotify_max_user_instances = 128;
inotify_max_user_watches = 8192;
+ register_inotify_sysctls();
return 0;
}
module_init(inotify_user_setup);
diff --git a/include/linux/inotify.h b/include/linux/inotify.h
index d33041e..89b3bfe 100644
--- a/include/linux/inotify.h
+++ b/include/linux/inotify.h
@@ -71,8 +71,6 @@ struct inotify_event {
#define IN_NONBLOCK O_NONBLOCK
#ifdef __KERNEL__
-#include <linux/sysctl.h>
-extern struct ctl_table inotify_table[]; /* for sysctl */
#define ALL_INOTIFY_BITS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | \
IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4e63701..5961046 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1439,13 +1439,6 @@ static struct ctl_table fs_table[] = {
.proc_handler = proc_doulongvec_minmax,
},
#endif /* CONFIG_AIO */
-#ifdef CONFIG_INOTIFY_USER
- {
- .procname = "inotify",
- .mode = 0555,
- .child = inotify_table,
- },
-#endif
#ifdef CONFIG_EPOLL
{
.procname = "epoll",
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/eventpoll.c | 22 +++++++++++++++++++---
include/linux/poll.h | 2 --
kernel/sysctl.c | 10 ----------
3 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f9cfd16..2dbcd0c 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -246,14 +246,14 @@ static struct kmem_cache *epi_cache __read_mostly;
/* Slab cache used to allocate "struct eppoll_entry" */
static struct kmem_cache *pwq_cache __read_mostly;
-#ifdef CONFIG_SYSCTL
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_MMU)
#include <linux/sysctl.h>
static long zero;
static long long_max = LONG_MAX;
-ctl_table epoll_table[] = {
+static struct ctl_table epoll_table[] = {
{
.procname = "max_user_watches",
.data = &max_user_watches,
@@ -265,7 +265,22 @@ ctl_table epoll_table[] = {
},
{ }
};
-#endif /* CONFIG_SYSCTL */
+static const __initdata struct ctl_path epoll_path[] = {
+ { .procname = "fs" },
+ { .procname = "epoll" },
+ { }
+};
+static struct ctl_table_header *epoll_header;
+static int __init register_epoll_sysctls(void)
+{
+ epoll_header = register_sysctl_paths(epoll_path, epoll_table);
+ if (epoll_header == NULL)
+ return -ENOMEM;
+ return 0;
+}
+#else /* CONFIG_SYSCTL && CONFIG_MMU */
+static int __init register_epoll_sysctls(void) { return 0; }
+#endif /* CONFIG_SYSCTL && CONFIG_MMU */
/* Setup the structure that is used as key for the RB tree */
@@ -1586,6 +1601,7 @@ static int __init eventpoll_init(void)
pwq_cache = kmem_cache_create("eventpoll_pwq",
sizeof(struct eppoll_entry), 0, SLAB_PANIC, NULL);
+ register_epoll_sysctls();
return 0;
}
fs_initcall(eventpoll_init);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index cf40010..314331c 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -10,10 +10,8 @@
#include <linux/wait.h>
#include <linux/string.h>
#include <linux/fs.h>
-#include <linux/sysctl.h>
#include <asm/uaccess.h>
-extern struct ctl_table epoll_table[]; /* for sysctl */
/* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
additional memory. */
#define MAX_STACK_ALLOC 832
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5961046..ca89653 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -216,9 +216,6 @@ static struct ctl_table vm_table[];
static struct ctl_table fs_table[];
static struct ctl_table debug_table[];
static struct ctl_table dev_table[];
-#ifdef CONFIG_EPOLL
-extern struct ctl_table epoll_table[];
-#endif
#ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
int sysctl_legacy_va_layout;
@@ -1439,13 +1436,6 @@ static struct ctl_table fs_table[] = {
.proc_handler = proc_doulongvec_minmax,
},
#endif /* CONFIG_AIO */
-#ifdef CONFIG_EPOLL
- {
- .procname = "epoll",
- .mode = 0555,
- .child = epoll_table,
- },
-#endif
#endif
{
.procname = "suid_dumpable",
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sysctl.c | 121 +++++++++++++++++++++++++++++++++++++-----------------
1 files changed, 83 insertions(+), 38 deletions(-)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ca89653..d44c280 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -211,12 +211,6 @@ static struct ctl_table_root sysctl_table_root = {
.default_set.list = LIST_HEAD_INIT(root_table_header.ctl_entry),
};
-static struct ctl_table kern_table[];
-static struct ctl_table vm_table[];
-static struct ctl_table fs_table[];
-static struct ctl_table debug_table[];
-static struct ctl_table dev_table[];
-
#ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
int sysctl_legacy_va_layout;
#endif
@@ -224,31 +218,6 @@ int sysctl_legacy_va_layout;
/* The default sysctl tables: */
static struct ctl_table root_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = kern_table,
- },
- {
- .procname = "vm",
- .mode = 0555,
- .child = vm_table,
- },
- {
- .procname = "fs",
- .mode = 0555,
- .child = fs_table,
- },
- {
- .procname = "debug",
- .mode = 0555,
- .child = debug_table,
- },
- {
- .procname = "dev",
- .mode = 0555,
- .child = dev_table,
- },
{ }
};
@@ -266,6 +235,11 @@ static int min_extfrag_threshold;
static int max_extfrag_threshold = 1000;
#endif
+static const __initdata struct ctl_path kern_path[] = {
+ { .procname = "kernel" },
+ { },
+};
+
static struct ctl_table kern_table[] = {
{
.procname = "sched_child_runs_first",
@@ -955,6 +929,11 @@ static struct ctl_table kern_table[] = {
{ }
};
+static const __initdata struct ctl_path vm_path[] = {
+ { .procname = "vm" },
+ { },
+};
+
static struct ctl_table vm_table[] = {
{
.procname = "overcommit_memory",
@@ -1324,11 +1303,23 @@ static struct ctl_table vm_table[] = {
};
#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
+
+static const __initdata struct ctl_path binfmt_misc_path[] = {
+ { .procname = "fs" },
+ { .procname = "binfmt_misc" },
+ { },
+};
+
static struct ctl_table binfmt_misc_table[] = {
{ }
};
#endif
+static const __initdata struct ctl_path fs_path[] = {
+ { .procname = "fs" },
+ { },
+};
+
static struct ctl_table fs_table[] = {
{
.procname = "inode-nr",
@@ -1446,13 +1437,6 @@ static struct ctl_table fs_table[] = {
.extra1 = &zero,
.extra2 = &two,
},
-#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
- {
- .procname = "binfmt_misc",
- .mode = 0555,
- .child = binfmt_misc_table,
- },
-#endif
{
.procname = "pipe-max-size",
.data = &pipe_max_size,
@@ -1464,6 +1448,11 @@ static struct ctl_table fs_table[] = {
{ }
};
+static const __initdata struct ctl_path debug_path[] = {
+ { .procname = "debug" },
+ { },
+};
+
static struct ctl_table debug_table[] = {
#if defined(CONFIG_X86) || defined(CONFIG_PPC) || defined(CONFIG_SPARC) || \
defined(CONFIG_S390)
@@ -1489,6 +1478,11 @@ static struct ctl_table debug_table[] = {
{ }
};
+static const __initdata struct ctl_path dev_path[] = {
+ { .procname = "dev" },
+ { },
+};
+
static struct ctl_table dev_table[] = {
{ }
};
@@ -1688,11 +1682,62 @@ static void sysctl_set_parent(struct ctl_table *parent, struct ctl_table *table)
static __init int sysctl_init(void)
{
+ struct ctl_table_header *kern_header, *vm_header, *fs_header,
+ *debug_header, *dev_header;
+#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
+ struct ctl_table_header *binfmt_misc_header;
+#endif
+
sysctl_set_parent(NULL, root_table);
+
+ kern_header = register_sysctl_paths(kern_path, kern_table);
+ if (kern_header == NULL)
+ goto fail_register_kern;
+
+ vm_header = register_sysctl_paths(vm_path, vm_table);
+ if (vm_header == NULL)
+ goto fail_register_vm;
+
+ fs_header = register_sysctl_paths(fs_path, fs_table);
+ if (fs_header == NULL)
+ goto fail_register_fs;
+
+ debug_header = register_sysctl_paths(debug_path, debug_table);
+ if (debug_header == NULL)
+ goto fail_register_debug;
+
+ dev_header = register_sysctl_paths(dev_path, dev_table);
+ if (dev_header == NULL)
+ goto fail_register_dev;
+
+#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
+ binfmt_misc_header = register_sysctl_paths(binfmt_misc_path, binfmt_misc_table);
+ if (binfmt_misc_header == NULL)
+ goto fail_register_binfmt_misc;
+#endif
+
+
#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
sysctl_check_table(current->nsproxy, root_table);
#endif
return 0;
+
+
+#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
+fail_register_binfmt_misc:
+ unregister_sysctl_table(dev_header);
+#endif
+
+fail_register_dev:
+ unregister_sysctl_table(debug_header);
+fail_register_debug:
+ unregister_sysctl_table(fs_header);
+fail_register_fs:
+ unregister_sysctl_table(vm_header);
+fail_register_vm:
+ unregister_sysctl_table(kern_header);
+fail_register_kern:
+ return -ENOMEM;
}
core_initcall(sysctl_init);
--
1.7.5.134.g1c08b
The old implementation used inefficient algorithms both at
lookup/readdir times and at registration. This patch introduces an
improved algorithm: lower memory consumption, better time complexity
for lookup/readdir/registration. Locking is a bit heavier in this
algorithm (in this patch: reader locks for lookup/readdir, writer
locks for register/unregister; in a later patch in this series: RCU +
spin-lock). I'll address this locking issue later in this commit.
I will briefly describe the previous algorithm and the new one, and
brag at the end with a list of improvements and new limitations.
= Old algorithm =
== Description ==
We created a ctl_table_header for each registered sysctl table. The
header's role is to maintain sysctl internal data and reference counts,
and to serve as a token for unregistering the table.
All headers were put in a list in the order of registration without
regard to the position of the tables in the sysctl tree. Headers were
also 'attached' one to another to (somewhat) speed up lookup/readdir.
Attachment meant looking at every already registered header and
comparing the paths to the tables. A newly registered header would be
attached to the first header with which it shared most of its
path.
e.g. paths registered: /, /a/b/c, /a/b/c/d, /a/x, /a/x/y, /a/z
tree:
/
+ /a/b/c
| + /a/b/c/d
+ /a/x
| + /a/x/y
+ /a/z
== Time complexity ==
- registering N tables takes O(N^2) steps: each new header is compared
against every header already registered (see above)
- lookup: if the item searched for is not found in the current header,
iterate the list of headers until you find another header that's
attached to the current position in the header's table. Lookups for
elements that are in a header registered under the current position
or for nonexistent elements take O(N) steps each.
- readdir: after searching the current header's table in the current
position, always do an O(N) search for a header attached to the
current table position.
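
The quadratic registration cost can be illustrated with a small
user-space model. This is only a sketch under assumptions: the names
old_header, old_register and is_path_prefix are hypothetical, and
"attach" is read here as "hang under the registered header whose path
is the longest prefix of the new path", which is how the example tree
above looks. It is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HEADERS 64

/* Hypothetical stand-in for the old ctl_table_header (not the kernel struct). */
struct old_header {
	const char *path;               /* full path, e.g. "/a/b/c" */
	struct old_header *attached_to; /* header this one hangs under, or NULL */
};

static struct old_header old_headers[MAX_HEADERS];
static size_t old_count;
static unsigned long old_compares; /* path comparisons done so far */

/* Returns 1 if 'parent' is a whole-component prefix of 'child'. */
static int is_path_prefix(const char *parent, const char *child)
{
	size_t len = strlen(parent);

	if (strncmp(parent, child, len) != 0)
		return 0;
	/* "/" prefixes every absolute path; otherwise stop on a boundary. */
	return len == 1 || child[len] == '/' || child[len] == '\0';
}

/* Register one path: every earlier header is inspected, the O(N) step. */
static struct old_header *old_register(const char *path)
{
	struct old_header *h, *best = NULL;
	size_t i, best_len = 0;

	assert(old_count < MAX_HEADERS);
	h = &old_headers[old_count];
	h->path = path;

	for (i = 0; i < old_count; i++) {
		size_t len = strlen(old_headers[i].path);

		old_compares++;
		if (is_path_prefix(old_headers[i].path, path) && len > best_len) {
			best = &old_headers[i];
			best_len = len;
		}
	}
	h->attached_to = best;
	old_count++;
	return h;
}
```

Registering the six example paths in order costs 0+1+2+3+4+5 = 15
comparisons in this model, i.e. N*(N-1)/2 for N headers.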
== Memory ==
Each header was allocated some data and a variable-length path.
O(1) with kzalloc/kfree.
= New algorithm =
== Description ==
Reuses the 'ctl_table_header' concept but with two distinct meanings:
- as a wrapper of a table registered by the user
- as a directory entry.
Registering the paths from the above example gives this tree:
paths: /, /a/b/c, /a/b/c/d, /a/x, /a/x/y, /a/z
tree:
/: subdirs = a
  a: subdirs = b x z
    b: subdirs = c
      c: subdirs = d
        d:
    x: subdirs = y
      y:
    z:
Each directory gets a header. Each header has a parent (except root)
and two lists:
- ctl_subdirs: list of sub-directories - other headers
- ctl_tables: list of headers that wrap a ctl_table array
Because the directory structure is now maintained as ctl_table_header
objects, we needed to remove the .child from ctl_tables (this explains
the previous patches). A ctl_table array represents a list of files.
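
A minimal user-space sketch of the directory tree described above may
help. Everything in it is an assumption made for illustration: the
struct dir_header layout, the fixed-size name buffer, and the helpers
find_subdir, get_subdir, register_path and count_subdirs are
hypothetical and do not mirror the kernel definitions. The table list
is reduced to a counter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a directory header; not the kernel struct. */
struct dir_header {
	char name[32];
	int refcount;               /* number of registrations passing through */
	struct dir_header *parent;  /* NULL for the root */
	struct dir_header *subdirs; /* first child, models ctl_subdirs */
	struct dir_header *next;    /* next sibling in the parent's list */
	int ntables;                /* tables attached here, models ctl_tables */
};

static struct dir_header sketch_root; /* zero-initialized root directory */

/* Linear search of one directory's sub-directory list. */
static struct dir_header *find_subdir(struct dir_header *dir,
				      const char *name, size_t len)
{
	struct dir_header *d;

	for (d = dir->subdirs; d; d = d->next)
		if (strlen(d->name) == len && memcmp(d->name, name, len) == 0)
			return d;
	return NULL;
}

/* Reuse an existing sub-directory (refcount++) or create a new one. */
static struct dir_header *get_subdir(struct dir_header *dir,
				     const char *name, size_t len)
{
	struct dir_header *d = find_subdir(dir, name, len);

	if (d) {
		d->refcount++;
		return d;
	}
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	assert(len < sizeof(d->name));
	memcpy(d->name, name, len); /* calloc left the terminating NUL */
	d->refcount = 1;
	d->parent = dir;
	d->next = dir->subdirs;
	dir->subdirs = d;
	return d;
}

/* Walk 'path' one component at a time and attach a table at the end. */
static struct dir_header *register_path(struct dir_header *root,
					const char *path)
{
	struct dir_header *dir = root;

	while (*path) {
		size_t len;

		while (*path == '/')
			path++;
		len = strcspn(path, "/");
		if (len == 0)
			break;
		dir = get_subdir(dir, path, len);
		if (!dir)
			return NULL;
		path += len;
	}
	dir->ntables++;
	return dir;
}

/* Number of direct sub-directories of 'dir'. */
static int count_subdirs(const struct dir_header *dir)
{
	const struct dir_header *d;
	int n = 0;

	for (d = dir->subdirs; d; d = d->next)
		n++;
	return n;
}
```

Registering the example paths in order leaves 'a' with three
sub-directories and a refcount of 5 in this model, one per registered
path that passes through it.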
== Time complexity ==
- registration of N headers. Registration means adding new directories
at each level or incrementing an existing directory's refcount.
- O(N * log N) - if the paths to the headers are evenly distributed
- O(N^2) - if most of the headers registered are children of the
same parent directory (searching the list of subdirs takes O(N)).
There are cases where this happens (e.g. registering sysctl
entries for net devices under /proc/sys/net/ipv4|6/conf/device).
A few later patches will add an optimisation, to fix locations
that might trigger the O(N^2) issue.
- lookup: O(len(subdirs) + sum(len(tarr) for each tarr in ctl_tables))
- could be made better:
- sort ctl_subdirs (for binary search)
- replace ctl_subdirs with a hash-table (increase memory footprint)
- sort ctl_table entries at registration time (for binary search).
Could be done, but I'm too lazy to do it now.
- readdir: O(len(subdirs) + sum(len(tarr) for each tarr in ctl_tables))
- can't get any better than this :)
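
The lookup bound above can be checked with a small model that counts
string comparisons. This is a sketch under assumptions: the types
file_entry, table_node, subdir_node and dir_node and the function
dir_lookup are hypothetical, and searching sub-directories before
tables is an ordering chosen here for illustration, not something this
commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Models one ctl_table entry (a file); only the name matters here. */
struct file_entry {
	const char *procname;
};

/* Models a header wrapping a ctl_table array (NULL procname terminates). */
struct table_node {
	const struct file_entry *entries;
	struct table_node *next;
};

/* Models one element of a directory's list of sub-directories. */
struct subdir_node {
	const char *name;
	struct subdir_node *next;
};

/* Models a directory header with its two lists. */
struct dir_node {
	struct subdir_node *subdirs;
	struct table_node *tables;
};

enum lookup_kind { LOOKUP_NONE, LOOKUP_DIR, LOOKUP_FILE };

/*
 * Look 'name' up in one directory and count every comparison in *steps.
 * A miss costs len(subdirs) + sum(len(tarr)) comparisons.
 */
static enum lookup_kind dir_lookup(const struct dir_node *dir,
				   const char *name, unsigned long *steps)
{
	const struct subdir_node *s;
	const struct table_node *t;
	const struct file_entry *e;

	for (s = dir->subdirs; s; s = s->next) {
		(*steps)++;
		if (strcmp(s->name, name) == 0)
			return LOOKUP_DIR;
	}
	for (t = dir->tables; t; t = t->next) {
		for (e = t->entries; e->procname; e++) {
			(*steps)++;
			if (strcmp(e->procname, name) == 0)
				return LOOKUP_FILE;
		}
	}
	return LOOKUP_NONE;
}
```

With three sub-directories and two tables holding three files in
total, a miss costs six comparisons in this model; sorting either list
would allow a binary search instead, as suggested above.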
== Memory complexity ==
Although we create more ctl_table_headers (one for each directory, one
for each table, and because we deleted the .child from ctl_table there
are more tables registered than before this patch) we remove the need
to store a full path (from top to the table) as was done in the old
solution => a small O(N) memory gain relative to the old algorithm.
Also, because headers have a fixed size, we use kmem_caches => lower
fragmentation.
= Limitations =
== ctl_table does not have .child => some code uglification ==
Registering tables with multiple directories and files cannot be done
in a single operation: there must be at least a table registered for
each directory. This makes code that registers sysctls uglier (see the
earlier patches that remove .child from sched_domain and the root
table). Other places, e.g. the parport sysctls, look much better now
without .child: I can now read and understand that code.
== Handling of netns specific paths is weirder ==
The algorithm descriptions from above are simplifications. In reality
the code needs to handle directories and files that must be visible in
some netns only. E.g. the /proc/sys/net/ipv4/conf/DEVICENAME/
directory and its files must be visible only in the netns of that
device.
The old algorithm used a secondary list that indexed all netns
specific headers. All algorithms remained the same, except that
besides searching the global list, the algorithm would also look
into the current netns' list of headers. This scales perfectly with
respect to the number of network namespaces.
The new algorithm does something similar, but a bit more complicated.
We also use netns specific lists of directories/tables and store them
in a special directory ctl_table_header (which I dubbed the
"netns-correspondent" of another directory - I'm not very pleased with
the name either).
When registering a net-ns specific table, we will create a
"netns-correspondent" to the last directory that is not net-ns
specific in that path.
E.g.: we're registering a netns specific table for 'lo':
common path: /proc/sys/net/ipv4/
netns path: /proc/sys/net/ipv4/conf/lo/
We'll create an (unnamed) netns correspondent for 'ipv4' which will
have 'conf' as its subdir.
E.g.: We're registering a netns specific file in /proc/sys/net/core/somaxconn
common path: /proc/sys/net/core/
netns path: /proc/sys/net/core/
We'll create an (unnamed) netns correspondent for 'core' with the
table containing 'somaxconn' in ctl_tables.
All net-ns correspondents of one netns are held in a single list, and
each netns gets its own list. This keeps the algorithm complexity
independent of the number of network namespaces (as was the old one).
However, now only a smaller part of the directories are members of this
list, improving register/lookup/readdir time complexity.
There is one ugly limitation that stems from this approach.
E.g.: register these files in this order:
- register common /dir1/file-common1
- register netns specific /dir1/dir2/file-netns
- register common /dir1/dir2/file-common2
We'll have this tree:
'dir1' { .subdirs = ['dir2'], .tables = ['file-common1'] }
^ |
| -> { .subdirs = [], .tables = ['file-common2'] }
|
| (unnamed netns-corresp for dir1)
-> { .subdirs = ['dir2'] }
|
-> { .subdirs = [], .tables = ['file-netns'] }
readdir: when we list the contents of 'dir1' we'll see it has two
sub-directories named 'dir2' each with a file in it.
lookup: lookup of /dir1/dir2/file-netns will not work because we find
'dir2' as a subdir of 'dir1' and stick with it and never look
into the netns correspondent of 'dir1'.
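
The shadowing described above can be reproduced in a toy model. This
is a sketch under assumptions: struct ns_dir and the helpers find_in,
lookup_subdir, has_file and count_listed are hypothetical, and the
only behaviour modelled is "search the common sub-directories first,
fall back to the netns correspondent only on a miss".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 4

/* Hypothetical directory model; arrays are NULL-terminated. */
struct ns_dir {
	const char *name;
	const char *files[MAX_ENTRIES];
	struct ns_dir *subdirs[MAX_ENTRIES];
	struct ns_dir *netns_corresp; /* unnamed correspondent, or NULL */
};

/* The tree from the example: two distinct directories named 'dir2'. */
static struct ns_dir dir2_common = {
	.name = "dir2", .files = { "file-common2" },
};
static struct ns_dir dir2_netns = {
	.name = "dir2", .files = { "file-netns" },
};
static struct ns_dir dir1_corresp = {
	.name = "", .subdirs = { &dir2_netns },
};
static struct ns_dir dir1 = {
	.name = "dir1", .files = { "file-common1" },
	.subdirs = { &dir2_common }, .netns_corresp = &dir1_corresp,
};

/* Search only the sub-directory list of 'dir' itself. */
static struct ns_dir *find_in(struct ns_dir *dir, const char *name)
{
	size_t i;

	for (i = 0; i < MAX_ENTRIES && dir->subdirs[i]; i++)
		if (strcmp(dir->subdirs[i]->name, name) == 0)
			return dir->subdirs[i];
	return NULL;
}

/* lookup: the common list wins, the correspondent is only a fallback. */
static struct ns_dir *lookup_subdir(struct ns_dir *dir, const char *name)
{
	struct ns_dir *d = find_in(dir, name);

	if (!d && dir->netns_corresp)
		d = find_in(dir->netns_corresp, name);
	return d;
}

/* Returns 1 if 'dir' directly contains a file called 'name'. */
static int has_file(const struct ns_dir *dir, const char *name)
{
	size_t i;

	for (i = 0; i < MAX_ENTRIES && dir->files[i]; i++)
		if (strcmp(dir->files[i], name) == 0)
			return 1;
	return 0;
}

/* readdir: both lists are emitted, so a name can be listed twice. */
static int count_listed(struct ns_dir *dir, const char *name)
{
	int n = find_in(dir, name) ? 1 : 0;

	if (dir->netns_corresp && find_in(dir->netns_corresp, name))
		n++;
	return n;
}
```

In this model a lookup of 'dir2' under 'dir1' always returns the
common directory, so 'file-netns' is unreachable, while a listing of
'dir1' reports 'dir2' twice.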
This can be fixed in two ways:
- A) by making sure to never register a netns specific directory and
after that register that directory as a common one. From what I can
tell there isn't such a problem in the kernel at the moment, but I
did not study the source in detail.
- B) by increasing the complexity of the code:
- readdir: looking at both lists and comparing if we have already
listed a directory as common, so we don't list twice.
-> For imbalanced trees this can make readdir O(N^2) :(
- register: the netns 'dir2' from the example above needs to be
connected to the common 'dir2' when 'dir2' is
registered. I'm not even going to think of how time
complexity/ugliness is going to explode here.
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/inode.c | 2 +-
fs/proc/proc_sysctl.c | 201 ++++++++------
include/linux/sysctl.h | 155 ++++++-----
include/net/net_namespace.h | 2 +-
init/main.c | 2 +
kernel/sysctl.c | 628 ++++++++++++++++++++++++++-----------------
kernel/sysctl_check.c | 250 +++++++++---------
net/sysctl_net.c | 63 ++---
8 files changed, 730 insertions(+), 573 deletions(-)
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index d15aa1b..08166df 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -42,7 +42,7 @@ static void proc_evict_inode(struct inode *inode)
head = PROC_I(inode)->sysctl;
if (head) {
rcu_assign_pointer(PROC_I(inode)->sysctl, NULL);
- sysctl_head_put(head);
+ sysctl_proc_inode_put(head);
}
}
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index f50133c..c0cc16b 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -26,20 +26,20 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
inode->i_ino = get_next_ino();
- sysctl_head_get(head);
+ sysctl_proc_inode_get(head);
ei = PROC_I(inode);
ei->sysctl = head;
ei->sysctl_entry = table;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
- inode->i_mode = table->mode;
- if (!table->child) {
- inode->i_mode |= S_IFREG;
+
+ if (table) {
+ inode->i_mode = S_IFREG | table->mode;
inode->i_op = &proc_sys_inode_operations;
inode->i_fop = &proc_sys_file_operations;
} else {
- inode->i_mode |= S_IFDIR;
inode->i_nlink = 0;
+ inode->i_mode = S_IFDIR | S_IRUGO | S_IWUSR;
inode->i_op = &proc_sys_dir_operations;
inode->i_fop = &proc_sys_dir_file_operations;
}
@@ -51,70 +51,76 @@ static struct ctl_table *find_in_table(struct ctl_table *p, struct qstr *name)
{
int len;
for ( ; p->procname; p++) {
-
- if (!p->procname)
- continue;
-
len = strlen(p->procname);
if (len != name->len)
continue;
- if (memcmp(p->procname, name->name, len) != 0)
- continue;
-
- /* I have a match */
- return p;
+ if (memcmp(p->procname, name->name, len) == 0)
+ return p;
}
return NULL;
}
-static struct ctl_table_header *grab_header(struct inode *inode)
-{
- if (PROC_I(inode)->sysctl)
- return sysctl_head_grab(PROC_I(inode)->sysctl);
- else
- return sysctl_head_next(NULL);
-}
-
static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
struct nameidata *nd)
{
- struct ctl_table_header *head = grab_header(dir);
- struct ctl_table *table = PROC_I(dir)->sysctl_entry;
- struct ctl_table_header *h = NULL;
+ struct ctl_table_header *head = sysctl_fs_get(PROC_I(dir)->sysctl);
struct qstr *name = &dentry->d_name;
- struct ctl_table *p;
+ struct ctl_table_header *h = NULL, *found_head = NULL;
+ struct ctl_table *table = NULL;
struct inode *inode;
struct dentry *err = ERR_PTR(-ENOENT);
+
if (IS_ERR(head))
return ERR_CAST(head);
- if (table && !table->child) {
- WARN_ON(1);
- goto out;
+retry:
+ sysctl_read_lock_head(head);
+
+ /* first check whether a subdirectory has the searched-for name */
+ list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+ if (IS_ERR(sysctl_fs_get(h)))
+ continue;
+
+ if (strcmp(name->name, h->dirname) == 0) {
+ found_head = h;
+ goto search_finished;
+ }
+ sysctl_fs_put(h);
}
- table = table ? table->child : head->ctl_table;
+ /* no subdir with that name, look for the file in the ctl_tables */
+ list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+ if (IS_ERR(sysctl_fs_get(h)))
+ continue;
- p = find_in_table(table, name);
- if (!p) {
- for (h = sysctl_head_next(NULL); h; h = sysctl_head_next(h)) {
- if (h->attached_to != table)
- continue;
- p = find_in_table(h->attached_by, name);
- if (p)
- break;
+ table = find_in_table(h->ctl_table_arg, name);
+ if (table) {
+ found_head = h;
+ goto search_finished;
}
+ sysctl_fs_put(h);
}
- if (!p)
+search_finished:
+ sysctl_read_unlock_head(head);
+
+ if (!found_head) {
+ struct ctl_table_header *netns_corresp;
+ netns_corresp = sysctl_fs_get_netns_corresp(head);
+ if (netns_corresp) {
+ sysctl_fs_put(head);
+ head = netns_corresp;
+ goto retry;
+ }
+ }
+ if (!found_head)
goto out;
err = ERR_PTR(-ENOMEM);
- inode = proc_sys_make_inode(dir->i_sb, h ? h : head, p);
- if (h)
- sysctl_head_finish(h);
+ inode = proc_sys_make_inode(dir->i_sb, found_head, table);
+ sysctl_fs_put(found_head);
if (!inode)
goto out;
@@ -124,7 +130,7 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
d_add(dentry, inode);
out:
- sysctl_head_finish(head);
+ sysctl_fs_put(head);
return err;
}
@@ -132,7 +138,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
size_t count, loff_t *ppos, int write)
{
struct inode *inode = filp->f_path.dentry->d_inode;
- struct ctl_table_header *head = grab_header(inode);
+ struct ctl_table_header *head = sysctl_fs_get(PROC_I(inode)->sysctl);
struct ctl_table *table = PROC_I(inode)->sysctl_entry;
ssize_t error;
size_t res;
@@ -145,7 +151,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
* and won't be until we finish.
*/
error = -EPERM;
- if (sysctl_perm(head->root, table, write ? MAY_WRITE : MAY_READ))
+ if (sysctl_perm(head->ctl_group, table, write ? MAY_WRITE : MAY_READ))
goto out;
/* if that can happen at all, it should be -EINVAL, not -EISDIR */
@@ -159,7 +165,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
if (!error)
error = res;
out:
- sysctl_head_finish(head);
+ sysctl_fs_put(head);
return error;
}
@@ -188,8 +194,8 @@ static int proc_sys_fill_cache(struct file *filp, void *dirent,
ino_t ino = 0;
unsigned type = DT_UNKNOWN;
- qname.name = table->procname;
- qname.len = strlen(table->procname);
+ qname.name = table ? table->procname : head->dirname;
+ qname.len = strlen(qname.name);
qname.hash = full_name_hash(qname.name, qname.len);
child = d_lookup(dir, &qname);
@@ -215,50 +221,69 @@ static int proc_sys_fill_cache(struct file *filp, void *dirent,
return !!filldir(dirent, qname.name, qname.len, filp->f_pos, ino, type);
}
-static int scan(struct ctl_table_header *head, ctl_table *table,
+static int scan(struct ctl_table_header *head,
unsigned long *pos, struct file *file,
void *dirent, filldir_t filldir)
{
+ struct ctl_table_header *h;
+ int res = 0;
- for (; table->procname; table++, (*pos)++) {
- int res;
+ sysctl_read_lock_head(head);
- /* Can't do anything without a proc name */
- if (!table->procname)
+ list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+ if (*pos < file->f_pos) {
+ (*pos)++;
continue;
+ }
- if (*pos < file->f_pos)
+ if (IS_ERR(sysctl_fs_get(h)))
continue;
- res = proc_sys_fill_cache(file, dirent, filldir, head, table);
+ res = proc_sys_fill_cache(file, dirent, filldir, h, NULL);
+ sysctl_fs_put(h);
if (res)
- return res;
+ goto out;
file->f_pos = *pos + 1;
+ (*pos)++;
}
- return 0;
+
+ list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+ ctl_table *t;
+
+ if (IS_ERR(sysctl_fs_get(h)))
+ continue;
+
+ for (t = h->ctl_table_arg; t->procname; t++, (*pos)++) {
+ if (*pos < file->f_pos)
+ continue;
+
+ res = proc_sys_fill_cache(file, dirent, filldir, h, t);
+ if (res) {
+ sysctl_fs_put(h);
+ goto out;
+ }
+ file->f_pos = *pos + 1;
+ }
+ sysctl_fs_put(h);
+ }
+
+out:
+ sysctl_read_unlock_head(head);
+ return res;
}
static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
{
struct dentry *dentry = filp->f_path.dentry;
struct inode *inode = dentry->d_inode;
- struct ctl_table_header *head = grab_header(inode);
- struct ctl_table *table = PROC_I(inode)->sysctl_entry;
- struct ctl_table_header *h = NULL;
+ struct ctl_table_header *head = sysctl_fs_get(PROC_I(inode)->sysctl);
unsigned long pos;
int ret = -EINVAL;
if (IS_ERR(head))
return PTR_ERR(head);
- if (table && !table->child) {
- WARN_ON(1);
- goto out;
- }
-
- table = table ? table->child : head->ctl_table;
-
ret = 0;
/* Avoid a switch here: arm builds fail with missing __cmpdi2 */
if (filp->f_pos == 0) {
@@ -274,23 +299,25 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
filp->f_pos++;
}
pos = 2;
-
- ret = scan(head, table, &pos, filp, dirent, filldir);
- if (ret)
- goto out;
-
- for (h = sysctl_head_next(NULL); h; h = sysctl_head_next(h)) {
- if (h->attached_to != table)
- continue;
- ret = scan(h, h->attached_by, &pos, filp, dirent, filldir);
- if (ret) {
- sysctl_head_finish(h);
- break;
+ ret = scan(head, &pos, filp, dirent, filldir);
+ if (!ret) {
+ /* the netns-correspondent contains only those
+ * subdirectories that are netns-specific, and not
+ * shared with the @head directory: there is no
+ * possibility to list the same directory twice (once
+ * for @head and once for @netns_corresp). Sibling
+ * tables cannot contain the entries with the same
+ * name, no need to worry about them either. */
+ struct ctl_table_header *netns_corresp;
+ netns_corresp = sysctl_fs_get_netns_corresp(head);
+ if (netns_corresp) {
+ ret = scan(netns_corresp, &pos, filp, dirent, filldir);
+ sysctl_fs_put(netns_corresp);
}
}
ret = 1;
out:
- sysctl_head_finish(head);
+ sysctl_fs_put(head);
return ret;
}
@@ -311,17 +338,17 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode))
return -EACCES;
- head = grab_header(inode);
+ head = sysctl_fs_get(PROC_I(inode)->sysctl);
if (IS_ERR(head))
return PTR_ERR(head);
table = PROC_I(inode)->sysctl_entry;
- if (!table) /* global root - r-xr-xr-x */
+ if (!table) /* directory - r-xr-xr-x */
error = mask & MAY_WRITE ? -EACCES : 0;
else /* Use the permissions on the sysctl table entry */
- error = sysctl_perm(head->root, table, mask);
+ error = sysctl_perm(head->ctl_group, table, mask);
- sysctl_head_finish(head);
+ sysctl_fs_put(head);
return error;
}
@@ -352,17 +379,18 @@ static int proc_sys_setattr(struct dentry *dentry, struct iattr *attr)
static int proc_sys_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
{
struct inode *inode = dentry->d_inode;
- struct ctl_table_header *head = grab_header(inode);
+ struct ctl_table_header *head = sysctl_fs_get(PROC_I(inode)->sysctl);
struct ctl_table *table = PROC_I(inode)->sysctl_entry;
if (IS_ERR(head))
return PTR_ERR(head);
generic_fillattr(inode, stat);
+
if (table)
stat->mode = (stat->mode & S_IFMT) | table->mode;
- sysctl_head_finish(head);
+ sysctl_fs_put(head);
return 0;
}
@@ -435,5 +463,6 @@ int __init proc_sys_init(void)
proc_sys_root->proc_iops = &proc_sys_dir_operations;
proc_sys_root->proc_fops = &proc_sys_dir_file_operations;
proc_sys_root->nlink = 0;
+
return 0;
}
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 470e06a..cd9e789 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -934,31 +934,39 @@ enum
/* For the /proc/sys support */
struct ctl_table;
-struct nsproxy;
-struct ctl_table_root;
+struct ctl_table_header;
+struct ctl_table_group;
+struct ctl_table_group_ops;
-struct ctl_table_set {
- struct list_head list;
- struct ctl_table_set *parent;
- int (*is_seen)(struct ctl_table_set *);
-};
+extern __init int sysctl_init(void);
-extern void setup_sysctl_set(struct ctl_table_set *p,
- struct ctl_table_set *parent,
- int (*is_seen)(struct ctl_table_set *));
+extern void sysctl_init_group(struct ctl_table_group *group,
+ const struct ctl_table_group_ops *ops,
+ int has_netns_corresp);
-struct ctl_table_header;
-extern void sysctl_head_get(struct ctl_table_header *);
-extern void sysctl_head_put(struct ctl_table_header *);
+/* get/put a reference to this header that
+ * will be/was stored in a procfs inode */
+extern void sysctl_proc_inode_get(struct ctl_table_header *);
+extern void sysctl_proc_inode_put(struct ctl_table_header *);
+
extern int sysctl_is_seen(struct ctl_table_header *);
-extern struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *);
-extern struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev);
-extern struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
- struct ctl_table_header *prev);
-extern void sysctl_head_finish(struct ctl_table_header *prev);
-extern int sysctl_perm(struct ctl_table_root *root,
- struct ctl_table *table, int op);
+extern int sysctl_perm(struct ctl_table_group *group,
+ struct ctl_table *table, int op);
+
+/* protect the ctl_subdirs/ctl_tables lists */
+extern void sysctl_write_lock_head(struct ctl_table_header *head);
+extern void sysctl_write_unlock_head(struct ctl_table_header *head);
+extern void sysctl_read_lock_head(struct ctl_table_header *head);
+extern void sysctl_read_unlock_head(struct ctl_table_header *head);
+
+/* get/put references to this header for transient uses inside a VFS
+ * procfs function call. Each such reference must be 'put' back before
+ * leaving the function that 'got' it. */
+extern struct ctl_table_header *sysctl_fs_get(struct ctl_table_header *);
+extern struct ctl_table_header *sysctl_fs_get_netns_corresp(struct ctl_table_header *);
+extern void sysctl_fs_put(struct ctl_table_header *prev);
+
typedef struct ctl_table ctl_table;
@@ -986,73 +994,78 @@ extern int proc_do_large_bitmap(struct ctl_table *, int,
/*
* Register a set of sysctl names by calling __register_sysctl_paths
- * with an initialised array of struct ctl_table's. An entry with
- * NULL procname terminates the table. table->de will be
- * set up by the registration and need not be initialised in advance.
- *
- * sysctl names can be mirrored automatically under /proc/sys. The
- * procname supplied controls /proc naming.
+ * with an initialised array of struct ctl_table's. An entry with a
+ * NULL procname terminates the table.
*
* The table's mode will be honoured both for sys_sysctl(2) and
- * proc-fs access.
+ * proc-fs access (sys_sysctl(2) uses procfs internally).
+ *
+ * Only files can be represented by ctl_table elements. Directories
+ * are implemented with ctl_table_header objects.
*
- * Leaf nodes in the sysctl tree will be represented by a single file
- * under /proc; non-leaf nodes will be represented by directories. A
- * null procname disables /proc mirroring at this node.
+ * The data and maxlen fields of the ctl_table struct enable minimal
+ * validation of the values being written to be performed, and the
+ * mode field allows minimal authentication.
*
- * sysctl(2) can automatically manage read and write requests through
- * the sysctl table. The data and maxlen fields of the ctl_table
- * struct enable minimal validation of the values being written to be
- * performed, and the mode field allows minimal authentication.
- *
- * There must be a proc_handler routine for any terminal nodes
- * mirrored under /proc/sys (non-terminals are handled by a built-in
- * directory handler). Several default handlers are available to
- * cover common cases.
+ * There must be a proc_handler routine for each ctl_table node.
+ * Several default handlers are available to cover common cases.
*/
/* A sysctl table is an array of struct ctl_table: */
-struct ctl_table
-{
+struct ctl_table {
const char *procname; /* Text ID for /proc/sys, or zero */
void *data;
int maxlen;
mode_t mode;
- struct ctl_table *child;
- struct ctl_table *parent; /* Automatically set */
proc_handler *proc_handler; /* Callback for text formatting */
void *extra1;
void *extra2;
};
-struct ctl_table_root {
- struct list_head root_list;
- struct ctl_table_set default_set;
- struct ctl_table_set *(*lookup)(struct ctl_table_root *root,
- struct nsproxy *namespaces);
- int (*permissions)(struct ctl_table_root *root,
- struct nsproxy *namespaces, struct ctl_table *table);
+struct ctl_table_group_ops {
+ /* some sysctl entries are visible only in some situations.
+ * E.g.: /proc/sys/net/ipv4/conf/eth0/ is only visible in the
+ * netns in which that eth0 interface lives.
+ *
+ * If this hook is not set, then all the sysctl entries in
+ * this group are always visible. */
+ int (*is_seen)(struct ctl_table_group *group);
+
+ /* hook to alter permissions for some sysctl nodes at runtime */
+ int (*permissions)(struct ctl_table *table);
+};
+
+struct ctl_table_group {
+ const struct ctl_table_group_ops *ctl_ops;
+ /* A list of ctl_table_header elements that represent the
+ * netns-specific correspondents of some sysctl directories */
+ struct list_head corresp_list;
+ /* binary: whether this group uses @corresp_list */
+ char has_netns_corresp;
};
/* struct ctl_table_header is used to maintain dynamic lists of
struct ctl_table trees. */
-struct ctl_table_header
-{
+struct ctl_table_header {
union {
struct {
- struct ctl_table *ctl_table;
+ /* a header is used either as a wrapper for a
+ * ctl_table array or as a directory entry. */
+ union {
+ struct ctl_table *ctl_table_arg;
+ const char *dirname;
+ };
struct list_head ctl_entry;
- int used;
- int count;
+ int fs_func_refs;
+ int proc_inode_refs;
+ int header_refs;
};
struct rcu_head rcu;
};
struct completion *unregistering;
- struct ctl_table *ctl_table_arg;
- struct ctl_table_root *root;
- struct ctl_table_set *set;
- struct ctl_table *attached_by;
- struct ctl_table *attached_to;
+ struct ctl_table_group *ctl_group;
+ struct list_head ctl_tables;
+ struct list_head ctl_subdirs;
struct ctl_table_header *parent;
};
@@ -1061,15 +1074,19 @@ struct ctl_path {
const char *procname;
};
-void register_sysctl_root(struct ctl_table_root *root);
-struct ctl_table_header *__register_sysctl_paths(
- struct ctl_table_root *root, struct nsproxy *namespaces,
- const struct ctl_path *path, struct ctl_table *table);
-struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
- struct ctl_table *table);
-
-void unregister_sysctl_table(struct ctl_table_header * table);
-int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table);
+extern struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *g,
+ const struct ctl_path *p,
+ struct ctl_table *table);
+extern struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
+ struct ctl_table *table);
+extern void unregister_sysctl_table(struct ctl_table_header *table);
+
+#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
+extern int sysctl_check_table(const struct ctl_path *path,
+ int nr_dirs,
+ struct ctl_table *table);
+extern int sysctl_check_duplicates(struct ctl_table_header *header);
+#endif /* CONFIG_SYSCTL_SYSCALL_CHECK */
#endif /* __KERNEL__ */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 3ae4919..871dd2b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -52,7 +52,7 @@ struct net {
struct proc_dir_entry *proc_net_stat;
#ifdef CONFIG_SYSCTL
- struct ctl_table_set sysctls;
+ struct ctl_table_group netns_ctl_group;
#endif
struct sock *rtnl; /* rtnetlink socket */
diff --git a/init/main.c b/init/main.c
index 4a9479e..577bff6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -68,6 +68,7 @@
#include <linux/shmem_fs.h>
#include <linux/slab.h>
#include <linux/perf_event.h>
+#include <linux/sysctl.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -595,6 +596,7 @@ asmlinkage void __init start_kernel(void)
efi_enter_virtual_mode();
#endif
thread_info_cache_init();
+ sysctl_init();
cred_init();
fork_init(totalram_pages);
proc_caches_init();
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d44c280..3ff4384 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -56,6 +56,7 @@
#include <linux/kprobes.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
+#include <linux/rwsem.h>
#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -197,18 +198,22 @@ static int sysrq_sysctl_handler(ctl_table *table, int write,
#endif
-static struct ctl_table root_table[];
-static struct ctl_table_root sysctl_table_root;
-static struct ctl_table_header root_table_header = {
- {{.count = 1,
- .ctl_table = root_table,
- .ctl_entry = LIST_HEAD_INIT(sysctl_table_root.default_set.list),}},
- .root = &sysctl_table_root,
- .set = &sysctl_table_root.default_set,
+static struct kmem_cache *sysctl_header_cachep;
+
+/* uses default ops and does not need the corresp_list */
+static struct ctl_table_group_ops root_table_group_ops = { };
+
+static struct ctl_table_group root_table_group = {
+ .has_netns_corresp = 0,
+ .ctl_ops = &root_table_group_ops,
};
-static struct ctl_table_root sysctl_table_root = {
- .root_list = LIST_HEAD_INIT(sysctl_table_root.root_list),
- .default_set.list = LIST_HEAD_INIT(root_table_header.ctl_entry),
+
+static struct ctl_table_header root_table_header = {
+ {{.header_refs = 1,
+ .ctl_entry = LIST_HEAD_INIT(root_table_header.ctl_entry),}},
+ .ctl_tables = LIST_HEAD_INIT(root_table_header.ctl_tables),
+ .ctl_subdirs = LIST_HEAD_INIT(root_table_header.ctl_subdirs),
+ .ctl_group = &root_table_group,
};
#ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
@@ -217,10 +222,6 @@ int sysctl_legacy_va_layout;
/* The default sysctl tables: */
-static struct ctl_table root_table[] = {
- { }
-};
-
#ifdef CONFIG_SCHED_DEBUG
static int min_sched_granularity_ns = 100000; /* 100 usecs */
static int max_sched_granularity_ns = NSEC_PER_SEC; /* 1 second */
@@ -1489,22 +1490,28 @@ static struct ctl_table dev_table[] = {
static DEFINE_SPINLOCK(sysctl_lock);
-/* called under sysctl_lock */
-static int use_table(struct ctl_table_header *p)
+
+/* if it's deemed necessary, we can create a per-header rwsem. For now
+ * a global one will do. */
+static DECLARE_RWSEM(sysctl_rwsem);
+void sysctl_write_lock_head(struct ctl_table_header *head)
{
- if (unlikely(p->unregistering))
- return 0;
- p->used++;
- return 1;
+ down_write(&sysctl_rwsem);
}
-
-/* called under sysctl_lock */
-static void unuse_table(struct ctl_table_header *p)
+void sysctl_write_unlock_head(struct ctl_table_header *head)
{
- if (!--p->used)
- if (unlikely(p->unregistering))
- complete(p->unregistering);
+ up_write(&sysctl_rwsem);
}
+void sysctl_read_lock_head(struct ctl_table_header *head)
+{
+ down_read(&sysctl_rwsem);
+}
+void sysctl_read_unlock_head(struct ctl_table_header *head)
+{
+ up_read(&sysctl_rwsem);
+}
+
+
/* called under sysctl_lock, will reacquire if has to wait */
static void start_unregistering(struct ctl_table_header *p)
@@ -1513,7 +1520,7 @@ static void start_unregistering(struct ctl_table_header *p)
* if p->used is 0, nobody will ever touch that entry again;
* we'll eliminate all paths to it before dropping sysctl_lock
*/
- if (unlikely(p->used)) {
+ if (unlikely(p->fs_func_refs)) {
struct completion wait;
init_completion(&wait);
p->unregistering = &wait;
@@ -1524,123 +1531,105 @@ static void start_unregistering(struct ctl_table_header *p)
/* anything non-NULL; we'll never dereference it */
p->unregistering = ERR_PTR(-EINVAL);
}
- /*
- * do not remove from the list until nobody holds it; walking the
- * list in do_sysctl() relies on that.
- */
- list_del_init(&p->ctl_entry);
}
-void sysctl_head_get(struct ctl_table_header *head)
+void sysctl_proc_inode_get(struct ctl_table_header *head)
{
spin_lock(&sysctl_lock);
- head->count++;
+ head->proc_inode_refs++;
spin_unlock(&sysctl_lock);
}
static void free_head(struct rcu_head *rcu)
{
- kfree(container_of(rcu, struct ctl_table_header, rcu));
+ struct ctl_table_header *header;
+ header = container_of(rcu, struct ctl_table_header, rcu);
+ kmem_cache_free(sysctl_header_cachep, header);
}
-void sysctl_head_put(struct ctl_table_header *head)
+void sysctl_proc_inode_put(struct ctl_table_header *head)
{
spin_lock(&sysctl_lock);
- if (!--head->count)
+ head->proc_inode_refs--;
+ if ((head->header_refs == 0) && (head->proc_inode_refs == 0))
call_rcu(&head->rcu, free_head);
spin_unlock(&sysctl_lock);
}
-struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *head)
+/* called under sysctl_lock */
+static struct ctl_table_header *__sysctl_fs_get(struct ctl_table_header *head)
+{
+ if (unlikely(head->unregistering))
+ return ERR_PTR(-ENOENT);
+
+ head->fs_func_refs++;
+ return head;
+}
+
+struct ctl_table_header *sysctl_fs_get(struct ctl_table_header *head)
{
if (!head)
- BUG();
+ head = &root_table_header;
+
spin_lock(&sysctl_lock);
- if (!use_table(head))
- head = ERR_PTR(-ENOENT);
+ head = __sysctl_fs_get(head);
spin_unlock(&sysctl_lock);
return head;
}
-void sysctl_head_finish(struct ctl_table_header *head)
+void sysctl_fs_put(struct ctl_table_header *head)
{
if (!head)
return;
spin_lock(&sysctl_lock);
- unuse_table(head);
- spin_unlock(&sysctl_lock);
-}
-static struct ctl_table_set *
-lookup_header_set(struct ctl_table_root *root, struct nsproxy *namespaces)
-{
- struct ctl_table_set *set = &root->default_set;
- if (root->lookup)
- set = root->lookup(root, namespaces);
- return set;
-}
+ if (!--head->fs_func_refs)
+ if (unlikely(head->unregistering))
+ complete(head->unregistering);
-static struct list_head *
-lookup_header_list(struct ctl_table_root *root, struct nsproxy *namespaces)
-{
- struct ctl_table_set *set = lookup_header_set(root, namespaces);
- return &set->list;
+ spin_unlock(&sysctl_lock);
}
-struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
- struct ctl_table_header *prev)
+/* must be called with set protector lock (currently this is sysctl_lock) */
+static struct ctl_table_header *sysctl_fs_get_netns_corresp_dflt(
+ struct ctl_table_group *group,
+ struct ctl_table_header *head,
+ struct ctl_table_header *dflt)
{
- struct ctl_table_root *root;
- struct list_head *header_list;
- struct ctl_table_header *head;
- struct list_head *tmp;
+ struct ctl_table_header *h, *ret = NULL;
spin_lock(&sysctl_lock);
- if (prev) {
- head = prev;
- tmp = &prev->ctl_entry;
- unuse_table(prev);
- goto next;
- }
- tmp = &root_table_header.ctl_entry;
- for (;;) {
- head = list_entry(tmp, struct ctl_table_header, ctl_entry);
- if (!use_table(head))
- goto next;
- spin_unlock(&sysctl_lock);
- return head;
- next:
- root = head->root;
- tmp = tmp->next;
- header_list = lookup_header_list(root, namespaces);
- if (tmp != header_list)
+ list_for_each_entry(h, &group->corresp_list, ctl_entry) {
+ if (h->parent != head)
continue;
-
- do {
- root = list_entry(root->root_list.next,
- struct ctl_table_root, root_list);
- if (root == &sysctl_table_root)
- goto out;
- header_list = lookup_header_list(root, namespaces);
- } while (list_empty(header_list));
- tmp = header_list->next;
+ if (IS_ERR(__sysctl_fs_get(h)))
+ continue;
+ ret = h;
+ goto out;
}
+
+ if (!dflt)
+ goto out;
+
+ /* will not fail because dflt is a brand-new header that no
+ * one has seen yet, so no one has started to unregister it */
+ dflt = __sysctl_fs_get(dflt);
+ dflt->parent = head;
+ list_add_tail(&dflt->ctl_entry, &group->corresp_list);
+ ret = dflt;
+
out:
spin_unlock(&sysctl_lock);
- return NULL;
-}
-
-struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev)
-{
- return __sysctl_head_next(current->nsproxy, prev);
+ return ret;
}
-void register_sysctl_root(struct ctl_table_root *root)
+struct ctl_table_header *sysctl_fs_get_netns_corresp(struct ctl_table_header *h)
{
- spin_lock(&sysctl_lock);
- list_add_tail(&root->root_list, &sysctl_table_root.root_list);
- spin_unlock(&sysctl_lock);
+ struct ctl_table_group *g = &current->nsproxy->net_ns->netns_ctl_group;
+ /* dflt == NULL means: if there's a set-part return it,
+ * if there isn't, just return NULL */
+ return sysctl_fs_get_netns_corresp_dflt(g, h, NULL);
}
/*
@@ -1659,28 +1648,21 @@ static int test_perm(int mode, int op)
return -EACCES;
}
-int sysctl_perm(struct ctl_table_root *root, struct ctl_table *table, int op)
+int sysctl_perm(struct ctl_table_group *group, struct ctl_table *table, int op)
{
int mode;
- if (root->permissions)
- mode = root->permissions(root, current->nsproxy, table);
+ if (group->ctl_ops->permissions)
+ mode = group->ctl_ops->permissions(table);
else
mode = table->mode;
return test_perm(mode, op);
}
-static void sysctl_set_parent(struct ctl_table *parent, struct ctl_table *table)
-{
- for (; table->procname; table++) {
- table->parent = parent;
- if (table->child)
- sysctl_set_parent(table, table->child);
- }
-}
+static void sysctl_header_ctor(void *data);
-static __init int sysctl_init(void)
+__init int sysctl_init(void)
{
struct ctl_table_header *kern_header, *vm_header, *fs_header,
*debug_header, *dev_header;
@@ -1688,7 +1670,11 @@ static __init int sysctl_init(void)
struct ctl_table_header *binfmt_misc_header;
#endif
- sysctl_set_parent(NULL, root_table);
+ sysctl_header_cachep = kmem_cache_create("sysctl_header_cachep",
+ sizeof(struct ctl_table_header),
+ 0, 0, &sysctl_header_ctor);
+ if (!sysctl_header_cachep)
+ goto fail_alloc_cachep;
kern_header = register_sysctl_paths(kern_path, kern_table);
if (kern_header == NULL)
@@ -1716,10 +1702,6 @@ static __init int sysctl_init(void)
goto fail_register_binfmt_misc;
#endif
-
-#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
- sysctl_check_table(current->nsproxy, root_table);
-#endif
return 0;
@@ -1737,62 +1719,233 @@ fail_register_fs:
fail_register_vm:
unregister_sysctl_table(kern_header);
fail_register_kern:
+ kmem_cache_destroy(sysctl_header_cachep);
+fail_alloc_cachep:
return -ENOMEM;
}
-core_initcall(sysctl_init);
+static void header_refs_inc(struct ctl_table_header *head)
+{
+ spin_lock(&sysctl_lock);
+ head->header_refs++;
+ spin_unlock(&sysctl_lock);
+}
-static struct ctl_table *is_branch_in(struct ctl_table *branch,
- struct ctl_table *table)
+static int ctl_path_items(const struct ctl_path *path)
{
- struct ctl_table *p;
- const char *s = branch->procname;
+ int n = 0;
+ while (path->procname) {
+ path++;
+ n++;
+ }
+ return n;
+}
- /* branch should have named subdirectory as its first element */
- if (!s || !branch->child)
- return NULL;
+static void sysctl_header_ctor(void *data)
+{
+ struct ctl_table_header *h = data;
- /* ... and nothing else */
- if (branch[1].procname)
+ h->fs_func_refs = 0;
+ h->proc_inode_refs = 0;
+ h->header_refs = 0;
+
+ INIT_LIST_HEAD(&h->ctl_entry);
+ INIT_LIST_HEAD(&h->ctl_subdirs);
+ INIT_LIST_HEAD(&h->ctl_tables);
+}
+
+static struct ctl_table_header *alloc_sysctl_header(struct ctl_table_group *group)
+{
+ struct ctl_table_header *h;
+
+ h = kmem_cache_alloc(sysctl_header_cachep, GFP_KERNEL);
+ if (!h)
return NULL;
- /* table should contain subdirectory with the same name */
- for (p = table; p->procname; p++) {
- if (!p->child)
+ /* - all _refs members are zero before freeing
+ * - all list_head members point to themselves (empty lists) */
+
+ h->ctl_table_arg = NULL;
+ h->unregistering = NULL;
+ h->ctl_group = group;
+
+ return h;
+}
+
+/* Increment the references to an existing subdir of @parent with the name
+ * @name and return that subdir. If no such subdir exists, return NULL.
+ * Called under the write lock protecting parent's ctl_subdirs. */
+static struct ctl_table_header *mkdir_existing_dir(struct ctl_table_header *parent,
+ const char *name)
+{
+ struct ctl_table_header *h;
+ list_for_each_entry(h, &parent->ctl_subdirs, ctl_entry) {
+ if (IS_ERR(sysctl_fs_get(h)))
continue;
- if (p->procname && strcmp(p->procname, s) == 0)
- return p;
+ if (strcmp(name, h->dirname) == 0) {
+ header_refs_inc(h);
+ sysctl_fs_put(h);
+ return h;
+ }
+ sysctl_fs_put(h);
}
return NULL;
}
-/* see if attaching q to p would be an improvement */
-static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
+/* Some sysctl paths are netns-specific. The last directory that is
+ * not netns-specific will have a correspondent dir in the
+ * netns-specific ctl_table_group. That correspondent will hold the
+ * lists of netns-specific tables and subdirectories.
+ *
+ * E.g.: registering netns/interface specific directories:
+ * common path: /proc/sys/net/ipv4/
+ * netns path: /proc/sys/net/ipv4/conf/lo/
+ * We'll create an (unnamed) netns correspondent for 'ipv4' which will
+ * have 'conf' as its subdir.
+ *
+ * E.g.: We're registering a netns specific file in /proc/sys/net/core/somaxconn
+ * common path: /proc/sys/net/core/
+ * netns path: /proc/sys/net/core/
+ * We'll create an (unnamed) netns correspondent for 'core'.
+ */
+static struct ctl_table_header *mkdir_netns_corresp(
+ struct ctl_table_header *parent,
+ struct ctl_table_group *group,
+ struct ctl_table_header **__netns_corresp)
+{
+ struct ctl_table_header *ret;
+
+ ret = sysctl_fs_get_netns_corresp_dflt(group, parent, *__netns_corresp);
+
+ /* *__netns_corresp is a pre-allocated header. If we used it
+ here, we have to tell the caller so it won't free it. */
+ if (*__netns_corresp == ret)
+ *__netns_corresp = NULL;
+
+ header_refs_inc(ret);
+ sysctl_fs_put(ret);
+ return ret;
+}
+
+/* Add @dir as a subdir of @parent.
+ * Called under the write lock protecting parent's ctl_subdirs. */
+static struct ctl_table_header *mkdir_new_dir(struct ctl_table_header *parent,
+ struct ctl_table_header *dir)
{
- struct ctl_table *to = p->ctl_table, *by = q->ctl_table;
- struct ctl_table *next;
- int is_better = 0;
- int not_in_parent = !p->attached_by;
+ dir->parent = parent;
+ header_refs_inc(dir);
+ list_add_tail(&dir->ctl_entry, &parent->ctl_subdirs);
+ return dir;
+}
+
+/*
+ * Attach the branch denoted by @dirs (a series of directories that
+ * are children of their predecessor in the array) to @parent.
+ *
+ * If at some level the parent tree already contains a node with the
+ * same name as the one we're trying to add, increment that node's
+ * reference count (header_refs). If not, add that dir as a subdir of
+ * its parent.
+ *
+ * Nodes that remain non-NULL in @dirs must be freed by the caller as
+ * they were not added to the tree.
+ *
+ * Return the corresponding ctl_table_header for dirs[nr_dirs-1] from
+ * the tree (either one added by this function, or one already in the
+ * tree).
+ */
+static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
+ struct ctl_table_group *group,
+ const struct ctl_path *path,
+ int nr_dirs)
+{
+ struct ctl_table_header *dirs[CTL_MAXNAME];
+ struct ctl_table_header *__netns_corresp = NULL;
+ int create_first_netns_corresp = group->has_netns_corresp;
+ int i;
+
+ /* We create excess ctl_table_header for directory entries.
+ * We do so because we may need new headers while under a lock
+ * where we will not be able to allocate entries (sleeping).
+ * Also, this simplifies handling of ENOMEM: no need to remove
+ * already allocated/added directories and unlink them from
+ * their parent directories. Stuff that is not used will be
+ * freed at the end. */
+ for (i = 0; i < nr_dirs; i++) {
+ dirs[i] = alloc_sysctl_header(group);
+ if (!dirs[i])
+ goto err_alloc_dir;
+ dirs[i]->dirname = path[i].procname;
+ }
- while ((next = is_branch_in(by, to)) != NULL) {
- if (by == q->attached_by)
- is_better = 1;
- if (to == p->attached_by)
- not_in_parent = 1;
- by = by->child;
- to = next->child;
+ if (create_first_netns_corresp) {
+ /* The netns correspondent for the last common path
+ * component might exist. However, we will only know
+ * this later while being under a lock. We
+ * pre-allocate it just in case it might be needed and
+ * free it at the end only if it wasn't used. */
+ __netns_corresp = alloc_sysctl_header(group);
+ if (!__netns_corresp)
+ goto err_alloc_coresp;
}
- if (is_better && not_in_parent) {
- q->attached_by = by;
- q->attached_to = to;
- q->parent = p;
+ header_refs_inc(parent);
+
+ for (i = 0; i < nr_dirs; i++) {
+ struct ctl_table_header *h;
+
+ retry:
+ sysctl_write_lock_head(parent);
+
+ h = mkdir_existing_dir(parent, dirs[i]->dirname);
+ if (h != NULL) {
+ sysctl_write_unlock_head(parent);
+ parent = h;
+ continue;
+ }
+
+ if (likely(!create_first_netns_corresp)) {
+ h = mkdir_new_dir(parent, dirs[i]);
+ sysctl_write_unlock_head(parent);
+ parent = h;
+ dirs[i] = NULL; /* I'm used, don't free me */
+ continue;
+ }
+
+ sysctl_write_unlock_head(parent);
+
+ create_first_netns_corresp = 0;
+ parent = mkdir_netns_corresp(parent, group, &__netns_corresp);
+ /* We still have to add the new subdirectory, but
+ * instead of adding it into the common parent, add it
+ * to its netns correspondent. */
+ goto retry;
}
+
+ if (create_first_netns_corresp)
+ parent = mkdir_netns_corresp(parent, group, &__netns_corresp);
+
+ if (__netns_corresp)
+ kmem_cache_free(sysctl_header_cachep, __netns_corresp);
+
+ /* free unused pre-allocated entries */
+ for (i = 0; i < nr_dirs; i++)
+ if (dirs[i])
+ kmem_cache_free(sysctl_header_cachep, dirs[i]);
+
+ return parent;
+
+err_alloc_coresp:
+ i = nr_dirs;
+err_alloc_dir:
+ for (i--; i >= 0; i--)
+ kmem_cache_free(sysctl_header_cachep, dirs[i]);
+ return NULL;
+
}
/**
* __register_sysctl_paths - register a sysctl hierarchy
- * @root: List of sysctl headers to register on
+ * @group: Group of sysctl headers to register on
* @namespaces: Data to compute which lists of sysctl entries are visible
* @path: The path to the directory the sysctl table is in.
* @table: the top-level table structure
@@ -1811,9 +1964,6 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
*
* mode - the file permissions for the /proc/sys file, and for sysctl(2)
*
- * child - a pointer to the child sysctl table if this entry is a directory, or
- * %NULL.
- *
* proc_handler - the text handler routine (described below)
*
* de - for internal use by the sysctl routines
@@ -1844,77 +1994,46 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
* to the table header on success.
*/
struct ctl_table_header *__register_sysctl_paths(
- struct ctl_table_root *root,
- struct nsproxy *namespaces,
- const struct ctl_path *path, struct ctl_table *table)
+ struct ctl_table_group *group,
+ const struct ctl_path *path,
+ struct ctl_table *table)
{
struct ctl_table_header *header;
- struct ctl_table *new, **prevp;
- unsigned int n, npath;
- struct ctl_table_set *set;
+ int failed_duplicate_check = 0;
+ int nr_dirs = ctl_path_items(path);
- /* Count the path components */
- for (npath = 0; path[npath].procname; ++npath)
- ;
+#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
+ if (sysctl_check_table(path, nr_dirs, table))
+ return NULL;
+#endif
- /*
- * For each path component, allocate a 2-element ctl_table array.
- * The first array element will be filled with the sysctl entry
- * for this, the second will be the sentinel (procname == 0).
- *
- * We allocate everything in one go so that we don't have to
- * worry about freeing additional memory in unregister_sysctl_table.
- */
- header = kzalloc(sizeof(struct ctl_table_header) +
- (2 * npath * sizeof(struct ctl_table)), GFP_KERNEL);
+ header = alloc_sysctl_header(group);
if (!header)
return NULL;
- new = (struct ctl_table *) (header + 1);
-
- /* Now connect the dots */
- prevp = &header->ctl_table;
- for (n = 0; n < npath; ++n, ++path) {
- /* Copy the procname */
- new->procname = path->procname;
- new->mode = 0555;
-
- *prevp = new;
- prevp = &new->child;
-
- new += 2;
+ header->parent = sysctl_mkdirs(&root_table_header, group, path, nr_dirs);
+ if (!header->parent) {
+ kmem_cache_free(sysctl_header_cachep, header);
+ return NULL;
}
- *prevp = table;
+
header->ctl_table_arg = table;
+ header->header_refs = 1;
+
+ sysctl_write_lock_head(header->parent);
- INIT_LIST_HEAD(&header->ctl_entry);
- header->used = 0;
- header->unregistering = NULL;
- header->root = root;
- sysctl_set_parent(NULL, header->ctl_table);
- header->count = 1;
#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
- if (sysctl_check_table(namespaces, header->ctl_table)) {
- kfree(header);
- return NULL;
- }
+ failed_duplicate_check = sysctl_check_duplicates(header);
#endif
- spin_lock(&sysctl_lock);
- header->set = lookup_header_set(root, namespaces);
- header->attached_by = header->ctl_table;
- header->attached_to = root_table;
- header->parent = &root_table_header;
- for (set = header->set; set; set = set->parent) {
- struct ctl_table_header *p;
- list_for_each_entry(p, &set->list, ctl_entry) {
- if (p->unregistering)
- continue;
- try_attach(p, header);
- }
+ if (!failed_duplicate_check)
+ list_add_tail(&header->ctl_entry, &header->parent->ctl_tables);
+
+ sysctl_write_unlock_head(header->parent);
+
+ if (failed_duplicate_check) {
+ unregister_sysctl_table(header);
+ return NULL;
}
- header->parent->count++;
- list_add_tail(&header->ctl_entry, &header->set->list);
- spin_unlock(&sysctl_lock);
return header;
}
@@ -1932,57 +2051,67 @@ struct ctl_table_header *__register_sysctl_paths(
struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table)
{
- return __register_sysctl_paths(&sysctl_table_root, current->nsproxy,
- path, table);
+ return __register_sysctl_paths(&root_table_group, path, table);
}
/**
* unregister_sysctl_table - unregister a sysctl table hierarchy
- * @header: the header returned from __register_sysctl_paths
+ * @h: the header returned from __register_sysctl_paths
*
* Unregisters the sysctl table and all children. proc entries may not
* actually be removed until they are no longer used by anyone.
*/
-void unregister_sysctl_table(struct ctl_table_header * header)
+void unregister_sysctl_table(struct ctl_table_header *header)
{
might_sleep();
- if (header == NULL)
- return;
+ while (header) {
+ struct ctl_table_header *parent = header->parent;
- spin_lock(&sysctl_lock);
- start_unregistering(header);
- if (!--header->parent->count) {
- WARN_ON(1);
- call_rcu(&header->parent->rcu, free_head);
+ /* ctl_entry is a member of the parent's ctl_tables or
+ * ctl_subdirs lists which are protected by the
+ * parent's write lock. */
+ sysctl_write_lock_head(parent);
+
+ /* the three counters (header_refs, proc_inode_refs and
+ * used) are protected by the spin lock */
+
+ spin_lock(&sysctl_lock);
+ if (!--header->header_refs) {
+ start_unregistering(header);
+ list_del_init(&header->ctl_entry);
+ if (!header->proc_inode_refs)
+ call_rcu(&header->rcu, free_head);
+ }
+ spin_unlock(&sysctl_lock);
+
+ sysctl_write_unlock_head(parent);
+ header = parent;
}
- if (!--header->count)
- call_rcu(&header->rcu, free_head);
- spin_unlock(&sysctl_lock);
}
-int sysctl_is_seen(struct ctl_table_header *p)
+int sysctl_is_seen(struct ctl_table_header *head)
{
- struct ctl_table_set *set = p->set;
- int res;
+ struct ctl_table_group *group = head->ctl_group;
+ int ret;
spin_lock(&sysctl_lock);
- if (p->unregistering)
- res = 0;
- else if (!set->is_seen)
- res = 1;
+ if (head->unregistering)
+ ret = 0;
+ else if (!group->ctl_ops->is_seen)
+ ret = 1;
else
- res = set->is_seen(set);
+ ret = group->ctl_ops->is_seen(group);
spin_unlock(&sysctl_lock);
- return res;
+ return ret;
}
-
-void setup_sysctl_set(struct ctl_table_set *p,
- struct ctl_table_set *parent,
- int (*is_seen)(struct ctl_table_set *))
+void sysctl_init_group(struct ctl_table_group *group,
+ const struct ctl_table_group_ops *ops,
+ int has_netns_corresp)
{
- INIT_LIST_HEAD(&p->list);
- p->parent = parent ? parent : &sysctl_table_root.default_set;
- p->is_seen = is_seen;
+ group->ctl_ops = ops;
+ group->has_netns_corresp = has_netns_corresp;
+ if (has_netns_corresp)
+ INIT_LIST_HEAD(&group->corresp_list);
}
#else /* !CONFIG_SYSCTL */
@@ -1996,15 +2125,14 @@ void unregister_sysctl_table(struct ctl_table_header * table)
{
}
-void setup_sysctl_set(struct ctl_table_set *p,
- struct ctl_table_set *parent,
- int (*is_seen)(struct ctl_table_set *))
+void sysctl_init_group(struct ctl_table_group *group,
+ const struct ctl_table_group_ops *ops,
+ int has_netns_corresp)
{
}
-void sysctl_head_put(struct ctl_table_header *head)
-{
-}
+void sysctl_proc_inode_get(struct ctl_table_header *head) { }
+void sysctl_proc_inode_put(struct ctl_table_header *head) { }
#endif /* CONFIG_SYSCTL */
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 4e4932a..ccd39a3 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1,160 +1,152 @@
-#include <linux/stat.h>
#include <linux/sysctl.h>
-#include "../fs/xfs/linux-2.6/xfs_sysctl.h"
-#include <linux/sunrpc/debug.h>
#include <linux/string.h>
-#include <net/ip_vs.h>
-
-static int sysctl_depth(struct ctl_table *table)
+/*
+ * @path: the path to the offender
+ * @offender: the name of a file or directory that violated some sysctl rules
+ * @str: a message accompanying the error
+ */
+static void fail(const struct ctl_path *path,
+ const char *offender,
+ const char *str)
{
- struct ctl_table *tmp;
- int depth;
-
- depth = 0;
- for (tmp = table; tmp->parent; tmp = tmp->parent)
- depth++;
+ printk(KERN_ERR "sysctl sanity check failed: ");
- return depth;
-}
-
-static struct ctl_table *sysctl_parent(struct ctl_table *table, int n)
-{
- int i;
+ for (; path->procname; path++)
+ printk("/%s", path->procname);
- for (i = 0; table && i < n; i++)
- table = table->parent;
+ if (offender)
+ printk("/%s", offender);
- return table;
+ printk(": %s\n", str);
}
+#define FAIL(str) do { fail(path, t->procname, str); error = -EINVAL;} while (0)
-static void sysctl_print_path(struct ctl_table *table)
+int sysctl_check_table(const struct ctl_path *path,
+ int nr_dirs,
+ struct ctl_table *table)
{
- struct ctl_table *tmp;
- int depth, i;
- depth = sysctl_depth(table);
- if (table->procname) {
- for (i = depth; i >= 0; i--) {
- tmp = sysctl_parent(table, i);
- printk("/%s", tmp->procname?tmp->procname:"");
+ struct ctl_table *t;
+ int error = 0;
+
+ if (nr_dirs > CTL_MAXNAME - 1) {
+ fail(path, NULL, "tree too deep");
+ error = -EINVAL;
+ }
+
+ for (t = table; t->procname; t++) {
+ if ((t->proc_handler == proc_dostring) ||
+ (t->proc_handler == proc_dointvec) ||
+ (t->proc_handler == proc_dointvec_minmax) ||
+ (t->proc_handler == proc_dointvec_jiffies) ||
+ (t->proc_handler == proc_dointvec_userhz_jiffies) ||
+ (t->proc_handler == proc_dointvec_ms_jiffies) ||
+ (t->proc_handler == proc_doulongvec_minmax) ||
+ (t->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
+ if (!t->data)
+ FAIL("No data");
+ if (!t->maxlen)
+ FAIL("No maxlen");
}
+#ifdef CONFIG_PROC_SYSCTL
+ if (!t->proc_handler)
+ FAIL("No proc_handler");
+#endif
+ if (t->mode > 0777)
+ FAIL("bogus .mode");
}
- printk(" ");
+
+ if (error)
+ dump_stack();
+
+ return error;
}
-static struct ctl_table *sysctl_check_lookup(struct nsproxy *namespaces,
- struct ctl_table *table)
+
+/*
+ * @dir: the directory immediately above the offender
+ * @offender: the name of a file or directory that violated some sysctl rules
+ */
+static void duplicate_error(struct ctl_table_header *dir,
+ const char *offender)
{
- struct ctl_table_header *head;
- struct ctl_table *ref, *test;
- int depth, cur_depth;
-
- depth = sysctl_depth(table);
-
- for (head = __sysctl_head_next(namespaces, NULL); head;
- head = __sysctl_head_next(namespaces, head)) {
- cur_depth = depth;
- ref = head->ctl_table;
-repeat:
- test = sysctl_parent(table, cur_depth);
- for (; ref->procname; ref++) {
- int match = 0;
- if (cur_depth && !ref->child)
- continue;
-
- if (test->procname && ref->procname &&
- (strcmp(test->procname, ref->procname) == 0))
- match++;
-
- if (match) {
- if (cur_depth != 0) {
- cur_depth--;
- ref = ref->child;
- goto repeat;
- }
- goto out;
- }
+ const char *names[CTL_MAXNAME];
+ int i = 0;
+
+ printk(KERN_ERR "sysctl duplicate check failed: ");
+
+ for (; dir->parent; dir = dir->parent)
+ /* dirname can be NULL: netns-correspondent
+ * directories do not have a dirname. Their only
+ * purpose is to hold the list of
+ * subdirs/subtables. They hold netns-specific
+ * information for the parent directory. */
+ if (dir->dirname) {
+ names[i] = dir->dirname;
+ i++;
}
- }
- ref = NULL;
-out:
- sysctl_head_finish(head);
- return ref;
+
+ /* Print the names in the normal path order, not reversed */
+ for (i--; i >= 0; i--)
+ printk("/%s", names[i]);
+
+ printk("/%s \n", offender);
}
-static void set_fail(const char **fail, struct ctl_table *table, const char *str)
+/* is there an entry in the table with the same procname? */
+static int match(struct ctl_table *table, const char *name)
{
- if (*fail) {
- printk(KERN_ERR "sysctl table check failed: ");
- sysctl_print_path(table);
- printk(" %s\n", *fail);
- dump_stack();
+ for ( ; table->procname; table++) {
+
+ if (strcmp(table->procname, name) == 0)
+ return 1;
}
- *fail = str;
+ return 0;
}
-static void sysctl_check_leaf(struct nsproxy *namespaces,
- struct ctl_table *table, const char **fail)
+
+/* Called under header->parent write lock.
+ *
+ * checks whether this header's table introduces items that have the
+ * same names as other items at the same level (other files or
+ * subdirectories of the current dir). */
+int sysctl_check_duplicates(struct ctl_table_header *header)
{
- struct ctl_table *ref;
+ int has_duplicates = 0;
+ struct ctl_table *table = header->ctl_table_arg;
+ struct ctl_table_header *dir = header->parent;
+ struct ctl_table_header *h;
+
+ list_for_each_entry(h, &dir->ctl_subdirs, ctl_entry) {
+ if (IS_ERR(sysctl_fs_get(h)))
+ continue;
+
+ if (match(table, h->dirname)) {
+ has_duplicates = 1;
+ duplicate_error(dir, h->dirname);
+ }
- ref = sysctl_check_lookup(namespaces, table);
- if (ref && (ref != table))
- set_fail(fail, table, "Sysctl already exists");
-}
+ sysctl_fs_put(h);
+ }
-int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table)
-{
- int error = 0;
- for (; table->procname; table++) {
- const char *fail = NULL;
+ list_for_each_entry(h, &dir->ctl_tables, ctl_entry) {
+ ctl_table *t;
- if (table->parent) {
- if (!table->parent->procname)
- set_fail(&fail, table, "Parent without procname");
- }
- if (table->child) {
- if (table->data)
- set_fail(&fail, table, "Directory with data?");
- if (table->maxlen)
- set_fail(&fail, table, "Directory with maxlen?");
- if ((table->mode & (S_IRUGO|S_IXUGO)) != table->mode)
- set_fail(&fail, table, "Writable sysctl directory");
- if (table->proc_handler)
- set_fail(&fail, table, "Directory with proc_handler");
- if (table->extra1)
- set_fail(&fail, table, "Directory with extra1");
- if (table->extra2)
- set_fail(&fail, table, "Directory with extra2");
- } else {
- if ((table->proc_handler == proc_dostring) ||
- (table->proc_handler == proc_dointvec) ||
- (table->proc_handler == proc_dointvec_minmax) ||
- (table->proc_handler == proc_dointvec_jiffies) ||
- (table->proc_handler == proc_dointvec_userhz_jiffies) ||
- (table->proc_handler == proc_dointvec_ms_jiffies) ||
- (table->proc_handler == proc_doulongvec_minmax) ||
- (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
- if (!table->data)
- set_fail(&fail, table, "No data");
- if (!table->maxlen)
- set_fail(&fail, table, "No maxlen");
+ if (IS_ERR(sysctl_fs_get(h)))
+ continue;
+
+ for (t = h->ctl_table_arg; t->procname; t++) {
+ if (match(table, t->procname)) {
+ has_duplicates = 1;
+ duplicate_error(dir, t->procname);
}
-#ifdef CONFIG_PROC_SYSCTL
- if (!table->proc_handler)
- set_fail(&fail, table, "No proc_handler");
-#endif
- sysctl_check_leaf(namespaces, table, &fail);
- }
- if (table->mode > 0777)
- set_fail(&fail, table, "bogus .mode");
- if (fail) {
- set_fail(&fail, table, NULL);
- error = -EINVAL;
}
- if (table->child)
- error |= sysctl_check_table(namespaces, table->child);
+ sysctl_fs_put(h);
}
- return error;
+
+ if (has_duplicates)
+ dump_stack();
+
+ return has_duplicates;
}
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index ca84212..c541541 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -29,21 +29,13 @@
#include <linux/if_tr.h>
#endif
-static struct ctl_table_set *
-net_ctl_header_lookup(struct ctl_table_root *root, struct nsproxy *namespaces)
+static int is_seen(struct ctl_table_group *group)
{
- return &namespaces->net_ns->sysctls;
-}
-
-static int is_seen(struct ctl_table_set *set)
-{
- return &current->nsproxy->net_ns->sysctls == set;
+ return &current->nsproxy->net_ns->netns_ctl_group == group;
}
/* Return standard mode bits for table entry. */
-static int net_ctl_permissions(struct ctl_table_root *root,
- struct nsproxy *nsproxy,
- struct ctl_table *table)
+static int net_ctl_permissions(struct ctl_table *table)
{
/* Allow network administrator to have same access as root. */
if (capable(CAP_NET_ADMIN)) {
@@ -53,35 +45,39 @@ static int net_ctl_permissions(struct ctl_table_root *root,
return table->mode;
}
-static struct ctl_table_root net_sysctl_root = {
- .lookup = net_ctl_header_lookup,
+static const struct ctl_table_group_ops net_sysctl_group_ops = {
+ .is_seen = is_seen,
.permissions = net_ctl_permissions,
};
-static int net_ctl_ro_header_perms(struct ctl_table_root *root,
- struct nsproxy *namespaces, struct ctl_table *table)
+static int net_ctl_ro_permissions(struct ctl_table *table)
{
- if (net_eq(namespaces->net_ns, &init_net))
+ if (net_eq(current->nsproxy->net_ns, &init_net))
return table->mode;
else
return table->mode & ~0222;
}
-static struct ctl_table_root net_sysctl_ro_root = {
- .permissions = net_ctl_ro_header_perms,
+static const struct ctl_table_group_ops net_sysctl_ro_group_ops = {
+ .permissions = net_ctl_ro_permissions,
+};
+static struct ctl_table_group net_sysctl_ro_group = {
+ .has_netns_corresp = 0,
+ .ctl_ops = &net_sysctl_ro_group_ops,
};
static int __net_init sysctl_net_init(struct net *net)
{
- setup_sysctl_set(&net->sysctls,
- &net_sysctl_ro_root.default_set,
- is_seen);
+ int has_netns_corresp = 1;
+
+ sysctl_init_group(&net->netns_ctl_group, &net_sysctl_group_ops,
+ has_netns_corresp);
return 0;
}
static void __net_exit sysctl_net_exit(struct net *net)
{
- WARN_ON(!list_empty(&net->sysctls.list));
+ WARN_ON(!list_empty(&net->netns_ctl_group.corresp_list));
}
static struct pernet_operations sysctl_pernet_ops = {
@@ -89,36 +85,29 @@ static struct pernet_operations sysctl_pernet_ops = {
.exit = sysctl_net_exit,
};
-static __init int sysctl_init(void)
+static __init int net_sysctl_init(void)
{
int ret;
ret = register_pernet_subsys(&sysctl_pernet_ops);
if (ret)
goto out;
- register_sysctl_root(&net_sysctl_root);
- setup_sysctl_set(&net_sysctl_ro_root.default_set, NULL, NULL);
- register_sysctl_root(&net_sysctl_ro_root);
out:
return ret;
}
-subsys_initcall(sysctl_init);
+subsys_initcall(net_sysctl_init);
struct ctl_table_header *register_net_sysctl_table(struct net *net,
- const struct ctl_path *path, struct ctl_table *table)
+ const struct ctl_path *path,
+ struct ctl_table *table)
{
- struct nsproxy namespaces;
- namespaces = *current->nsproxy;
- namespaces.net_ns = net;
- return __register_sysctl_paths(&net_sysctl_root,
- &namespaces, path, table);
+ return __register_sysctl_paths(&net->netns_ctl_group, path, table);
}
EXPORT_SYMBOL_GPL(register_net_sysctl_table);
-struct ctl_table_header *register_net_sysctl_rotable(const
- struct ctl_path *path, struct ctl_table *table)
+struct ctl_table_header *register_net_sysctl_rotable(const struct ctl_path *path,
+ struct ctl_table *table)
{
- return __register_sysctl_paths(&net_sysctl_ro_root,
- &init_nsproxy, path, table);
+ return __register_sysctl_paths(&net_sysctl_ro_group, path, table);
}
EXPORT_SYMBOL_GPL(register_net_sysctl_rotable);
--
1.7.5.134.g1c08b
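The pre-allocation strategy described in the sysctl_mkdirs() comments above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct node, lookup(), mkdirs() and MAX_DEPTH are hypothetical names that do not exist in the patch, there is no locking or reference counting, and new children are prepended rather than appended.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 8	/* hypothetical bound, stands in for CTL_MAXNAME */

/* Hypothetical, simplified directory node (not the kernel structure). */
struct node {
	const char *name;
	struct node *parent;
	struct node *child;	/* first child */
	struct node *sibling;	/* next sibling */
};

/* Linear search among the children of @parent. */
static struct node *lookup(struct node *parent, const char *name)
{
	struct node *c;

	for (c = parent->child; c; c = c->sibling)
		if (strcmp(c->name, name) == 0)
			return c;
	return NULL;
}

/* Create (or reuse) every directory in @path below @root.
 * Returns the node of the last component, or NULL if allocation failed.
 * *nr_unused reports how many pre-allocated nodes were not needed. */
static struct node *mkdirs(struct node *root, const char *const *path,
			   int nr, int *nr_unused)
{
	struct node *spare[MAX_DEPTH];
	struct node *parent = root;
	int i;

	assert(nr <= MAX_DEPTH);
	*nr_unused = 0;

	/* Phase 1: allocate one node per component up front, so that a
	 * failure needs no unlinking of partially added directories. */
	for (i = 0; i < nr; i++) {
		spare[i] = calloc(1, sizeof(*spare[i]));
		if (!spare[i]) {
			while (--i >= 0)
				free(spare[i]);
			return NULL;
		}
		spare[i]->name = path[i];
	}

	/* Phase 2: walk the tree without allocating (in the kernel this
	 * part would run under the parent's write lock). */
	for (i = 0; i < nr; i++) {
		struct node *found = lookup(parent, path[i]);

		if (!found) {
			found = spare[i];
			spare[i] = NULL;	/* used, don't free */
			found->parent = parent;
			found->sibling = parent->child;
			parent->child = found;
		}
		parent = found;
	}

	/* Phase 3: free the pre-allocated nodes that were not used. */
	for (i = 0; i < nr; i++)
		if (spare[i]) {
			free(spare[i]);
			(*nr_unused)++;
		}

	return parent;
}
```

Calling mkdirs() twice with paths that share a prefix (for example net/ipv4/conf/eth0 and net/ipv4/conf/eth1) reuses the shared directories on the second call and frees the spare nodes that were allocated for them.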
This is an optimisation for registering paths that you know will be
used to register a single table. Because such a directory is used
only once, sysctl always creates a new entry for it when it encounters it.
When sysctl registers a table, for each directory that may be used
while registering other tables we do a linear search to see if it's
already added, and, if not, add it ourselves.
For example: each netdevice will register a single table under
/proc/sys/net/ipv4/conf/DEVNAME/.
The 'DEVNAME' component of the path is not used to register other
headers, and we can optimise adding that directory: we don't have to
check if it's already registered.
This improves performance when registering many such directories,
because the lookup is an O(number of sibling directories) search for
each registration. With @has_just_one_subheader=1 we skip that search
and add the directory directly because we know no other sibling
directory with the same name was registered.
NOTE: in this example setting @has_just_one_subheader=1 for the 'conf'
ctl_path would be wrong because it's used when registering other
subheaders too (e.g. subheaders for other netdevices).
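The effect of the flag can be sketched in userspace C as follows. This is a minimal model, not the kernel code: struct dir, dir_new(), find_child() and mkdir_child() are hypothetical names, and the lookups counter exists only to make the skipped work visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of a sysctl directory. */
struct dir {
	const char *name;
	int refs;			/* registrations using this dir */
	struct dir *first_child;
	struct dir *last_child;
	struct dir *next_sibling;
	unsigned long lookups;		/* name comparisons done under this dir */
};

static struct dir *dir_new(const char *name)
{
	struct dir *d = calloc(1, sizeof(*d));

	assert(d != NULL);
	d->name = name;
	return d;
}

/* Linear search: O(number of sibling directories). */
static struct dir *find_child(struct dir *parent, const char *name)
{
	struct dir *c;

	for (c = parent->first_child; c; c = c->next_sibling) {
		parent->lookups++;
		if (strcmp(c->name, name) == 0)
			return c;
	}
	return NULL;
}

/* Return the child named @name, creating it if needed. When
 * @has_just_one_subheader is set, the caller promises that no sibling
 * with this name exists, so the search is skipped entirely. */
static struct dir *mkdir_child(struct dir *parent, const char *name,
			       int has_just_one_subheader)
{
	struct dir *c = NULL;

	if (!has_just_one_subheader)
		c = find_child(parent, name);

	if (!c) {
		c = dir_new(name);
		if (parent->last_child)
			parent->last_child->next_sibling = c;
		else
			parent->first_child = c;
		parent->last_child = c;
	}
	c->refs++;
	return c;
}
```

With the flag set, adding N device directories under 'conf' performs no name comparisons at all. Without it, each addition walks the existing siblings, which is what makes registering many devices quadratic in this model.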
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 31 +++++++++++++++++++++++++++++++
kernel/sysctl.c | 12 +++++++-----
2 files changed, 38 insertions(+), 5 deletions(-)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index cd9e789..0931165 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1072,6 +1072,37 @@ struct ctl_table_header {
/* struct ctl_path describes where in the hierarchy a table is added */
struct ctl_path {
const char *procname;
+
+
+ /* This is an optimisation for registering paths that you know
+ * will be used to register a single table. Because such a
+ * directory is used only once, sysctl always creates a new
+ * entry for it when it encounters it.
+ *
+ * When sysctl registers a table, for each directory that may
+ * be used while registering other tables we do a linear
+ * search to see if it's already added, and, if not, add it
+ * ourselves.
+ *
+ * For example: each netdevice will register a single table
+ * under /proc/sys/net/ipv4/conf/DEVNAME/.
+ *
+ * The 'DEVNAME' component of the path is not used to register
+ * other headers, and we can optimise adding that directory:
+ * we don't have to check if it's already registered.
+ *
+ * This improves performance when registering many such
+ * directories, because the lookup is an O(number of sibling
+ * directories) search for each registration. With
+ * @has_just_one_subheader=1 we skip that search and add
+ * the directory directly because we know no other sibling
+ * directory with the same name was registered.
+ *
+ * NOTE: in this example setting @has_just_one_subheader=1 for
+ * the 'conf' ctl_path would be wrong because it's used when
+ * registering other subheaders too (e.g. subheaders for other
+ * netdevices). */
+ int has_just_one_subheader;
};
extern struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *g,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 3ff4384..6747259 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1896,11 +1896,13 @@ static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
retry:
sysctl_write_lock_head(parent);
- h = mkdir_existing_dir(parent, dirs[i]->dirname);
- if (h != NULL) {
- sysctl_write_unlock_head(parent);
- parent = h;
- continue;
+ if (!path[i].has_just_one_subheader) {
+ h = mkdir_existing_dir(parent, dirs[i]->dirname);
+ if (h != NULL) {
+ sysctl_write_unlock_head(parent);
+ parent = h;
+ continue;
+ }
}
if (likely(!create_first_netns_corresp)) {
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv4/devinet.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 5345b0b..a9e2094 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1631,7 +1631,13 @@ static int __devinet_sysctl_register(struct net *net, char *dev_name,
{ .procname = "net", },
{ .procname = "ipv4", },
{ .procname = "conf", },
- { /* to be set */ },
+ {
+ /* to be set below (DEVINET_CTL_PATH_DEV) */
+ .procname = NULL,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/core/neighbour.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 799f06e..63677be 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2818,7 +2818,13 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
{ .procname = "net", },
{ .procname = "proto", },
{ .procname = "neigh", },
- { .procname = "default", },
+ {
+ /* will be set to device name (NEIGH_CTL_PATH_DEV) */
+ .procname = "default",
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv6/addrconf.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1493534..9e2feb0 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4486,7 +4486,13 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
{ .procname = "net", },
{ .procname = "ipv6", },
{ .procname = "conf", },
- { /* to be set */ },
+ {
+ /* to be set below (ADDRCONF_CTL_PATH_DEV) */
+ .procname = NULL,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};
--
1.7.5.134.g1c08b
This patch was not tested!
Parport registers tables under these paths:
dev/parport/default/
dev/parport/PORT/
dev/parport/PORT/devices/
dev/parport/PORT/devices/DEVICE/
Nothing else is registered below dev/parport/PORT/devices/DEVICE/ and
I assume device names are unique (if they are not, this patch is
invalid), so we can skip name checks for the 'DEVICE' directory.
This will have a positive performance impact when there are many
devices registered on the same port.
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 3bb5bed..9c48946 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -442,7 +442,12 @@ int parport_device_proc_register(struct pardevice *device)
{ .procname = "parport" },
{ .procname = port->name },
{ .procname = "devices" },
- { .procname = device->name },
+ {
+ .procname = device->name,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ax25/sysctl_net_ax25.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/net/ax25/sysctl_net_ax25.c b/net/ax25/sysctl_net_ax25.c
index b1181bc..9bd49c0 100644
--- a/net/ax25/sysctl_net_ax25.c
+++ b/net/ax25/sysctl_net_ax25.c
@@ -160,7 +160,13 @@ void ax25_register_sysctl(struct ax25_dev *ax25_dev)
struct ctl_path ax25_path[] = {
{ .procname = "net" },
{ .procname = "ax25" },
- { .procname = ax25_dev->dev->name },
+ {
+ .procname = ax25_dev->dev->name,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+
+ },
{ }
};
--
1.7.5.134.g1c08b
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sched.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 6e39b7c..8320365 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6250,7 +6250,13 @@ static void register_sched_domain_sysctl(void)
{ .procname = "kernel" },
{ .procname = "sched_domain" },
{ /* 'cpu0' */ },
- { /* 'domain0' */ },
+ {
+ /* 'domain0' */
+ .procname = NULL,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};
--
1.7.5.134.g1c08b
This patch was not tested!
I assume the DN_CTL_PATH_DEV .procname names are unique. If they are
not, this patch is invalid.
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/decnet/dn_dev.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 0dcaa90..d83d561 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -216,7 +216,13 @@ static void dn_dev_sysctl_register(struct net_device *dev, struct dn_dev_parms *
{ .procname = "net", },
{ .procname = "decnet", },
{ .procname = "conf", },
- { /* to be set */ },
+ {
+ /* to be set below (DN_CTL_PATH_DEV) */
+ .procname = NULL,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};
--
1.7.5.134.g1c08b
Apologies to reviewers who will feel insulted reading this. This patch
is just for kicks - and by kicks I mean ass-kicks for such an awful
misuse of the RCU API. I haven't done anything with RCUs until now and
I'm very unsure about the sanity of this patch.
This patch replaces the reader-writer lock protected lists ctl_subdirs
and ctl_tables with RCU protected lists.
Unlike in the RCU snippets I found, where the reader side only reads
data from the object and updates are done on a separate copy (hence
Read-Copy-Update), here readers do change some data in the list
elements (that access is protected by a separate spin lock), but they
do not touch the list_head.
read-side:
- uses the list_for_each_entry_rcu list traversal (needed for memory
ordering on DEC Alpha)
- rcu_read_(un)lock make sure the grace period is as long as needed
write-side:
- writers are synchronized with a spin-lock
- list adding/removing is done with list_add_tail_rcu/list_del_rcu
- freeing of elements is done after the grace period has ended (call_rcu)
Also note that there may be unwanted interactions with the RCU
protected VFS routines: ctl_table_header elements are scheduled to be
freed when all references to them have disappeared. This means after
removing the element from the list or at a later time (also with
call_rcu). I don't think that delaying freeing some more would be a
problem, but I may be very wrong.
Freeing of ctl_table_header is done with free_head. This is
scheduled to be called with call_rcu in two places:
- sysctl_proc_inode_put() called from the VFS by proc_evict_inode which uses
rcu_assign_pointer(PROC_I(inode)->sysctl, NULL)
to delete the VFS's last reference to the object
- unregister_sysctl_table (no connection to the VFS).
Each of them determines if all references to that object have
disappeared, and if so, schedules the object to be freed with call_rcu.
Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 8 ++++----
kernel/sysctl.c | 23 +++++++++++------------
kernel/sysctl_check.c | 5 +++--
3 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index c0cc16b..692acbb 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -79,7 +79,7 @@ retry:
sysctl_read_lock_head(head);
/* first check whether a subdirectory has the searched-for name */
- list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+ list_for_each_entry_rcu(h, &head->ctl_subdirs, ctl_entry) {
if (IS_ERR(sysctl_fs_get(h)))
continue;
@@ -91,7 +91,7 @@ retry:
}
/* no subdir with that name, look for the file in the ctl_tables */
- list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+ list_for_each_entry_rcu(h, &head->ctl_tables, ctl_entry) {
if (IS_ERR(sysctl_fs_get(h)))
continue;
@@ -230,7 +230,7 @@ static int scan(struct ctl_table_header *head,
sysctl_read_lock_head(head);
- list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+ list_for_each_entry_rcu(h, &head->ctl_subdirs, ctl_entry) {
if (*pos < file->f_pos) {
(*pos)++;
continue;
@@ -248,7 +248,7 @@ static int scan(struct ctl_table_header *head,
(*pos)++;
}
- list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+ list_for_each_entry_rcu(h, &head->ctl_tables, ctl_entry) {
ctl_table *t;
if (IS_ERR(sysctl_fs_get(h)))
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6747259..76dfcd7 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1491,28 +1491,26 @@ static struct ctl_table dev_table[] = {
static DEFINE_SPINLOCK(sysctl_lock);
-/* if it's deemed necessary, we can create a per-header rwsem. For now
- * a global one will do. */
-static DECLARE_RWSEM(sysctl_rwsem);
+/* protection for the headers' ctl_subdirs/ctl_tables lists */
+static DEFINE_SPINLOCK(sysctl_list_lock);
void sysctl_write_lock_head(struct ctl_table_header *head)
{
- down_write(&sysctl_rwsem);
+ spin_lock(&sysctl_list_lock);
}
void sysctl_write_unlock_head(struct ctl_table_header *head)
{
- up_write(&sysctl_rwsem);
+ spin_unlock(&sysctl_list_lock);
}
void sysctl_read_lock_head(struct ctl_table_header *head)
{
- down_read(&sysctl_rwsem);
+ rcu_read_lock();
}
void sysctl_read_unlock_head(struct ctl_table_header *head)
{
- up_read(&sysctl_rwsem);
+ rcu_read_unlock();
}
-
/* called under sysctl_lock, will reacquire if has to wait */
static void start_unregistering(struct ctl_table_header *p)
{
@@ -1768,6 +1766,7 @@ static struct ctl_table_header *alloc_sysctl_header(struct ctl_table_group *grou
h->ctl_table_arg = NULL;
h->unregistering = NULL;
h->ctl_group = group;
+ INIT_LIST_HEAD(&h->ctl_entry);
return h;
}
@@ -1779,7 +1778,7 @@ static struct ctl_table_header *mkdir_existing_dir(struct ctl_table_header *pare
const char *name)
{
struct ctl_table_header *h;
- list_for_each_entry(h, &parent->ctl_subdirs, ctl_entry) {
+ list_for_each_entry_rcu(h, &parent->ctl_subdirs, ctl_entry) {
if (IS_ERR(sysctl_fs_get(h)))
continue;
if (strcmp(name, h->dirname) == 0) {
@@ -1834,7 +1833,7 @@ static struct ctl_table_header *mkdir_new_dir(struct ctl_table_header *parent,
{
dir->parent = parent;
header_refs_inc(dir);
- list_add_tail(&dir->ctl_entry, &parent->ctl_subdirs);
+ list_add_tail_rcu(&dir->ctl_entry, &parent->ctl_subdirs);
return dir;
}
@@ -2028,7 +2027,7 @@ struct ctl_table_header *__register_sysctl_paths(
failed_duplicate_check = sysctl_check_duplicates(header);
#endif
if (!failed_duplicate_check)
- list_add_tail(&header->ctl_entry, &header->parent->ctl_tables);
+ list_add_tail_rcu(&header->ctl_entry, &header->parent->ctl_tables);
sysctl_write_unlock_head(header->parent);
@@ -2081,7 +2080,7 @@ void unregister_sysctl_table(struct ctl_table_header *header)
spin_lock(&sysctl_lock);
if (!--header->header_refs) {
start_unregistering(header);
- list_del_init(&header->ctl_entry);
+ list_del_rcu(&header->ctl_entry);
if (!header->proc_inode_refs)
call_rcu(&header->rcu, free_head);
}
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index ccd39a3..03db5c5 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1,5 +1,6 @@
#include <linux/sysctl.h>
#include <linux/string.h>
+#include <linux/rculist.h>
/*
* @path: the path to the offender
@@ -118,7 +119,7 @@ int sysctl_check_duplicates(struct ctl_table_header *header)
struct ctl_table_header *dir = header->parent;
struct ctl_table_header *h;
- list_for_each_entry(h, &dir->ctl_subdirs, ctl_entry) {
+ list_for_each_entry_rcu(h, &dir->ctl_subdirs, ctl_entry) {
if (IS_ERR(sysctl_fs_get(h)))
continue;
@@ -130,7 +131,7 @@ int sysctl_check_duplicates(struct ctl_table_header *header)
sysctl_fs_put(h);
}
- list_for_each_entry(h, &dir->ctl_tables, ctl_entry) {
+ list_for_each_entry_rcu(h, &dir->ctl_tables, ctl_entry) {
ctl_table *t;
if (IS_ERR(sysctl_fs_get(h)))
--
1.7.5.134.g1c08b
For anyone interested in testing these patches check them out from:
web: https://github.com/luciang/linux-2.6-new-sysctl
git: git://github.com/luciang/linux-2.6-new-sysctl.git
--
.
..: Lucian
Lucian Adrian Grijincu <[email protected]> writes:
After having read through some of your patches this looks like a good
overall direction. At a more detailed level there are a lot of pieces
to your patch series that appear to need some more attention.
> The first patches remove the .child field of ctl_table. This is a
> requirement for the new algorithm. These patches are scattered all
> over the tree :(
Only registering sysctl leaves seems very reasonable, the code has
been slowly moving in that direction for quite a while.
We need to make it a firm rule that directories are created before
they are used (aka normal filesystem semantics) before we churn
through and change everything.
This also addresses a question you asked in patch 60.
> This can be fixed in two ways:
>
> - A) by making sure to never register a netns specific directory and
> after that register that directory as a common one. From what I can
> tell there isn't such a problem in the kernel at the moment, but I
> did not study the source in detail.
Currently it is a requirement that if you are going to have a netns
component and a non-netns component the non-netns directory must
be created first. However that requirement is not currently enforced.
It is a bit too easy to get that wrong. So we need enforcement of that
rule and enforcement of duplicate checks. The sysctl_check code should
become mandatory at this point, or at least the duplicate checks.
Since we are touching most if not all of the sysctl registrations this
would also be a good time to pass a path string instead of the weird
ctl_path data structure. We needed ctl_path when we had both binary
and proc paths to worry about but we no longer have that concern.
We may also want to add special sysctl directory creation and removal
operations.
> People interested in the core sysctl changes/networking should read:
>
> [PATCH 60/69] sysctl: faster tree-based sysctl implementation
>
> which introduces the new algorithm (commit message and comments have
> more details), and the next few patches which add some further (simple
> and effective) optimisations for networking (and not only).
Patch 60 does too many things all in one patch. It would be very good
if it could be split up some more, so it could be more easily verified.
Eric
On Sun, May 1, 2011 at 5:28 PM, Eric W. Biederman <[email protected]> wrote:
>> The first patches remove the .child field of ctl_table. This is a
>> requirement for the new algorithm. These patches are scattered all
>> over the tree :(
>
> Only registering sysctl leaves seems very reasonable, the code has
> been slowly moving in that direction for quite a while.
>
> We need to make it a firm rule that directories are created before
> they are used. (aka normal filesystem semantics) before we churn
> through and change everything.
I don't understand what you want to say.
Patch 60 makes sure that if a directory is not found a
ctl_table_header will be created for it. That directory can be freed
by anyone else when the reference count becomes 0.
E.g. register /newdir/file1, register /newdir/file2, unregister
/newdir/file1, unregister /newdir/file2
- 1st registration creates 'newdir' and takes a reference to it.
- 2nd registration incs the reference count.
- 1st unregister decs the reference count.
- 2nd unregister decs the reference count which becomes 0 and frees 'newdir'.
I may have misunderstood your comment.
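The reference-counting scheme described above can be modelled in a few lines of userspace C. This is only a sketch under assumptions: struct dir, register_file() and unregister_file() are hypothetical names for illustration, not the code from patch 60.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a directory's ctl_table_header. */
struct dir {
	char name[32];
	int refs;     /* registered tables currently using this directory */
	int present;  /* 1 while the directory exists */
};

/* Registering a file creates the directory on first use, else takes a ref. */
static void register_file(struct dir *d, const char *dirname)
{
	if (!d->present) {
		strncpy(d->name, dirname, sizeof(d->name) - 1);
		d->name[sizeof(d->name) - 1] = '\0';
		d->present = 1;
		d->refs = 0;
	}
	d->refs++;
}

/* Unregistering drops a reference; the last one removes the directory. */
static void unregister_file(struct dir *d)
{
	assert(d->present && d->refs > 0);
	if (--d->refs == 0)
		d->present = 0;
}
```

With this model nobody owns 'newdir': it appears with the first registration and disappears with the last unregistration, whichever registrant that happens to be.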
> This also addresses a question you asked in patch 60.
>
>> This can be fixed in two ways:
>>
>> - A) by making sure to never register a netns specific directory and
>> after that register that directory as a common one. From what I can
>> tell there isn't such a problem in the kernel at the moment, but I
>> did not study the source in detail.
>
> Currently it is a requirement that if you are going to have a netns
> component and a non nents component the non-netns directory must
> be created first. However that requirement is not currently enforced.
>
> It is a bit too easy to get that wrong. So we need enforcement of that
> rule and enforcement of duplicate checks. The sysctl_check code should
> become mandatory at this point, or at least the duplicate checks.
I'll post a modified check enforcing this.
> Since we are touching most if not all of the sysctl registrations this
> would also be a good time to pass a path string instead of the weird
> ctl_path data structure. We needed ctl_path when we had both binary
> and proc paths to worry about but we no longer have that concern.
I still find good use for it in the next patches (some optimisations).
Getting rid of it makes some things more difficult:
- I wouldn't like to parse strings into path components at registration
- users of the register_sysctl_paths would need to create strings with
dynamic components (for example "net/conf/%s/" - where %s is a
netdevice-name or "kernel/sched_domain/%s/%s" with cpu-name and
domain-name).
> We may also want to add a special sysctl directory creation and removal
> operations.
That can be done very easily now: register an empty table. But I can
add something special for directories only if needed.
>> People interested in the core sysctl changes/networking should read:
>>
>> [PATCH 60/69] sysctl: faster tree-based sysctl implementation
>>
>> which introduces the new algorithm (commit message and comments have
>> more details), and the next few patches which add some further (simple
>> and effective) optimisations for networking (and not only).
>
> Patch 60 does too many things all in one patch. It would be very good
> if it could be split up some more, so it could be more easily verified.
Ok. I'll see what I can do.
--
.
..: Lucian
Lucian Adrian Grijincu <[email protected]> writes:
> On Sun, May 1, 2011 at 5:28 PM, Eric W. Biederman <[email protected]> wrote:
>>> The first patches remove the .child field of ctl_table. This is a
>>> requirement for the new algorithm. These patches are scattered all
>>> over the tree :(
>>
>> Only registering sysctl leaves seems very reasonable, the code has
>> been slowly moving in that direction for quite a while.
>>
>> We need to make it a firm rule that directories are created before
>> they are used. (aka normal filesystem semantics) before we churn
>> through and change everything.
>
>
> I don't understand what you want to say.
>
> Patch 60 makes sure that if a directory is not found a
> ctl_table_header will be created for it. That directory can be freed
> by anyone else when the reference count becomes 0.
> E.g. register /newdir/file1, register /newdir/file2, unregister
> /newdir/file1, unregister /newdir/file1
> - 1st registration created 'newdir' and takes a reference to it.
> - 2nd regisitration incs the reference count.
> - 1st unregister decs the reference count.
> - 2nd unregister decs the reference count which becomes 0 and frees 'newdir'.
>
> I may have misunderstood your comment.
While you can make directory lifetimes work by ref counting, doing so
removes the concept of ownership and makes it hard to see when a sysctl
directory should exist.
If we are reforming this we should go all of the way and make strict
lifetime rules for directories, so that it is required to do:
register newdir
register newdir/file1
unregister newdir/file1
unregister newdir
Today there is a WARN_ON in unregister_sysctl_table enforcing the
ordering that directories must come first and be removed after
the files in those directories. If we change the table registrations
to only include leaves (as your removal of .child patches do) I
believe that rule becomes absolute.
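As a rough illustration of what such strict lifetime rules would enforce, here is a small userspace model. It is a sketch only: the fixed-size table and the names register_dir(), register_file_in(), unregister_file_in() and unregister_dir() are assumptions made for the example, not an existing or proposed kernel interface.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRS 8

/* Hypothetical model: a directory must be registered explicitly. */
struct sdir {
	char name[32];
	int used;    /* slot holds a registered directory */
	int nfiles;  /* files currently registered inside it */
};

static struct sdir dirs[MAX_DIRS];

static struct sdir *find_dir(const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < MAX_DIRS; i++)
		if (dirs[i].used && strcmp(dirs[i].name, name) == 0)
			return &dirs[i];
	return NULL;
}

/* Returns 0 on success, -1 if the directory exists or no slot is free. */
static int register_dir(const char *name)
{
	if (find_dir(name))
		return -1;
	for (int i = 0; i < MAX_DIRS; i++) {
		if (!dirs[i].used) {
			strncpy(dirs[i].name, name, sizeof(dirs[i].name) - 1);
			dirs[i].name[sizeof(dirs[i].name) - 1] = '\0';
			dirs[i].used = 1;
			dirs[i].nfiles = 0;
			return 0;
		}
	}
	return -1;
}

/* Files may only be registered in a directory that already exists. */
static int register_file_in(const char *dirname)
{
	struct sdir *d = find_dir(dirname);

	if (!d)
		return -1;
	d->nfiles++;
	return 0;
}

static int unregister_file_in(const char *dirname)
{
	struct sdir *d = find_dir(dirname);

	if (!d || d->nfiles == 0)
		return -1;
	d->nfiles--;
	return 0;
}

/* A directory can only be removed once it is empty. */
static int unregister_dir(const char *name)
{
	struct sdir *d = find_dir(name);

	if (!d || d->nfiles > 0)
		return -1;
	d->used = 0;
	return 0;
}
```

In this model registering newdir/file1 before newdir fails, and so does removing newdir while file1 is still registered, which is the ordering the WARN_ON is described as checking.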
That WARN_ON came in with Al Viro's last major sweep through the code,
and it started moving sysctl in the direction of a normal filesystem.
For long term maintainability and comprehensibility I think moving
sysctl in the direction of a normal filesystem makes a lot of sense.
>> Since we are touching most if not all of the sysctl registrations this
>> would also be a good time to pass a path string instead of the weird
>> ctl_path data structure. We needed ctl_path when we had both binary
>> and proc paths to worry about but we no longer have that concern.
>
>
> I still find good use for it in the next patches (some optimisations).
> Getting rid of it makes some things more difficult:
> - I wouldn't like to parse strings into path components at registeration
I don't expect '/' being more difficult to deal with than an array. In
general I expect a single string to be more space efficient and easier
for human comprehension.
> - users of the register_sysctl_paths would need to create strings with
> dynamic components (for example "net/conf/%s/" - where %s is a
> netdevice-name or "kernel/sched_domain/%s/%s" with cpu-name and
> domain-name).
This is a good point.
In the normal proc implementation this is solved by being able to
pass the equivalent of a ctl_table_header into the registration
function, which allows the use of relative paths in the registration
function.
In the examples you have given relative paths should also work for
sysctl.
Using string paths is not a must-have, but in practice I think it
simplifies things quite a bit.
Eric
On Mon, May 2, 2011 at 12:49 AM, Eric W. Biederman
<[email protected]> wrote:
>>> Since we are touching most if not all of the sysctl registrations this
>>> would also be a good time to pass a path string instead of the weird
>>> ctl_path data structure. We needed ctl_path when we had both binary
>>> and proc paths to worry about but we no longer have that concern.
>>
>>
>> I still find good use for it in the next patches (some optimisations).
>> Getting rid of it makes some things more difficult:
>> - I wouldn't like to parse strings into path components at registeration
>
> I don't expect '/' being more difficult to deal with than an array. In
> general I expect a single string to be more space efficient and easier
> for human comprehension.
We also use the string from ctl_path as a name for the sysctl
directory. We would need to either:
* strdup part of the string for each directory, remember to kfree
* replace '/' with '\0' in the given string (meaning it can't be put
in a read-only zone)
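To make the second option concrete, here is a minimal userspace sketch of in-place splitting. The function name split_path_inplace() is hypothetical; the point is only that the buffer gets written to, so a string literal placed in read-only memory could not be passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a writable path in place: each '/' becomes '\0' and comp[]
 * receives a pointer to the start of every non-empty component.
 * Returns the number of components found (at most max).
 */
static size_t split_path_inplace(char *path, const char **comp, size_t max)
{
	size_t n = 0;
	char *p = path;

	assert(path != NULL && comp != NULL);
	while (*p && n < max) {
		while (*p == '/')         /* skip separators */
			p++;
		if (!*p)
			break;
		comp[n++] = p;            /* component starts here */
		while (*p && *p != '/')
			p++;
		if (*p == '/')
			*p++ = '\0';      /* terminate it; needs a writable buffer */
	}
	return n;
}
```

The caller has to keep the buffer alive for as long as the component pointers are used as directory names.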
Also I make use of the ctl_path to add some optimisations that deal
with the case when there are very many known-to-be-uniquely-named
sub-directories like for /proc/sys/net/ipv4/conf/DEVICE. IXIACOM, which
sponsored this work, has use cases where they need to create 10^3..10^5
virtual network devices and these optimisations really add up for that
many interfaces.
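A back-of-the-envelope model of why this matters, assuming (this is an assumption of the sketch, not a measurement) that each insert into a list first compares the new name against every existing entry. The function name compares_with_check() is made up for the example.

```c
#include <assert.h>

/*
 * Name comparisons needed to add n uniquely named entries to a list
 * when every insert first scans all existing entries for a duplicate.
 */
static unsigned long long compares_with_check(unsigned long long n)
{
	unsigned long long total = 0;

	assert(n <= (1ULL << 32));   /* keeps the sum within 64 bits */
	for (unsigned long long i = 0; i < n; i++)
		total += i;          /* the i-th insert scans i entries */
	return total;
}
```

Under that model 1000 devices cost 499500 comparisons per directory and 4000 devices cost 7998000, so the work grows quadratically; skipping the check for known-unique names removes these comparisons entirely.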
For details about the optimisation see patches:
61/69 http://thread.gmane.org/gmane.linux.kernel/1133667/focus=1133694 and
62/69 http://thread.gmane.org/gmane.linux.kernel/1133667/focus=1133711
I will make another function that would take a string, parse it up,
create a ctl_path array and register it, but I'd really like to keep
ctl_path both in the implementation and as a means to register a
table.
>> - users of the register_sysctl_paths would need to create strings with
>> dynamic components (for example "net/conf/%s/" - where %s is a
>> netdevice-name or "kernel/sched_domain/%s/%s" with cpu-name and
>> domain-name).
>
> This is a good point.
>
> In the normal proc implementation this is solved by being able to
> pass the equivalent of a ctl_table_header into the registration
> function, which allows the use of relative paths in the registration
> function.
>
> In the examples you have given relative paths should also work for
> sysctl.
Hmm, I don't think we're on the same channel here. I don't understand
what you're trying to say:
- normal proc implementation?
- the equivalent of a ctl_table_header?
- relative paths?
I was saying that if we are to *replace* the ctl_path based mechanism
with a string denoting the path, then some other registrants will need
to allocate memory for those strings because the paths they register
are computed at runtime. Then I gave two distinct examples where this
is done. In both of those cases, ctl_path saves us from allocating a
string before registration, only to chop it back into pieces in the
__register function.
--
.
..: Lucian
On Mon, May 2, 2011 at 12:49 AM, Eric W. Biederman
<[email protected]> wrote:
>> Patch 60 makes sure that if a directory is not found a
>> ctl_table_header will be created for it. That directory can be freed
>> by anyone else when the reference count becomes 0.
>> E.g. register /newdir/file1, register /newdir/file2, unregister
>> /newdir/file1, unregister /newdir/file1
>> - 1st registration created 'newdir' and takes a reference to it.
>> - 2nd regisitration incs the reference count.
>> - 1st unregister decs the reference count.
>> - 2nd unregister decs the reference count which becomes 0 and frees 'newdir'.
>>
>> I may have misunderstood your comment.
>
> While you can make directory lifetimes work by ref counting. Making it
> work by ref counting removes a concept of ownership and makes it hard to
> see when a sysctl directory should exist.
I can do what you want but I see no good reason for this restriction.
Entities that register sysctls are interested in their files showing
up there.
In my previous example if two modules want to add /newdir/file1 and
/newdir/file2 they must either:
- find a third party that will register /newdir for them
- make sure they have a strict dependency relation (only register
/newdir/file2 if the first module is loaded and it registered
/newdir/file1 or at least /newdir)
This is why we register an empty directory for "/proc/sys/dev/" (a
third party that registers the 'dev' dir and all others use it).
Again, I can do what you ask, but it either means rethinking some of
my patches, or adding the restriction on top of them.
> If we are reforming this we should go all of the way and make strict
> lifetime rules of directories. So that it is required to do:
>
> register newdir
> register newdir/file1
> unregister newdir/file1
> unregister newdir
>
> Today there is a WARN_ON in unregister_sysctl_table enforcing the
> ordering that directories must come first and be removed after
> the files in those directories. If we change the table registrations
> to only include leaves (as your removal of .child patches do) I
> believe that rule becomes absolute.
> That WARN_ON came in with Al Viro's last major sweep through the code,
> and it started moving sysctl in the direction of a normal filesystem.
Regarding that: I'm now trying to reproduce a NULL access in
sysctl_is_seen that I got only once. It's coupled with a
"rcu_sched_state stall detected" warning from RCU.
> For long term maintainability and comprehensibility I think moving
> sysctl in the direction of a normal filesystem makes a lot of sense.
Ok, I agree that a later redesign of sysctl would suffer if there
were a mess regarding registration/unregistration of directories
and no ownership information/enforcement. I'll see how this can be
addressed in the next respin of this series.
--
.
..: Lucian
On Mon, May 2, 2011 at 1:34 AM, Lucian Adrian Grijincu
<[email protected]> wrote:
> Regarding that: I'm now trying to reproduce a NULL access in
> sysctl_is_seen that I got only once. It's coupled with a
> "rcu_sched_state stall detected" warning from RCU.
Disregard that, I...
PEBKAC. I thought it was produced in an unpatched 2.6.39-rc5, but it
happened in one that I messed up.
--
.
..: Lucian
Lucian Adrian Grijincu <[email protected]> writes:
> On Mon, May 2, 2011 at 12:49 AM, Eric W. Biederman
> <[email protected]> wrote:
>>>> Since we are touching most if not all of the sysctl registrations this
>>>> would also be a good time to pass a path string instead of the weird
>>>> ctl_path data structure. We needed ctl_path when we had both binary
>>>> and proc paths to worry about but we no longer have that concern.
>>>
>>>
>>> I still find good use for it in the next patches (some optimisations).
>>> Getting rid of it makes some things more difficult:
>>> - I wouldn't like to parse strings into path components at registeration
>>
>> I don't expect '/' being more difficult to deal with than an array. In
>> general I expect a single string to be more space efficient and easier
>> for human comprehension.
>
>
> We also use the string from ctl_path as a name for the sysctl
> directory. We would need to either:
> * strdup part of the string for each directory, remember to kfree
> * replace '/' with '\0' in the given string (meaning it can't be put
> in a read-only zone)
If we are only registering leaves, we can just deal with the tail of the
path and point just past the final /. There should be no need to
duplicate anything.
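A minimal sketch of that idea in userspace C, assuming the leaf name is all that is needed from the string; path_leaf() is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/* Return the last component of path without copying or modifying it. */
static const char *path_leaf(const char *path)
{
	const char *slash;

	assert(path != NULL);
	slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}
```

The returned pointer aliases the caller's string, so it works on read-only data. A path with a trailing '/' yields an empty leaf, which a real implementation would have to reject or strip.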
> Also I make use of the ctl_path to add some optimisations that deal
> with the case when there are very many known-to-be-uniquely-named
> sub-directories like for /proc/sys/net/ipv4/conf/DEVICE. IXIACOM which
> sponsored this work has usecases where they need to create 10^3..10^5
> virtual network devices and these optimisations really add up for that
> many interfaces.
I am convinced the places where we have network devices in the path are
indeed the pain points for scaling.
My gut feel is that we should use a balanced binary tree instead of a
doubly linked list for the directories. The space cost of a tree
is just an extra color member instead of two pointers.
For 100000 entries, your target, a binary tree should only be 17
entries tall. Maybe twice that if the tree is an rbtree. 17 or
even 33 should be a small enough value log(N) to keep the cost from
being painful. And using a binary tree means fewer special cases
overall.
A binary tree is faster than your special case for lookup. Which means
it solves the case of actually using the sysctl entries as well as the
case for creating them.
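The heights quoted above can be checked with integer arithmetic. The sketch below uses the textbook bounds (a perfectly balanced tree of n nodes needs about log2(n) levels, a red-black tree at most 2*log2(n+1)); the function names are made up for the example.

```c
#include <assert.h>

/* Smallest h with 2^h >= n: height of a perfectly balanced tree. */
static unsigned balanced_height(unsigned long long n)
{
	unsigned h = 0;

	assert(n > 0);
	while ((1ULL << h) < n)
		h++;
	return h;
}

/*
 * Worst-case red-black tree height, floor(2 * log2(n + 1)),
 * computed as the largest k with 2^k <= (n + 1)^2.
 * Only valid while (n + 1)^2 fits in 64 bits.
 */
static unsigned rbtree_max_height(unsigned long long n)
{
	unsigned long long sq = (n + 1) * (n + 1);
	unsigned k = 0;

	while (k < 63 && (1ULL << (k + 1)) <= sq)
		k++;
	return k;
}
```

For n = 100000 these give 17 and 33, matching the figures in the text.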
Furthermore we also need to change sysfs because it also has
directories that will contain all 100000 of the network devices,
and I don't expect simply skipping the duplicate check is going to
fly in sysfs.
We could use something other than a data structure with logN
insert/remove/lookup cost. But I think we need numbers
to show that logN won't scale. So far all we have are numbers that show
a linked list doesn't scale.
> For details about the optimisation see patches:
> 61/69 http://thread.gmane.org/gmane.linux.kernel/1133667/focus=1133694 and
> 62/69 http://thread.gmane.org/gmane.linux.kernel/1133667/focus=1133711
>
>
> I will make another function that would take a string, parse it up,
> create a ctl_path array and register it, but I'd really like to keep
> ctl_path both in the implementation and as a means to register a
> table.
Using a string path certainly isn't critical at this point. But so
far I don't see practical downsides.
>>> - users of the register_sysctl_paths would need to create strings with
>>> dynamic components (for example "net/conf/%s/" - where %s is a
>>> netdevice-name or "kernel/sched_domain/%s/%s" with cpu-name and
>>> domain-name).
>>
>> This is a good point.
>>
>> In the normal proc implementation this is solved by being able to
>> pass the equivalent of a ctl_table_header into the registration
>> function, which allows the use of relative paths in the registration
>> function.
>>
>> In the examples you have given relative paths should also work for
>> sysctl.
>
>
> Hmm, I don't think we're on the same channel here. I don't understand
> what you're trying to say
> - normal proc implementation?
> - the equivalent of a ctl_table_header?
> - relative paths?
I was effectively looking at other virtual filesystems that have
had similar problems and talking about the solutions they use.
In particular I was referring to create_proc_entry. It takes
a path and an optional parent directory.
> I was saying that if we are to *replace* the ctl_path based mechanism
> with a string denoting the path, then some other registrants will need
> to allocate memory for those strings because the paths they register
> are computed at runtime. Then I gave two distinct examples where this
> is done. In both of those cases, ctl_path saves us from allocating a
> string before allocation, only to chop it then back to pieces in the
> __register function.
And I was saying that if that string were treated as a relative path, we
could have:
struct ctl_table_header *register_sysctl_path(struct ctl_table_header *parent,
const char *path,
struct ctl_table *table);
The optional parent parameter would save us from the pain of having to
even place the sysctl entry in a ctl_path. __register_sysctl_paths
already has a very similar interface.
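To illustrate how an optional parent avoids building the full path string, here is a small userspace model. It is a sketch under assumptions: struct node, register_under() and node_path() are hypothetical and only mimic the shape of the interface proposed above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified header: a name plus a link to its parent. */
struct node {
	const struct node *parent;  /* NULL for a top-level registration */
	const char *name;           /* borrowed from the caller, not copied */
};

/* Register 'name' relative to an optional parent; nothing is allocated. */
static void register_under(struct node *child, const struct node *parent,
			   const char *name)
{
	assert(child != NULL && name != NULL);
	child->parent = parent;
	child->name = name;
}

/*
 * Write the full path of n into buf by walking up the parents.
 * Returns the path length, or -1 if buf is too small.
 */
static int node_path(const struct node *n, char *buf, size_t len)
{
	int used = 0;
	size_t l;

	if (n->parent) {
		used = node_path(n->parent, buf, len);
		if (used < 0 || (size_t)used + 1 >= len)
			return -1;
		buf[used++] = '/';
	}
	l = strlen(n->name);
	if ((size_t)used + l >= len)
		return -1;
	memcpy(buf + used, n->name, l + 1);   /* includes the '\0' */
	return used + (int)l;
}
```

A registrant with a runtime-computed component (a netdevice name, say) passes just that component plus the parent it already holds; the full path is only assembled if something asks for it.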
Eric
Lucian Adrian Grijincu <[email protected]> writes:
>> For long term maintainability and comprehensibility I think moving
>> sysctl in the direction of a normal filesystem makes a lot of sense.
>
> Ok, I agree that a later redesign of sysctl would suffer if there
> would be a mess regarding registration/unregistration of directories
> and no ownership information/enforcement. I'll see how this can be
> addressed in the next respin of this series.
Thanks.
Eric
On Mon, May 2, 2011 at 5:06 AM, Eric W. Biederman <[email protected]> wrote:
> In particular I was referring to create_proc_entry. It takes
> a path and an optional parent directory.
I did something like that in this patch series I sent a month ago
which was mostly ignored (except for a call from David Miller for
feedback on the patches):
http://thread.gmane.org/gmane.linux.kernel/1121889
I'll see how I can add the optional parent directory in this patch series.
Could you please take a look at a few patches in the above series?
The ones that have "cookie:" add a cookie pointer to the
ctl_table_header making it possible to skip duplicating ctl_table
arrays in a few places in the kernel. After finishing this series I'll
send one with the cookie.
--
.
..: Lucian
Lucian Adrian Grijincu <[email protected]> writes:
> On Mon, May 2, 2011 at 5:06 AM, Eric W. Biederman <[email protected]> wrote:
>> In particular I was referring to create_proc_entry. It takes
>> a path and an optional parent directory.
>
> I did something like that in this patch series I sent a month ago
> which was mostly ignored (except for a call from David Miller for
> feedback on the patches):
> http://thread.gmane.org/gmane.linux.kernel/1121889
>
> I'll see how I can add the optional parent directory in this patch series.
>
> Could you please take a look at a few patches in the above series?
>
> The ones that have "cookie:" add a cookie pointer to the
> ctl_table_header making it possible to skip duplicating ctl_table
> arrays in a few places in the kernel. After finishing this series I'll
> send one with the cookie..
The cookie changes seem particularly intrusive, and if I read your
patches properly the cookies are only useful for table sharing when
implementing network namespaces. At first glance those changes seem
pretty horrible.
Do you also have a lot of network namespaces in the workloads you care
about?
Eric
On Mon, May 2, 2011 at 9:02 PM, Eric W. Biederman <[email protected]> wrote:
> Do you also have a lot of network namespaces in the workloads you care
> about?
No, the use case deals with a high number of netdevices.
The cookie can be used in lots of places that kmemdup ctl_table arrays
and then set ->data to be the address of a member of a structure.
- netdevice config sysctls
- netns specific sysctls (e.g. net/somaxconn)
- parport device specific sysctls
- cpu sched domain config sysctls
- et al.
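One possible way a cookie could replace the per-instance kmemdup, shown as a userspace sketch. Everything here is an assumption made for illustration (struct dev_conf, struct shared_entry, the offset field and entry_data() are invented names); it is not the layout used in the cookie patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device settings, one instance per netdevice. */
struct dev_conf {
	int forwarding;
	int mtu;
};

/* Hypothetical shared entry: stores an offset instead of a ->data pointer. */
struct shared_entry {
	const char *procname;
	size_t offset;  /* offset of the int member inside struct dev_conf */
};

/* One table shared by every device; it is never duplicated. */
static const struct shared_entry dev_table[] = {
	{ "forwarding", offsetof(struct dev_conf, forwarding) },
	{ "mtu",        offsetof(struct dev_conf, mtu) },
};

/* Resolve an entry against a per-registration cookie (the device's struct). */
static int *entry_data(const struct shared_entry *e, void *cookie)
{
	assert(e != NULL && cookie != NULL);
	return (int *)((char *)cookie + e->offset);
}
```

Each registration would then carry only a cookie pointing at its own structure, while the table itself stays const and shared.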
I sent another series sometime in February that added the cookie to
ipv4/6 conf sysctls.
I don't know why I did not include that in the patch series from April.
> The cookie changes seem particularly intrusive, and if I read your
> patches properly the cookies are only useful for table sharing when
> implementing network namespaces. At first glance those changes seem
> pretty horrible.
I know it's ugly. I was thinking about this today. I think I have a
cleaner solution, that I'll post after this patch series.
--
.
..: Lucian