2011-05-08 22:40:39

by Lucian Adrian Grijincu

Subject: [v2 000/115] faster tree-based sysctl implementation

This patch series introduces a faster/leaner sysctl internal implementation:

$ time modprobe dummy numdummies=N

Without this patch series :(
- ipv4 only
- N=1000 time= 0m 06s
- N=2000 time= 0m 30s
- N=4000 time= 2m 35s
- ipv4 and ipv6
- N=1000 time= 0m 24s
- N=2000 time= 2m 14s
- N=4000 time=10m 16s
- N=5000 time=16m 03s

With this patch series :)
- ipv4 only
- N=1000 time= 0m 0.33s
- N=2000 time= 0m 1.25s
- N=4000 time= 0m 5.31s
- ipv4 and ipv6
- N=1000 time= 0m 0.41s
- N=2000 time= 0m 1.62s
- N=4000 time= 0m 7.64s
- N=5000 time= 0m 12.35s
- N=8000 time= 0m 36.95s
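
To put the numbers above in perspective, here is a minimal userspace sketch that turns the "ipv4 and ipv6" timings into speedup factors. The helper names (to_seconds, speedup, print_speedups) are made up for this illustration and are not part of the series; only the timings come from the measurements listed above.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a "Xm Ys" wall-clock reading to seconds. */
double to_seconds(int minutes, double seconds)
{
	assert(minutes >= 0 && seconds >= 0.0);
	return minutes * 60.0 + seconds;
}

/* How many times faster the patched kernel is for one measurement. */
double speedup(double before_s, double after_s)
{
	assert(after_s > 0.0);
	return before_s / after_s;
}

/* Print the speedup for every N measured on both kernels (ipv4 and ipv6). */
void print_speedups(void)
{
	static const struct {
		int n;
		int before_min;
		double before_sec;
		double after_sec;
	} rows[] = {
		{ 1000,  0, 24.0,  0.41 },
		{ 2000,  2, 14.0,  1.62 },
		{ 4000, 10, 16.0,  7.64 },
		{ 5000, 16,  3.0, 12.35 },
	};
	size_t i;

	for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
		double before = to_seconds(rows[i].before_min,
					   rows[i].before_sec);

		printf("N=%d: %.1fx faster\n", rows[i].n,
		       speedup(before, rows[i].after_sec));
	}
}
```

Calling print_speedups() from a small main() prints one line per N. For these four rows the factor works out to roughly 58x to 83x.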


Since v1 (http://thread.gmane.org/gmane.linux.kernel/1133667):
- rebased on top of 2.6.39-rc6
- split the patch that adds the new algorithm and data structures.
- fixed a few bugs lingering in the old code
- shrank a reference counter
- added a new reference counter to maintain ownership information
- added method to register an empty sysctl dir and converted some users
- added checks enforcing the rule that a non-netns specific directory may
not be registered after a netns specific one has already been registered.
- added cookie support: register a piece of data with the header to be
used to make simple conversions on the ctl_table. This saves memory where
we need to register sysctl tables with the same content affecting
different pieces of data.
- enforced sysctl checks
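
The cookie idea from the list above can be illustrated with a small userspace sketch. Everything here is hypothetical (struct demo_ns, demo_table, demo_header, demo_resolve and the entry names are invented for the example), and it only assumes that a cookie is an opaque pointer stored with the header and used to turn a shared table entry into a per-instance data pointer. The real conversion in the series may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-instance settings, e.g. one copy per namespace. */
struct demo_ns {
	int frag_timeout;
	int frag_high_thresh;
};

/* Shared table entry: an offset replaces the per-instance .data pointer. */
struct demo_entry {
	const char *procname;
	size_t offset;
};

/* One read-only table, shared by every registration. */
const struct demo_entry demo_table[] = {
	{ "frag_timeout",     offsetof(struct demo_ns, frag_timeout) },
	{ "frag_high_thresh", offsetof(struct demo_ns, frag_high_thresh) },
	{ NULL, 0 },
};

/* A registration: the shared table plus a cookie naming the instance. */
struct demo_header {
	const struct demo_entry *table;
	void *cookie;
};

/* Resolve an entry name to the int it controls for this registration. */
int *demo_resolve(const struct demo_header *head, const char *name)
{
	const struct demo_entry *e;

	assert(head != NULL && name != NULL);
	for (e = head->table; e->procname != NULL; e++)
		if (strcmp(e->procname, name) == 0)
			return (int *)((char *)head->cookie + e->offset);
	return NULL;
}
```

Two headers that point at the same demo_table but carry different cookies resolve "frag_timeout" to different integers, so the table itself never has to be duplicated per instance.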


Eric also asked for:
- registration based on strings, not the ctl_path version
-- I did not add this at the moment because of lack of time and,
if needed, this can be added any time later. The patch series
is long enough.

- replacing the per-header list of subdirs with an rbtree.
-- Again, lack of time, and this can always be added at a later time
to optimize lookup and duplicate checks. At the moment this patch
series does not add a complexity regression over the previous
implementation, au contraire.
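
For reference, string-based registration would mostly come down to splitting a string such as "dev/parport/default" into path components. The following userspace sketch shows one way that could look. struct demo_path, the DEMO_* limits and demo_split_path are hypothetical names for this illustration only; nothing like them is part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_DEPTH 8
#define DEMO_MAX_NAME  32

/* Hypothetical stand-in for one path component (cf. ctl_path.procname). */
struct demo_path {
	char procname[DEMO_MAX_NAME];
};

/*
 * Split "a/b/c" into components, writing at most DEMO_MAX_DEPTH of them.
 * Returns the number of components, or -1 if the string is malformed
 * (empty component, leading or trailing slash, too deep, name too long).
 */
int demo_split_path(const char *str, struct demo_path *out)
{
	int depth = 0;

	assert(str != NULL && out != NULL);
	while (*str != '\0') {
		size_t len = strcspn(str, "/");

		if (len == 0 || len >= DEMO_MAX_NAME ||
		    depth == DEMO_MAX_DEPTH)
			return -1;
		memcpy(out[depth].procname, str, len);
		out[depth].procname[len] = '\0';
		depth++;
		str += len;
		if (*str == '/') {
			str++;
			if (*str == '\0')
				return -1;	/* trailing slash */
		}
	}
	return depth;
}
```

The resulting components map one-to-one onto the { .procname = ... } entries that the ctl_path arrays in this series spell out by hand.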


For anyone interested in testing these patches check them out from:

web: https://github.com/luciang/linux-2.6-new-sysctl
git: git://github.com/luciang/linux-2.6-new-sysctl.git


Cc: "Eric W . Biederman" <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Octavian Purdila <[email protected]>
Cc: "David S . Miller" <[email protected]>

Lucian Adrian Grijincu (115):
sysctl: remove .child from dev/parport/default
sysctl: parport: reorder .child assignments to simplify review
sysctl: remove .child from dev/parport/PORT/devices/DEVICE
sysctl: remove .child from dev/parport/PORT/
sysctl: remove .child from dev/parport/PORT/devices/
sysctl: remove .child from kernel/vsyscall64 (x86)
sysctl: remove .child from abi/vsyscall32 (x86)
sysctl: remove .child from crypto/fips_enabled
sysctl: remove .child from dev/cdrom/
sysctl: remove .child from dev/hpet/
sysctl: remove .child from dev/ipmi/
sysctl: remove .child from dev/rtc/
sysctl: remove .child from dev/mac_hid/
sysctl: remove .child from dev/raid/
sysctl: remove .child from xpc/
sysctl: remove .child from xpc/hb
sysctl: remove .child from kernel/sclp (s390)
sysctl: remove .child from dev/scsi
sysctl: remove .child from kernel/pty
sysctl: remove .child from coda/
sysctl: remove .child from fscache/
sysctl: remove .child from fs/nfs/ nlm_table table
sysctl: remove .child from fs/nfs/ nfs_cb_table
sysctl: remove .child from fs/ntfs-debug
sysctl: remove .child from fs/ocfs2/nm/
sysctl: remove .child from fs/quota/
sysctl: remove .child from fs/xfs/
sysctl: remove .child from kernel/ (ipc)
sysctl: remove .child from fs/mqueue
sysctl: sched: add sd_table_template
sysctl: remove .child from kernel/sched_domain/cpuX/domainY/
sysctl: remove .child from kernel/ (utsname)
sysctl: remove .child from sunrpc/
sysctl: remove .child from sunrpc/svc_rdma
sysctl: remove .child from sunrpc/ (xprtrdma)
sysctl: remove .child from sunrpc/ (xprtsock)
sysctl: remove .child from bus/isa/ (arm)
sysctl: remove .child from reboot/warm (arm)
sysctl: remove .child from lasat/ (mips)
sysctl: remove .child from appldata/ (s390)
sysctl: remove .child from s390dbf/
sysctl: remove .child from vm/ (s390)
sysctl: remove .child from kernel/perfmon/ (ia64)
sysctl: remove .child from kernel/ (ia64/kdump)
sysctl: remove .child from kernel/powersave-nap (powerpc)
sysctl: remove .child from pm/ (frv)
sysctl: remove .child from frv/
sysctl: remove .child from sh64/unaligned_fixup/
sysctl: delete unused register_sysctl_table function
sysctl: remove .child from ax25 table
sysctl: remove .child from net/ipv4/route and net/ipv4/neigh tables
sysctl: remove .child from net/ipv4/neigh table
sysctl: remove .child from net/ipv6/route, net/ipv6/icmp, net/ipv6
tables
sysctl: remove .child from net/llc tables
sysctl: call sysctl_init before the first sysctl registration
sysctl: no-child: manually register kernel/random
sysctl: no-child: manually register kernel/keys
sysctl: no-child: manually register fs/inotify
sysctl: no-child: manually register fs/epoll
sysctl: no-child: manually register root tables
sysctl: faster reimplementation of sysctl_check_table
sysctl: remove useless ctl_table->parent field
sysctl: simplify find_in_table
sysctl: sysctl_head_grab defaults to root header on NULL
sysctl: delete useless grab_header function
sysctl: rename ->used to ->ctl_use_refs
sysctl: rename sysctl_head_grab/finish to sysctl_use_header/unuse
sysctl: rename sysctl_head_next to sysctl_use_next_header
sysctl: split ->count into ctl_procfs_refs and ctl_header_refs
sysctl: rename sysctl_head_get/put to sysctl_proc_inode_get/put
sysctl: rename (un)use_table to __sysctl_(un)use_header
sysctl: simplify ->permissions hook
sysctl: group root-specific operations
sysctl: introduce ctl_table_group
sysctl: move removal from list out of start_unregistering
sysctl: faster tree-based sysctl implementation
sysctl: add duplicate entry and sanity ctl_table checks
sysctl: alloc ctl_table_header with kmem_cache
sysctl: single subheader path: optimisation for paths used only once
sysctl: single subheader path: net/ipv4/conf/DEVICE-NAME/
sysctl: single subheader path: net/{ipv4|ipv6}/neigh/DEV/
sysctl: single subheader path: net/ipv6/conf/DEVICE-NAME/
sysctl: single subheader path: dev/parport/PORT/devices/DEVICE/
sysctl: single subheader path: net/ax25/DEVICE
sysctl: single subheader path: kernel/sched_domain/CPU/DOMAIN/
sysctl: single subheader path: net/decnet/conf/DEVNAME
sysctl: check netns-specific registration order respected
RFC: sysctl: convert read-write lock to RCU
RFC: sysctl: change type of ctl_procfs_refs to u8
sysctl: warn if registration/unregistration order is not respected
sysctl: add register_sysctl_dir: register an empty sysctl directory
sysctl: sched: create empty dir with register_sysctl_dir
sysctl: ax25: create empty dir with register_sysctl_dir
sysctl: net/core: create empty dir with register_sysctl_dir
sysctl: net/ipv4/neigh: create empty dir with register_sysctl_dir
sysctl: net/ipv6/neigh: create empty dir with register_sysctl_dir
sysctl: add ctl_cookie
sysctl: add cookie to __register_sysctl_paths
sysctl: add register_net_sysctl_table_net_cookie
sysctl: cookie: share ip4_frags_ns_ctl_table between nets
sysctl: cookie: share netns_core_table between nets
sysctl: cookie: share ipv4_net_table between nets
sysctl: cookie: share ip6_frags_ns_ctl_table between nets
sysctl: cookie: share ipv6_route_table/ipv6_icmp_table between nets
sysctl: cookie: share ipv6_bindv6only_table between nets
sysctl: cookie: share acct_sysctl_table table between nets
sysctl: cookie: share event_sysctl_table between nets
net: split nf_ct_sysctl_table
sysctl: cookie: share nf_ct_sysctl_table between nets
sysctl: cookie: share unix_table between nets
sysctl: cookie: share xfrm_table between nets
sysctl: cookie: add register_net_sysctl_table_custom_cookie
sysctl: cookie: share devinet tables between network devices
sysctl: cookie: share addrconf tables between network devices
RFC: sysctl: always perform sysctl checks

arch/arm/kernel/isa.c | 31 +-
arch/arm/mach-bcmring/arch.c | 25 +-
arch/frv/kernel/pm.c | 10 +-
arch/frv/kernel/sysctl.c | 12 +-
arch/ia64/kernel/crash.c | 13 +-
arch/ia64/kernel/perfmon.c | 23 +-
arch/mips/lasat/sysctl.c | 13 +-
arch/powerpc/kernel/idle.c | 13 +-
arch/s390/appldata/appldata_base.c | 42 +-
arch/s390/kernel/debug.c | 13 +-
arch/s390/mm/cmm.c | 11 +-
arch/sh/kernel/traps_64.c | 21 +-
arch/x86/kernel/vsyscall_64.c | 25 +-
arch/x86/vdso/vdso32-setup.c | 14 +-
crypto/proc.c | 12 +-
drivers/cdrom/cdrom.c | 22 +-
drivers/char/hpet.c | 38 +-
drivers/char/ipmi/ipmi_poweroff.c | 16 +-
drivers/char/random.c | 27 +-
drivers/char/rtc.c | 24 +-
drivers/macintosh/mac_hid.c | 26 +-
drivers/md/md.c | 22 +-
drivers/misc/sgi-xp/xpc_main.c | 81 ++--
drivers/parport/procfs.c | 231 ++++-----
drivers/s390/char/sclp_async.c | 13 +-
drivers/scsi/scsi_sysctl.c | 28 +-
drivers/tty/pty.c | 23 +-
fs/coda/sysctl.c | 12 +-
fs/eventpoll.c | 22 +-
fs/fscache/main.c | 15 +-
fs/lockd/svc.c | 22 +-
fs/nfs/sysctl.c | 22 +-
fs/notify/inotify/inotify_user.c | 22 +-
fs/ntfs/sysctl.c | 15 +-
fs/ocfs2/stackglue.c | 36 +-
fs/proc/inode.c | 2 +-
fs/proc/proc_sysctl.c | 217 +++++---
fs/quota/dquot.c | 21 +-
fs/xfs/linux-2.6/xfs_sysctl.c | 22 +-
include/linux/inetdevice.h | 6 +-
include/linux/inotify.h | 2 -
include/linux/ipv6.h | 6 +-
include/linux/key.h | 4 +-
include/linux/poll.h | 2 -
include/linux/sysctl.h | 227 ++++++---
include/net/ax25.h | 10 +-
include/net/ipv6.h | 8 +-
include/net/net_namespace.h | 7 +-
include/net/netns/conntrack.h | 1 +
include/net/netns/ipv6.h | 4 +-
init/main.c | 1 +
ipc/ipc_sysctl.c | 12 +-
ipc/mq_sysctl.c | 24 +-
kernel/Makefile | 5 +-
kernel/sched.c | 389 +++++++++----
kernel/sysctl.c | 920 ++++++++++++++++++++-----------
kernel/sysctl_check.c | 322 +++++++-----
kernel/utsname_sysctl.c | 14 +-
lib/Kconfig.debug | 8 -
net/ax25/af_ax25.c | 22 +-
net/ax25/ax25_dev.c | 10 +-
net/ax25/sysctl_net_ax25.c | 82 +--
net/core/neighbour.c | 8 +-
net/core/sysctl_net_core.c | 33 +-
net/decnet/dn_dev.c | 8 +-
net/ipv4/devinet.c | 154 +++---
net/ipv4/ip_fragment.c | 28 +-
net/ipv4/route.c | 17 +-
net/ipv4/sysctl_net_ipv4.c | 40 +--
net/ipv6/addrconf.c | 506 +++++++++---------
net/ipv6/icmp.c | 18 +-
net/ipv6/reassembly.c | 34 +-
net/ipv6/route.c | 36 +-
net/ipv6/sysctl_net_ipv6.c | 118 ++---
net/llc/sysctl_net_llc.c | 55 +-
net/netfilter/nf_conntrack_acct.c | 24 +-
net/netfilter/nf_conntrack_ecache.c | 26 +-
net/netfilter/nf_conntrack_standalone.c | 52 +-
net/sunrpc/sysctl.c | 19 +-
net/sunrpc/xprtrdma/svc_rdma.c | 26 +-
net/sunrpc/xprtrdma/transport.c | 14 +-
net/sunrpc/xprtsock.c | 16 +-
net/sysctl_net.c | 95 ++--
net/unix/sysctl_net_unix.c | 23 +-
net/xfrm/xfrm_sysctl.c | 29 +-
security/keys/key.c | 1 +
security/keys/sysctl.c | 18 +-
87 files changed, 2436 insertions(+), 2305 deletions(-)

--
1.7.5.134.g1c08b


2011-05-08 22:40:30

by Lucian Adrian Grijincu

Subject: [v2 001/115] sysctl: remove .child from dev/parport/default

First patch in a series that will end with a rewrite of sysctl. The
new implementation needs to get rid of the .child field of ctl_table.

Same functionality, but a little more clarity.
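
As a rough userspace illustration of why the two styles are equivalent: a chain of one-entry directory tables linked through .child names the same file as a path of directory names plus one flat table. The types and helpers below (struct demo_table, struct demo_path, demo_name_from_children, demo_name_from_path) are simplified, hypothetical stand-ins, not the kernel's ctl_table or ctl_path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for ctl_table and ctl_path. */
struct demo_table {
	const char *procname;
	const struct demo_table *child;	/* non-NULL means "directory" */
};

struct demo_path {
	const char *procname;
};

/* Old style: follow the first entry's .child chain down to a leaf. */
int demo_name_from_children(const struct demo_table *t, char *buf, size_t len)
{
	size_t used = 0;

	assert(t != NULL && buf != NULL && len > 0);
	buf[0] = '\0';
	for (; t != NULL && t->procname != NULL; t = t->child) {
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "/" : "", t->procname);

		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* name does not fit */
		used += (size_t)n;
	}
	return 0;
}

/* New style: a path of directory names plus one entry of a flat table. */
int demo_name_from_path(const struct demo_path *path,
			const struct demo_table *entry, char *buf, size_t len)
{
	size_t used = 0;

	assert(path != NULL && entry != NULL && buf != NULL && len > 0);
	buf[0] = '\0';
	for (; path->procname != NULL; path++) {
		int n = snprintf(buf + used, len - used, "%s/",
				 path->procname);

		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* name does not fit */
		used += (size_t)n;
	}
	if (strlen(entry->procname) >= len - used)
		return -1;
	strcpy(buf + used, entry->procname);
	return 0;
}
```

Given a dev -> parport -> default -> timeslice chain on one side and the path { "dev", "parport", "default" } with a "timeslice" entry on the other, both helpers produce the same string. The path form just needs far fewer one-entry tables.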

MAINTAINERS says parport is "Orphan" and I don't have a parallel
port. I tested this patch only minimally, and I don't know whom to
ask for an ACK.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 96 +++++++++++++++++++---------------------------
1 files changed, 40 insertions(+), 56 deletions(-)

diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 3f56bc0..89b8b71 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -419,56 +419,6 @@ parport_device_sysctl_template = {
}
};

-struct parport_default_sysctl_table
-{
- struct ctl_table_header *sysctl_header;
- ctl_table vars[3];
- ctl_table default_dir[2];
- ctl_table parport_dir[2];
- ctl_table dev_dir[2];
-};
-
-static struct parport_default_sysctl_table
-parport_default_sysctl_table = {
- .sysctl_header = NULL,
- {
- {
- .procname = "timeslice",
- .data = &parport_default_timeslice,
- .maxlen = sizeof(parport_default_timeslice),
- .mode = 0644,
- .proc_handler = proc_doulongvec_ms_jiffies_minmax,
- .extra1 = (void*) &parport_min_timeslice_value,
- .extra2 = (void*) &parport_max_timeslice_value
- },
- {
- .procname = "spintime",
- .data = &parport_default_spintime,
- .maxlen = sizeof(parport_default_spintime),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = (void*) &parport_min_spintime_value,
- .extra2 = (void*) &parport_max_spintime_value
- },
- {}
- },
- {
- {
- .procname = "default",
- .mode = 0555,
- .child = parport_default_sysctl_table.vars
- },
- {}
- },
- {
- PARPORT_PARPORT_DIR(parport_default_sysctl_table.default_dir),
- {}
- },
- {
- PARPORT_DEV_DIR(parport_default_sysctl_table.parport_dir),
- {}
- }
-};


int parport_proc_register(struct parport *port)
@@ -558,19 +508,53 @@ int parport_device_proc_unregister(struct pardevice *device)
return 0;
}

+
+static struct ctl_table_header *parport_default_sysctl_header;
+
+static struct ctl_table parport_default_sysctl_table[] = {
+ {
+ .procname = "timeslice",
+ .data = &parport_default_timeslice,
+ .maxlen = sizeof(parport_default_timeslice),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_ms_jiffies_minmax,
+ .extra1 = (void*) &parport_min_timeslice_value,
+ .extra2 = (void*) &parport_max_timeslice_value
+ },
+ {
+ .procname = "spintime",
+ .data = &parport_default_spintime,
+ .maxlen = sizeof(parport_default_spintime),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = (void*) &parport_min_spintime_value,
+ .extra2 = (void*) &parport_max_spintime_value
+ },
+ { },
+};
+
+static const __initdata struct ctl_path parport_default_path[] = {
+ { .procname = "dev" },
+ { .procname = "parport" },
+ { .procname = "default" },
+ { },
+};
+
static int __init parport_default_proc_register(void)
{
- parport_default_sysctl_table.sysctl_header =
- register_sysctl_table(parport_default_sysctl_table.dev_dir);
+ parport_default_sysctl_header =
+ register_sysctl_paths(parport_default_path,
+ parport_default_sysctl_table);
+ /* XXX: if this fails then we can't access the sysctl tables for
+ * /proc/sys/dev/parport/default/. Should the module fail to load? */
return 0;
}

static void __exit parport_default_proc_unregister(void)
{
- if (parport_default_sysctl_table.sysctl_header) {
- unregister_sysctl_table(parport_default_sysctl_table.
- sysctl_header);
- parport_default_sysctl_table.sysctl_header = NULL;
+ if (parport_default_sysctl_header) {
+ unregister_sysctl_table(parport_default_sysctl_header);
+ parport_default_sysctl_header = NULL;
}
}

--
1.7.5.134.g1c08b

2011-05-08 23:07:25

by Lucian Adrian Grijincu

Subject: [v2 002/115] sysctl: parport: reorder .child assignments to simplify review

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 89b8b71..edeb012 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -437,16 +437,17 @@ int parport_proc_register(struct parport *port)
t->vars[i].extra1 = port;

t->vars[0].data = &port->spintime;
- t->vars[5].child = t->device_dir;

for (i = 0; i < 5; i++)
t->vars[6 + i].extra2 = &port->probe_info[i];

t->port_dir[0].procname = port->name;

- t->port_dir[0].child = t->vars;
- t->parport_dir[0].child = t->port_dir;
t->dev_dir[0].child = t->parport_dir;
+ t->parport_dir[0].child = t->port_dir;
+ t->port_dir[0].child = t->vars;
+ t->vars[5].child = t->device_dir;
+ /* vars[5] = PARPORT_DEVICES_ROOT_DIR => .procname = 'devices' */

t->sysctl_header = register_sysctl_table(t->dev_dir);
if (t->sysctl_header == NULL) {
@@ -478,14 +479,15 @@ int parport_device_proc_register(struct pardevice *device)
return -ENOMEM;
memcpy(t, &parport_device_sysctl_template, sizeof(*t));

+ t->port_dir[0].procname = port->name;
+ t->device_dir[0].procname = device->name;
+
t->dev_dir[0].child = t->parport_dir;
t->parport_dir[0].child = t->port_dir;
- t->port_dir[0].procname = port->name;
t->port_dir[0].child = t->devices_root_dir;
t->devices_root_dir[0].child = t->device_dir;
-
- t->device_dir[0].procname = device->name;
t->device_dir[0].child = t->vars;
+
t->vars[0].data = &device->timeslice;

t->sysctl_header = register_sysctl_table(t->dev_dir);
--
1.7.5.134.g1c08b

2011-05-08 22:40:36

by Lucian Adrian Grijincu

Subject: [v2 003/115] sysctl: remove .child from dev/parport/PORT/devices/DEVICE

MAINTAINERS says parport is "Orphan" and I don't have a parallel
port => I cannot test that this patch works.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 56 ++++++++++------------------------------------
1 files changed, 12 insertions(+), 44 deletions(-)

diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index edeb012..350233e 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -370,17 +370,11 @@ struct parport_device_sysctl_table
{
struct ctl_table_header *sysctl_header;
ctl_table vars[2];
- ctl_table device_dir[2];
- ctl_table devices_root_dir[2];
- ctl_table port_dir[2];
- ctl_table parport_dir[2];
- ctl_table dev_dir[2];
};

static const struct parport_device_sysctl_table
parport_device_sysctl_template = {
- .sysctl_header = NULL,
- {
+ .vars = {
{
.procname = "timeslice",
.data = NULL,
@@ -391,32 +385,6 @@ parport_device_sysctl_template = {
.extra2 = (void*) &parport_max_timeslice_value
},
},
- {
- {
- .procname = NULL,
- .data = NULL,
- .maxlen = 0,
- .mode = 0555,
- .child = NULL
- },
- {}
- },
- {
- PARPORT_DEVICES_ROOT_DIR,
- {}
- },
- {
- PARPORT_PORT_DIR(NULL),
- {}
- },
- {
- PARPORT_PARPORT_DIR(NULL),
- {}
- },
- {
- PARPORT_DEV_DIR(NULL),
- {}
- }
};


@@ -473,24 +441,24 @@ int parport_device_proc_register(struct pardevice *device)
{
struct parport_device_sysctl_table *t;
struct parport * port = device->port;
-
+ struct ctl_path parport_devices_port_path[] = {
+ { .procname = "dev" },
+ { .procname = "parport" },
+ { .procname = port->name },
+ { .procname = "devices" },
+ { .procname = device->name },
+ { },
+ };
+
t = kmalloc(sizeof(*t), GFP_KERNEL);
if (t == NULL)
return -ENOMEM;
memcpy(t, &parport_device_sysctl_template, sizeof(*t));

- t->port_dir[0].procname = port->name;
- t->device_dir[0].procname = device->name;
-
- t->dev_dir[0].child = t->parport_dir;
- t->parport_dir[0].child = t->port_dir;
- t->port_dir[0].child = t->devices_root_dir;
- t->devices_root_dir[0].child = t->device_dir;
- t->device_dir[0].child = t->vars;
-
t->vars[0].data = &device->timeslice;

- t->sysctl_header = register_sysctl_table(t->dev_dir);
+ t->sysctl_header = register_sysctl_paths(parport_devices_port_path,
+ t->vars);
if (t->sysctl_header == NULL) {
kfree(t);
t = NULL;
--
1.7.5.134.g1c08b

2011-05-08 23:06:58

by Lucian Adrian Grijincu

Subject: [v2 004/115] sysctl: remove .child from dev/parport/PORT/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 48 ++++++++++++++-------------------------------
1 files changed, 15 insertions(+), 33 deletions(-)

diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 350233e..e55b9b6 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -233,13 +233,6 @@ static int do_hardware_modes (ctl_table *table, int write,
return copy_to_user(result, buffer, len) ? -EFAULT : 0;
}

-#define PARPORT_PORT_DIR(CHILD) { .procname = NULL, .mode = 0555, .child = CHILD }
-#define PARPORT_PARPORT_DIR(CHILD) { .procname = "parport", \
- .mode = 0555, .child = CHILD }
-#define PARPORT_DEV_DIR(CHILD) { .procname = "dev", .mode = 0555, .child = CHILD }
-#define PARPORT_DEVICES_ROOT_DIR { .procname = "devices", \
- .mode = 0555, .child = NULL }
-
static const unsigned long parport_min_timeslice_value =
PARPORT_MIN_TIMESLICE_VALUE;

@@ -257,14 +250,10 @@ struct parport_sysctl_table {
struct ctl_table_header *sysctl_header;
ctl_table vars[12];
ctl_table device_dir[2];
- ctl_table port_dir[2];
- ctl_table parport_dir[2];
- ctl_table dev_dir[2];
};

static const struct parport_sysctl_table parport_sysctl_template = {
- .sysctl_header = NULL,
- {
+ .vars = {
{
.procname = "spintime",
.data = NULL,
@@ -302,7 +291,11 @@ static const struct parport_sysctl_table parport_sysctl_template = {
.mode = 0444,
.proc_handler = do_hardware_modes
},
- PARPORT_DEVICES_ROOT_DIR,
+ {
+ .procname = "devices",
+ .mode = 0555,
+ .child = NULL, /* child will point to .device_dir */
+ },
#ifdef CONFIG_PARPORT_1284
{
.procname = "autoprobe",
@@ -342,7 +335,7 @@ static const struct parport_sysctl_table parport_sysctl_template = {
#endif /* IEEE 1284 support */
{}
},
- {
+ .device_dir = {
{
.procname = "active",
.data = NULL,
@@ -352,18 +345,6 @@ static const struct parport_sysctl_table parport_sysctl_template = {
},
{}
},
- {
- PARPORT_PORT_DIR(NULL),
- {}
- },
- {
- PARPORT_PARPORT_DIR(NULL),
- {}
- },
- {
- PARPORT_DEV_DIR(NULL),
- {}
- }
};

struct parport_device_sysctl_table
@@ -391,6 +372,12 @@ parport_device_sysctl_template = {

int parport_proc_register(struct parport *port)
{
+ struct ctl_path parport_port_path[] = {
+ { .procname = "dev" },
+ { .procname = "parport" },
+ { .procname = port->name },
+ { },
+ };
struct parport_sysctl_table *t;
int i;

@@ -409,15 +396,10 @@ int parport_proc_register(struct parport *port)
for (i = 0; i < 5; i++)
t->vars[6 + i].extra2 = &port->probe_info[i];

- t->port_dir[0].procname = port->name;
-
- t->dev_dir[0].child = t->parport_dir;
- t->parport_dir[0].child = t->port_dir;
- t->port_dir[0].child = t->vars;
t->vars[5].child = t->device_dir;
- /* vars[5] = PARPORT_DEVICES_ROOT_DIR => .procname = 'devices' */
+ /* vars[5].procname is the 'devices' dir entry */

- t->sysctl_header = register_sysctl_table(t->dev_dir);
+ t->sysctl_header = register_sysctl_paths(parport_port_path, t->vars);
if (t->sysctl_header == NULL) {
kfree(t);
t = NULL;
--
1.7.5.134.g1c08b

2011-05-08 22:40:42

by Lucian Adrian Grijincu

Subject: [v2 005/115] sysctl: remove .child from dev/parport/PORT/devices/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 42 ++++++++++++++++++++++++++++--------------
1 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index e55b9b6..3bb5bed 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -248,6 +248,7 @@ PARPORT_MAX_SPINTIME_VALUE;

struct parport_sysctl_table {
struct ctl_table_header *sysctl_header;
+ struct ctl_table_header *devices_sysctl_header;
ctl_table vars[12];
ctl_table device_dir[2];
};
@@ -291,11 +292,6 @@ static const struct parport_sysctl_table parport_sysctl_template = {
.mode = 0444,
.proc_handler = do_hardware_modes
},
- {
- .procname = "devices",
- .mode = 0555,
- .child = NULL, /* child will point to .device_dir */
- },
#ifdef CONFIG_PARPORT_1284
{
.procname = "autoprobe",
@@ -378,6 +374,14 @@ int parport_proc_register(struct parport *port)
{ .procname = port->name },
{ },
};
+ struct ctl_path parport_port_devices_path[] = {
+ { .procname = "dev" },
+ { .procname = "parport" },
+ { .procname = port->name },
+ { .procname = "devices" },
+ { },
+ };
+
struct parport_sysctl_table *t;
int i;

@@ -392,20 +396,29 @@ int parport_proc_register(struct parport *port)
t->vars[i].extra1 = port;

t->vars[0].data = &port->spintime;
-
- for (i = 0; i < 5; i++)
- t->vars[6 + i].extra2 = &port->probe_info[i];

- t->vars[5].child = t->device_dir;
- /* vars[5].procname is the 'devices' dir entry */
+#ifdef CONFIG_PARPORT_1284
+ for (i = 0; i < 5; i++)
+ t->vars[5 + i].extra2 = &port->probe_info[i];
+#endif /* CONFIG_PARPORT_1284 */

t->sysctl_header = register_sysctl_paths(parport_port_path, t->vars);
- if (t->sysctl_header == NULL) {
- kfree(t);
- t = NULL;
- }
+ if (t->sysctl_header == NULL)
+ goto fail_register_port;
+
+ t->devices_sysctl_header = register_sysctl_paths(parport_port_devices_path,
+ t->device_dir);
+ if (t->devices_sysctl_header == NULL)
+ goto fail_register_devices;
port->sysctl_table = t;
return 0;
+
+fail_register_devices:
+ unregister_sysctl_table(t->sysctl_header);
+fail_register_port:
+ kfree(t);
+
+ return -ENOMEM;
}

int parport_proc_unregister(struct parport *port)
@@ -413,6 +426,7 @@ int parport_proc_unregister(struct parport *port)
if (port->sysctl_table) {
struct parport_sysctl_table *t = port->sysctl_table;
port->sysctl_table = NULL;
+ unregister_sysctl_table(t->devices_sysctl_header);
unregister_sysctl_table(t->sysctl_header);
kfree(t);
}
--
1.7.5.134.g1c08b

2011-05-08 23:06:35

by Lucian Adrian Grijincu

Subject: [v2 006/115] sysctl: remove .child from kernel/vsyscall64 (x86)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/x86/kernel/vsyscall_64.c | 25 ++++++++++++++-----------
1 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index dcbb28c..7d8b83d 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -234,18 +234,21 @@ static long __vsyscall(3) venosys_1(void)
}

#ifdef CONFIG_SYSCTL
-static ctl_table kernel_table2[] = {
- { .procname = "vsyscall64",
- .data = &vsyscall_gtod_data.sysctl_enabled, .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec },
- {}
+static ctl_table vsyscall64_table[] = {
+ {
+ .procname = "vsyscall64",
+ .data = &vsyscall_gtod_data.sysctl_enabled,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ { }
};

-static ctl_table kernel_root_table2[] = {
- { .procname = "kernel", .mode = 0555,
- .child = kernel_table2 },
- {}
+
+static struct ctl_path kernel_root_path[] = {
+ { .procname = "kernel" },
+ { }
};
#endif

@@ -303,7 +306,7 @@ static int __init vsyscall_init(void)
BUG_ON((VSYSCALL_ADDR(0) != __fix_to_virt(VSYSCALL_FIRST_PAGE)));
BUG_ON((unsigned long) &vgetcpu != VSYSCALL_ADDR(__NR_vgetcpu));
#ifdef CONFIG_SYSCTL
- register_sysctl_table(kernel_root_table2);
+ register_sysctl_paths(kernel_root_path, vsyscall64_table);
#endif
on_each_cpu(cpu_vsyscall_init, NULL, 1);
/* notifier priority > KVM */
--
1.7.5.134.g1c08b

2011-05-08 23:06:13

by Lucian Adrian Grijincu

Subject: [v2 007/115] sysctl: remove .child from abi/vsyscall32 (x86)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/x86/vdso/vdso32-setup.c | 14 +++++---------
1 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index 468d591..e6ef3b4 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -380,7 +380,7 @@ subsys_initcall(sysenter_setup);
/* Register vsyscall32 into the ABI table */
#include <linux/sysctl.h>

-static ctl_table abi_table2[] = {
+static ctl_table abi_table[] = {
{
.procname = "vsyscall32",
.data = &sysctl_vsyscall32,
@@ -391,18 +391,14 @@ static ctl_table abi_table2[] = {
{}
};

-static ctl_table abi_root_table2[] = {
- {
- .procname = "abi",
- .mode = 0555,
- .child = abi_table2
- },
- {}
+static const struct ctl_path abi_root_path[] = {
+ { .procname = "abi" },
+ { }
};

static __init int ia32_binfmt_init(void)
{
- register_sysctl_table(abi_root_table2);
+ register_sysctl_paths(abi_root_path, abi_table);
return 0;
}
__initcall(ia32_binfmt_init);
--
1.7.5.134.g1c08b

2011-05-08 23:05:45

by Lucian Adrian Grijincu

Subject: [v2 008/115] sysctl: remove .child from crypto/fips_enabled

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
crypto/proc.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/crypto/proc.c b/crypto/proc.c
index 58fef67..2ef248b 100644
--- a/crypto/proc.c
+++ b/crypto/proc.c
@@ -34,20 +34,16 @@ static struct ctl_table crypto_sysctl_table[] = {
{}
};

-static struct ctl_table crypto_dir_table[] = {
- {
- .procname = "crypto",
- .mode = 0555,
- .child = crypto_sysctl_table
- },
- {}
+static const struct ctl_path crypto_root_path[] = {
+ { .procname = "crypto" },
+ { }
};

static struct ctl_table_header *crypto_sysctls;

static void crypto_proc_fips_init(void)
{
- crypto_sysctls = register_sysctl_table(crypto_dir_table);
+ crypto_sysctls = register_sysctl_paths(crypto_root_path, crypto_sysctl_table);
}

static void crypto_proc_fips_exit(void)
--
1.7.5.134.g1c08b

2011-05-08 22:40:46

by Lucian Adrian Grijincu

Subject: [v2 009/115] sysctl: remove .child from dev/cdrom/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/cdrom/cdrom.c | 22 ++++------------------
1 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 514dd8e..9560789 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -3654,26 +3654,12 @@ static ctl_table cdrom_table[] = {
{ }
};

-static ctl_table cdrom_cdrom_table[] = {
- {
- .procname = "cdrom",
- .maxlen = 0,
- .mode = 0555,
- .child = cdrom_table,
- },
+static const struct ctl_path cdrom_root_path[] = {
+ { .procname = "dev" },
+ { .procname = "cdrom" },
{ }
};

-/* Make sure that /proc/sys/dev is there */
-static ctl_table cdrom_root_table[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = cdrom_cdrom_table,
- },
- { }
-};
static struct ctl_table_header *cdrom_sysctl_header;

static void cdrom_sysctl_register(void)
@@ -3683,7 +3669,7 @@ static void cdrom_sysctl_register(void)
if (initialized == 1)
return;

- cdrom_sysctl_header = register_sysctl_table(cdrom_root_table);
+ cdrom_sysctl_header = register_sysctl_paths(cdrom_root_path, cdrom_table);

/* set the defaults */
cdrom_sysctl_settings.autoclose = autoclose;
--
1.7.5.134.g1c08b

2011-05-08 23:05:19

by Lucian Adrian Grijincu

Subject: [v2 010/115] sysctl: remove .child from dev/hpet/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/char/hpet.c | 38 ++++++++++++--------------------------
1 files changed, 12 insertions(+), 26 deletions(-)

diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 7066e80..303de7e 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -721,33 +721,19 @@ static int hpet_is_known(struct hpet_data *hdp)

static ctl_table hpet_table[] = {
{
- .procname = "max-user-freq",
- .data = &hpet_max_freq,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
- {}
+ .procname = "max-user-freq",
+ .data = &hpet_max_freq,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ { }
};

-static ctl_table hpet_root[] = {
- {
- .procname = "hpet",
- .maxlen = 0,
- .mode = 0555,
- .child = hpet_table,
- },
- {}
-};
-
-static ctl_table dev_root[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = hpet_root,
- },
- {}
+static const struct ctl_path hpet_path[] = {
+ { .procname = "dev" },
+ { .procname = "hpet" },
+ { }
};

static struct ctl_table_header *sysctl_header;
@@ -1053,7 +1039,7 @@ static int __init hpet_init(void)
if (result < 0)
return -ENODEV;

- sysctl_header = register_sysctl_table(dev_root);
+ sysctl_header = register_sysctl_paths(hpet_path, hpet_table);

result = acpi_bus_register_driver(&hpet_acpi_driver);
if (result < 0) {
--
1.7.5.134.g1c08b

2011-05-08 23:04:27

by Lucian Adrian Grijincu

Subject: [v2 011/115] sysctl: remove .child from dev/ipmi/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/char/ipmi/ipmi_poweroff.c | 16 ++++------------
1 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_poweroff.c b/drivers/char/ipmi/ipmi_poweroff.c
index 2efa176..ac71d69 100644
--- a/drivers/char/ipmi/ipmi_poweroff.c
+++ b/drivers/char/ipmi/ipmi_poweroff.c
@@ -668,17 +668,9 @@ static ctl_table ipmi_table[] = {
{ }
};

-static ctl_table ipmi_dir_table[] = {
- { .procname = "ipmi",
- .mode = 0555,
- .child = ipmi_table },
- { }
-};
-
-static ctl_table ipmi_root_table[] = {
- { .procname = "dev",
- .mode = 0555,
- .child = ipmi_dir_table },
+static const struct ctl_path ipmi_path[] = {
+ { .procname = "dev" },
+ { .procname = "ipmi" },
{ }
};

@@ -699,7 +691,7 @@ static int __init ipmi_poweroff_init(void)
printk(KERN_INFO PFX "Power cycle is enabled.\n");

#ifdef CONFIG_PROC_FS
- ipmi_table_header = register_sysctl_table(ipmi_root_table);
+ ipmi_table_header = register_sysctl_paths(ipmi_path, ipmi_table);
if (!ipmi_table_header) {
printk(KERN_ERR PFX "Unable to register powercycle sysctl\n");
rv = -ENOMEM;
--
1.7.5.134.g1c08b

2011-05-08 23:04:25

by Lucian Adrian Grijincu

Subject: [v2 012/115] sysctl: remove .child from dev/rtc/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/char/rtc.c | 24 ++++++------------------
1 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/char/rtc.c b/drivers/char/rtc.c
index dfa8b30..cc752f5 100644
--- a/drivers/char/rtc.c
+++ b/drivers/char/rtc.c
@@ -291,21 +291,9 @@ static ctl_table rtc_table[] = {
{ }
};

-static ctl_table rtc_root[] = {
- {
- .procname = "rtc",
- .mode = 0555,
- .child = rtc_table,
- },
- { }
-};
-
-static ctl_table dev_root[] = {
- {
- .procname = "dev",
- .mode = 0555,
- .child = rtc_root,
- },
+static const __initdata struct ctl_path rtc_path[] = {
+ { .procname = "dev" },
+ { .procname = "rtc" },
{ }
};

@@ -313,13 +301,13 @@ static struct ctl_table_header *sysctl_header;

static int __init init_sysctl(void)
{
- sysctl_header = register_sysctl_table(dev_root);
- return 0;
+ sysctl_header = register_sysctl_paths(rtc_path, rtc_table);
+ return 0;
}

static void __exit cleanup_sysctl(void)
{
- unregister_sysctl_table(sysctl_header);
+ unregister_sysctl_table(sysctl_header);
}

/*
--
1.7.5.134.g1c08b

2011-05-08 23:04:23

by Lucian Adrian Grijincu

Subject: [v2 013/115] sysctl: remove .child from dev/mac_hid/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/macintosh/mac_hid.c | 26 +++++---------------------
1 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/drivers/macintosh/mac_hid.c b/drivers/macintosh/mac_hid.c
index 6a82388..5eec7b7 100644
--- a/drivers/macintosh/mac_hid.c
+++ b/drivers/macintosh/mac_hid.c
@@ -214,7 +214,7 @@ static int mac_hid_toggle_emumouse(ctl_table *table, int write,
}

/* file(s) in /proc/sys/dev/mac_hid */
-static ctl_table mac_hid_files[] = {
+static ctl_table mac_hid_table[] = {
{
.procname = "mouse_button_emulation",
.data = &mouse_emulate_buttons,
@@ -239,25 +239,9 @@ static ctl_table mac_hid_files[] = {
{ }
};

-/* dir in /proc/sys/dev */
-static ctl_table mac_hid_dir[] = {
- {
- .procname = "mac_hid",
- .maxlen = 0,
- .mode = 0555,
- .child = mac_hid_files,
- },
- { }
-};
-
-/* /proc/sys/dev itself, in case that is not there yet */
-static ctl_table mac_hid_root_dir[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = mac_hid_dir,
- },
+static const __initdata struct ctl_path mac_hid_path[] = {
+ { .procname = "dev" },
+ { .procname = "mac_hid" },
{ }
};

@@ -265,7 +249,7 @@ static struct ctl_table_header *mac_hid_sysctl_header;

static int __init mac_hid_init(void)
{
- mac_hid_sysctl_header = register_sysctl_table(mac_hid_root_dir);
+ mac_hid_sysctl_header = register_sysctl_paths(mac_hid_path, mac_hid_table);
if (!mac_hid_sysctl_header)
return -ENOMEM;

--
1.7.5.134.g1c08b

2011-05-08 23:04:04

by Lucian Adrian Grijincu

Subject: [v2 014/115] sysctl: remove .child from dev/raid/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/md/md.c | 22 ++++------------------
1 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7d6f7f1..3b54374 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -125,26 +125,12 @@ static ctl_table raid_table[] = {
{ }
};

-static ctl_table raid_dir_table[] = {
- {
- .procname = "raid",
- .maxlen = 0,
- .mode = S_IRUGO|S_IXUGO,
- .child = raid_table,
- },
+static const __initdata struct ctl_path raid_path[] = {
+ { .procname = "dev" },
+ { .procname = "raid" },
{ }
};

-static ctl_table raid_root_table[] = {
- {
- .procname = "dev",
- .maxlen = 0,
- .mode = 0555,
- .child = raid_dir_table,
- },
- { }
-};
-
static const struct block_device_operations md_fops;

static int start_readonly;
@@ -7380,7 +7366,7 @@ static int __init md_init(void)
md_probe, NULL, NULL);

register_reboot_notifier(&md_notifier);
- raid_table_header = register_sysctl_table(raid_root_table);
+ raid_table_header = register_sysctl_paths(raid_path, raid_table);

md_geninit();
return 0;
--
1.7.5.134.g1c08b

2011-05-08 22:40:52

by Lucian Adrian Grijincu

Subject: [v2 015/115] sysctl: remove .child from xpc/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/misc/sgi-xp/xpc_main.c | 12 +++++-------
1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/sgi-xp/xpc_main.c b/drivers/misc/sgi-xp/xpc_main.c
index 8d082b4..642efb1 100644
--- a/drivers/misc/sgi-xp/xpc_main.c
+++ b/drivers/misc/sgi-xp/xpc_main.c
@@ -122,13 +122,11 @@ static ctl_table xpc_sys_xpc_dir[] = {
.extra2 = &xpc_disengage_max_timelimit},
{}
};
-static ctl_table xpc_sys_dir[] = {
- {
- .procname = "xpc",
- .mode = 0555,
- .child = xpc_sys_xpc_dir},
- {}
+static const __initdata struct ctl_path xpc_path[] = {
+ { .procname = "xpc" },
+ { }
};
+
static struct ctl_table_header *xpc_sysctl;

/* non-zero if any remote partition disengage was timed out */
@@ -1236,7 +1234,7 @@ xpc_init(void)
goto out_1;
}

- xpc_sysctl = register_sysctl_table(xpc_sys_dir);
+ xpc_sysctl = register_sysctl_paths(xpc_path, xpc_sys_xpc_dir);

/*
* Fill the partition reserved page with the information needed by
--
1.7.5.134.g1c08b

2011-05-08 22:40:57

by Lucian Adrian Grijincu

Subject: [v2 016/115] sysctl: remove .child from xpc/hb
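
Splitting xpc/hb into its own registration means the driver now holds two headers and has to release both on every exit path. A minimal user-space sketch of that bookkeeping follows. fake_register() and fake_unregister() are hypothetical helpers standing in for register_sysctl_paths() and unregister_sysctl_table(); the sketch only illustrates the child-before-parent teardown with NULL checks.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct ctl_table_header. */
struct fake_header {
	const char *path;
};

static int live_headers;	/* number of currently registered headers */

/* Models a registration that may fail by returning NULL. */
static struct fake_header *fake_register(const char *path)
{
	struct fake_header *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	h->path = path;
	live_headers++;
	return h;
}

static void fake_unregister(struct fake_header *h)
{
	live_headers--;
	free(h);
}

/* Tear down in reverse order of registration, skipping failed ones. */
static void teardown(struct fake_header *parent, struct fake_header *child)
{
	if (child)
		fake_unregister(child);
	if (parent)
		fake_unregister(parent);
}
```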

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/misc/sgi-xp/xpc_main.c | 69 +++++++++++++++++++++++-----------------
1 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/drivers/misc/sgi-xp/xpc_main.c b/drivers/misc/sgi-xp/xpc_main.c
index 642efb1..414d68b 100644
--- a/drivers/misc/sgi-xp/xpc_main.c
+++ b/drivers/misc/sgi-xp/xpc_main.c
@@ -88,46 +88,52 @@ int xpc_disengage_timelimit = XPC_DISENGAGE_DEFAULT_TIMELIMIT;
static int xpc_disengage_min_timelimit; /* = 0 */
static int xpc_disengage_max_timelimit = 120;

-static ctl_table xpc_sys_xpc_hb_dir[] = {
+static ctl_table xpc_hb_table[] = {
{
- .procname = "hb_interval",
- .data = &xpc_hb_interval,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &xpc_hb_min_interval,
- .extra2 = &xpc_hb_max_interval},
+ .procname = "hb_interval",
+ .data = &xpc_hb_interval,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &xpc_hb_min_interval,
+ .extra2 = &xpc_hb_max_interval
+ },
{
- .procname = "hb_check_interval",
- .data = &xpc_hb_check_interval,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &xpc_hb_check_min_interval,
- .extra2 = &xpc_hb_check_max_interval},
- {}
+ .procname = "hb_check_interval",
+ .data = &xpc_hb_check_interval,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &xpc_hb_check_min_interval,
+ .extra2 = &xpc_hb_check_max_interval
+ },
+ { }
};
-static ctl_table xpc_sys_xpc_dir[] = {
- {
- .procname = "hb",
- .mode = 0555,
- .child = xpc_sys_xpc_hb_dir},
+static ctl_table xpc_table[] = {
{
- .procname = "disengage_timelimit",
- .data = &xpc_disengage_timelimit,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &xpc_disengage_min_timelimit,
- .extra2 = &xpc_disengage_max_timelimit},
- {}
+ .procname = "disengage_timelimit",
+ .data = &xpc_disengage_timelimit,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &xpc_disengage_min_timelimit,
+ .extra2 = &xpc_disengage_max_timelimit
+ },
+ { }
};
static const __initdata struct ctl_path xpc_path[] = {
{ .procname = "xpc" },
{ }
};

+static const __initdata struct ctl_path xpc_hb_path[] = {
+ { .procname = "xpc" },
+ { .procname = "hb" },
+ { }
+};
+
static struct ctl_table_header *xpc_sysctl;
+static struct ctl_table_header *xpc_hb_sysctl;

/* non-zero if any remote partition disengage was timed out */
int xpc_disengage_timedout;
@@ -1040,6 +1046,8 @@ xpc_do_exit(enum xp_retval reason)
/* clear the interface to XPC's functions */
xpc_clear_interface();

+ if (xpc_hb_sysctl)
+ unregister_sysctl_table(xpc_hb_sysctl);
if (xpc_sysctl)
unregister_sysctl_table(xpc_sysctl);

@@ -1235,6 +1243,7 @@ xpc_init(void)
}

- xpc_sysctl = register_sysctl_paths(xpc_path, xpc_sys_xpc_dir);
+ xpc_sysctl = register_sysctl_paths(xpc_path, xpc_table);
+ xpc_hb_sysctl = register_sysctl_paths(xpc_hb_path, xpc_hb_table);

/*
* Fill the partition reserved page with the information needed by
@@ -1299,6 +1308,8 @@ out_3:
(void)unregister_die_notifier(&xpc_die_notifier);
(void)unregister_reboot_notifier(&xpc_reboot_notifier);
out_2:
+ if (xpc_hb_sysctl)
+ unregister_sysctl_table(xpc_hb_sysctl);
if (xpc_sysctl)
unregister_sysctl_table(xpc_sysctl);

--
1.7.5.134.g1c08b

2011-05-08 23:03:42

by Lucian Adrian Grijincu

Subject: [v2 017/115] sysctl: remove .child from kernel/sclp (s390)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/s390/char/sclp_async.c | 13 ++++---------
1 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/s390/char/sclp_async.c b/drivers/s390/char/sclp_async.c
index 7ad30e7..43f8b1e 100644
--- a/drivers/s390/char/sclp_async.c
+++ b/drivers/s390/char/sclp_async.c
@@ -106,14 +106,9 @@ static struct ctl_table callhome_table[] = {
{}
};

-static struct ctl_table kern_dir_table[] = {
- {
- .procname = "kernel",
- .maxlen = 0,
- .mode = 0555,
- .child = callhome_table,
- },
- {}
+static const __initdata struct ctl_path kern_path[] = {
+ { .procname = "kernel" },
+ { }
};

/*
@@ -175,7 +170,7 @@ static int __init sclp_async_init(void)
if (!(sclp_async_register.sclp_receive_mask & EVTYP_ASYNC_MASK))
goto out_sclp;
rc = -ENOMEM;
- callhome_sysctl_header = register_sysctl_table(kern_dir_table);
+ callhome_sysctl_header = register_sysctl_paths(kern_path, callhome_table);
if (!callhome_sysctl_header)
goto out_sclp;
request = kzalloc(sizeof(struct sclp_req), GFP_KERNEL);
--
1.7.5.134.g1c08b

2011-05-08 23:03:22

by Lucian Adrian Grijincu

Subject: [v2 018/115] sysctl: remove .child from dev/scsi

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/scsi/scsi_sysctl.c | 28 +++++++++++-----------------
1 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/scsi_sysctl.c b/drivers/scsi/scsi_sysctl.c
index 2b6b93f..a28707f 100644
--- a/drivers/scsi/scsi_sysctl.c
+++ b/drivers/scsi/scsi_sysctl.c
@@ -13,25 +13,19 @@


static ctl_table scsi_table[] = {
- { .procname = "logging_level",
- .data = &scsi_logging_level,
- .maxlen = sizeof(scsi_logging_level),
- .mode = 0644,
- .proc_handler = proc_dointvec },
+ {
+ .procname = "logging_level",
+ .data = &scsi_logging_level,
+ .maxlen = sizeof(scsi_logging_level),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
{ }
};

-static ctl_table scsi_dir_table[] = {
- { .procname = "scsi",
- .mode = 0555,
- .child = scsi_table },
- { }
-};
-
-static ctl_table scsi_root_table[] = {
- { .procname = "dev",
- .mode = 0555,
- .child = scsi_dir_table },
+static const __initdata struct ctl_path scsi_path[] = {
+ { .procname = "dev" },
+ { .procname = "scsi" },
{ }
};

@@ -39,7 +33,7 @@ static struct ctl_table_header *scsi_table_header;

int __init scsi_init_sysctl(void)
{
- scsi_table_header = register_sysctl_table(scsi_root_table);
+ scsi_table_header = register_sysctl_paths(scsi_path, scsi_table);
if (!scsi_table_header)
return -ENOMEM;
return 0;
--
1.7.5.134.g1c08b

2011-05-08 23:03:00

by Lucian Adrian Grijincu

Subject: [v2 019/115] sysctl: remove .child from kernel/pty

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/tty/pty.c | 23 +++++------------------
1 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 2107747..2a40b34 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -469,25 +469,12 @@ static struct ctl_table pty_table[] = {
{}
};

-static struct ctl_table pty_kern_table[] = {
- {
- .procname = "pty",
- .mode = 0555,
- .child = pty_table,
- },
- {}
+static const __initdata struct ctl_path pty_path[] = {
+ { .procname = "kernel" },
+ { .procname = "pty" },
+ { }
};

-static struct ctl_table pty_root_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = pty_kern_table,
- },
- {}
-};
-
-
static int pty_unix98_ioctl(struct tty_struct *tty,
unsigned int cmd, unsigned long arg)
{
@@ -750,7 +737,7 @@ static void __init unix98_pty_init(void)
if (tty_register_driver(pts_driver))
panic("Couldn't register Unix98 pts driver");

- register_sysctl_table(pty_root_table);
+ register_sysctl_paths(pty_path, pty_table);

/* Now create the /dev/ptmx special device */
tty_default_fops(&ptmx_fops);
--
1.7.5.134.g1c08b

2011-05-08 23:02:11

by Lucian Adrian Grijincu

Subject: [v2 020/115] sysctl: remove .child from coda/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/coda/sysctl.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/coda/sysctl.c b/fs/coda/sysctl.c
index af56ad5..8c328c9 100644
--- a/fs/coda/sysctl.c
+++ b/fs/coda/sysctl.c
@@ -39,19 +39,15 @@ static ctl_table coda_table[] = {
{}
};

-static ctl_table fs_table[] = {
- {
- .procname = "coda",
- .mode = 0555,
- .child = coda_table
- },
- {}
+static const __initdata struct ctl_path coda_path[] = {
+ { .procname = "coda" },
+ { }
};

void coda_sysctl_init(void)
{
if ( !fs_table_header )
- fs_table_header = register_sysctl_table(fs_table);
+ fs_table_header = register_sysctl_paths(coda_path, coda_table);
}

void coda_sysctl_clean(void)
--
1.7.5.134.g1c08b

2011-05-08 23:02:14

by Lucian Adrian Grijincu

Subject: [v2 021/115] sysctl: remove .child from fscache/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/fscache/main.c | 15 ++++++---------
1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index f9d8567..7f9c055 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -67,7 +67,7 @@ static int fscache_max_active_sysctl(struct ctl_table *table, int write,
return ret;
}

-ctl_table fscache_sysctls[] = {
+static ctl_table fscache_table[] = {
{
.procname = "object_max_active",
.data = &fscache_object_max_active,
@@ -87,14 +87,11 @@ ctl_table fscache_sysctls[] = {
{}
};

-ctl_table fscache_sysctls_root[] = {
- {
- .procname = "fscache",
- .mode = 0555,
- .child = fscache_sysctls,
- },
- {}
+static const __initdata struct ctl_path fscache_path[] = {
+ { .procname = "fscache" },
+ { }
};
+
#endif

/*
@@ -135,7 +132,7 @@ static int __init fscache_init(void)

#ifdef CONFIG_SYSCTL
ret = -ENOMEM;
- fscache_sysctl_header = register_sysctl_table(fscache_sysctls_root);
+ fscache_sysctl_header = register_sysctl_paths(fscache_path, fscache_table);
if (!fscache_sysctl_header)
goto error_sysctl;
#endif
--
1.7.5.134.g1c08b

2011-05-08 23:02:08

by Lucian Adrian Grijincu

Subject: [v2 022/115] sysctl: remove .child from fs/nfs/ nlm_table table

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/lockd/svc.c | 22 +++++-----------------
1 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index abfff9d..6ab5932 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -355,7 +355,7 @@ EXPORT_SYMBOL_GPL(lockd_down);
* Sysctl parameters (same as module parameters, different interface).
*/

-static ctl_table nlm_sysctls[] = {
+static ctl_table nlm_table[] = {
{
.procname = "nlm_grace_period",
.data = &nlm_grace_period,
@@ -409,21 +409,9 @@ static ctl_table nlm_sysctls[] = {
{ }
};

-static ctl_table nlm_sysctl_dir[] = {
- {
- .procname = "nfs",
- .mode = 0555,
- .child = nlm_sysctls,
- },
- { }
-};
-
-static ctl_table nlm_sysctl_root[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = nlm_sysctl_dir,
- },
+static const __initdata struct ctl_path nlm_path[] = {
+ { .procname = "fs" },
+ { .procname = "nfs" },
{ }
};

@@ -504,7 +492,7 @@ module_param(nlm_max_connections, uint, 0644);
static int __init init_nlm(void)
{
#ifdef CONFIG_SYSCTL
- nlm_sysctl_table = register_sysctl_table(nlm_sysctl_root);
+ nlm_sysctl_table = register_sysctl_paths(nlm_path, nlm_table);
return nlm_sysctl_table ? 0 : -ENOMEM;
#else
return 0;
--
1.7.5.134.g1c08b

2011-05-08 23:01:37

by Lucian Adrian Grijincu

Subject: [v2 023/115] sysctl: remove .child from fs/nfs/ nfs_cb_table

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/nfs/sysctl.c | 22 +++++-----------------
1 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index 978aaeb..046fe19 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -21,7 +21,7 @@ static const int nfs_set_port_max = 65535;
#endif
static struct ctl_table_header *nfs_callback_sysctl_table;

-static ctl_table nfs_cb_sysctls[] = {
+static ctl_table nfs_cb_table[] = {
#ifdef CONFIG_NFS_V4
{
.procname = "nfs_callback_tcpport",
@@ -59,27 +59,15 @@ static ctl_table nfs_cb_sysctls[] = {
{ }
};

-static ctl_table nfs_cb_sysctl_dir[] = {
- {
- .procname = "nfs",
- .mode = 0555,
- .child = nfs_cb_sysctls,
- },
- { }
-};
-
-static ctl_table nfs_cb_sysctl_root[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = nfs_cb_sysctl_dir,
- },
+static const __initdata struct ctl_path nfs_cb_path[] = {
+ { .procname = "fs" },
+ { .procname = "nfs" },
{ }
};

int nfs_register_sysctl(void)
{
- nfs_callback_sysctl_table = register_sysctl_table(nfs_cb_sysctl_root);
+ nfs_callback_sysctl_table = register_sysctl_paths(nfs_cb_path, nfs_cb_table);
if (nfs_callback_sysctl_table == NULL)
return -ENOMEM;
return 0;
--
1.7.5.134.g1c08b

2011-05-08 23:01:34

by Lucian Adrian Grijincu

Subject: [v2 024/115] sysctl: remove .child from fs/ntfs-debug

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/ntfs/sysctl.c | 15 +++++----------
1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/ntfs/sysctl.c b/fs/ntfs/sysctl.c
index 79a8918..da1293d 100644
--- a/fs/ntfs/sysctl.c
+++ b/fs/ntfs/sysctl.c
@@ -34,7 +34,7 @@
#include "debug.h"

/* Definition of the ntfs sysctl. */
-static ctl_table ntfs_sysctls[] = {
+static ctl_table ntfs_table[] = {
{
.procname = "ntfs-debug",
.data = &debug_msgs, /* Data pointer and size. */
@@ -45,14 +45,9 @@ static ctl_table ntfs_sysctls[] = {
{}
};

-/* Define the parent directory /proc/sys/fs. */
-static ctl_table sysctls_root[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = ntfs_sysctls
- },
- {}
+static const __initdata struct ctl_path ntfs_path[] = {
+ { .procname = "fs" },
+ { }
};

/* Storage for the sysctls header. */
@@ -68,7 +63,7 @@ int ntfs_sysctl(int add)
{
if (add) {
BUG_ON(sysctls_root_table);
- sysctls_root_table = register_sysctl_table(sysctls_root);
+ sysctls_root_table = register_sysctl_paths(ntfs_path, ntfs_table);
if (!sysctls_root_table)
return -ENOMEM;
} else {
--
1.7.5.134.g1c08b

2011-05-08 22:41:11

by Lucian Adrian Grijincu

Subject: [v2 025/115] sysctl: remove .child from fs/ocfs2/nm/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/ocfs2/stackglue.c | 36 +++++-------------------------------
1 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 39abf89..3cb738a 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -654,36 +654,10 @@ static ctl_table ocfs2_nm_table[] = {
{ }
};

-static ctl_table ocfs2_mod_table[] = {
- {
- .procname = "nm",
- .data = NULL,
- .maxlen = 0,
- .mode = 0555,
- .child = ocfs2_nm_table
- },
- { }
-};
-
-static ctl_table ocfs2_kern_table[] = {
- {
- .procname = "ocfs2",
- .data = NULL,
- .maxlen = 0,
- .mode = 0555,
- .child = ocfs2_mod_table
- },
- { }
-};
-
-static ctl_table ocfs2_root_table[] = {
- {
- .procname = "fs",
- .data = NULL,
- .maxlen = 0,
- .mode = 0555,
- .child = ocfs2_kern_table
- },
+static const __initdata struct ctl_path ocfs2_nm_path[] = {
+ { .procname = "fs" },
+ { .procname = "ocfs2" },
+ { .procname = "nm" },
{ }
};

@@ -698,7 +672,7 @@ static int __init ocfs2_stack_glue_init(void)
{
strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB);

- ocfs2_table_header = register_sysctl_table(ocfs2_root_table);
+ ocfs2_table_header = register_sysctl_paths(ocfs2_nm_path, ocfs2_nm_table);
if (!ocfs2_table_header) {
printk(KERN_ERR
"ocfs2 stack glue: unable to register sysctl\n");
--
1.7.5.134.g1c08b

2011-05-08 22:41:02

by Lucian Adrian Grijincu

Subject: [v2 026/115] sysctl: remove .child from fs/quota/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/quota/dquot.c | 21 +++++----------------
1 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index d3c032f..7837944 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2591,22 +2591,11 @@ static ctl_table fs_dqstats_table[] = {
{ },
};

-static ctl_table fs_table[] = {
- {
- .procname = "quota",
- .mode = 0555,
- .child = fs_dqstats_table,
- },
- { },
-};

-static ctl_table sys_table[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = fs_table,
- },
- { },
+static const __initdata struct ctl_path quota_path[] = {
+ { .procname = "fs" },
+ { .procname = "quota" },
+ { }
};

static int __init dquot_init(void)
@@ -2616,7 +2605,7 @@ static int __init dquot_init(void)

printk(KERN_NOTICE "VFS: Disk quotas %s\n", __DQUOT_VERSION__);

- register_sysctl_table(sys_table);
+ register_sysctl_paths(quota_path, fs_dqstats_table);

dquot_cachep = kmem_cache_create("dquot",
sizeof(struct dquot), sizeof(unsigned long) * 4,
--
1.7.5.134.g1c08b

2011-05-08 22:41:06

by Lucian Adrian Grijincu

Subject: [v2 027/115] sysctl: remove .child from fs/xfs/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/xfs/linux-2.6/xfs_sysctl.c | 22 +++++-----------------
1 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sysctl.c b/fs/xfs/linux-2.6/xfs_sysctl.c
index ee2d2ad..95f803c 100644
--- a/fs/xfs/linux-2.6/xfs_sysctl.c
+++ b/fs/xfs/linux-2.6/xfs_sysctl.c
@@ -218,28 +218,16 @@ static ctl_table xfs_table[] = {
{}
};

-static ctl_table xfs_dir_table[] = {
- {
- .procname = "xfs",
- .mode = 0555,
- .child = xfs_table
- },
- {}
-};
-
-static ctl_table xfs_root_table[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = xfs_dir_table
- },
- {}
+static const __initdata struct ctl_path xfs_path[] = {
+ { .procname = "fs" },
+ { .procname = "xfs" },
+ { }
};

int
xfs_sysctl_register(void)
{
- xfs_table_header = register_sysctl_table(xfs_root_table);
+ xfs_table_header = register_sysctl_paths(xfs_path, xfs_table);
if (!xfs_table_header)
return -ENOMEM;
return 0;
--
1.7.5.134.g1c08b

2011-05-08 23:01:17

by Lucian Adrian Grijincu

Subject: [v2 028/115] sysctl: remove .child from kernel/ (ipc)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
ipc/ipc_sysctl.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 56410fa..9e408a6 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -194,18 +194,14 @@ static struct ctl_table ipc_kern_table[] = {
{}
};

-static struct ctl_table ipc_root_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = ipc_kern_table,
- },
- {}
+static const __initdata struct ctl_path ipc_path[] = {
+ { .procname = "kernel" },
+ { }
};

static int __init ipc_sysctl_init(void)
{
- register_sysctl_table(ipc_root_table);
+ register_sysctl_paths(ipc_path, ipc_kern_table);
return 0;
}

--
1.7.5.134.g1c08b

2011-05-08 23:00:53

by Lucian Adrian Grijincu

Subject: [v2 029/115] sysctl: remove .child from fs/mqueue

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
ipc/mq_sysctl.c | 24 ++++++------------------
1 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/ipc/mq_sysctl.c b/ipc/mq_sysctl.c
index 0c09366..007164b 100644
--- a/ipc/mq_sysctl.c
+++ b/ipc/mq_sysctl.c
@@ -62,7 +62,7 @@ static int msg_max_limit_max = MAX_MSGMAX;
static int msg_maxsize_limit_min = MIN_MSGSIZEMAX;
static int msg_maxsize_limit_max = MAX_MSGSIZEMAX;

-static ctl_table mq_sysctls[] = {
+static ctl_table mq_table[] = {
{
.procname = "queues_max",
.data = &init_ipc_ns.mq_queues_max,
@@ -91,25 +91,13 @@ static ctl_table mq_sysctls[] = {
{}
};

-static ctl_table mq_sysctl_dir[] = {
- {
- .procname = "mqueue",
- .mode = 0555,
- .child = mq_sysctls,
- },
- {}
-};
-
-static ctl_table mq_sysctl_root[] = {
- {
- .procname = "fs",
- .mode = 0555,
- .child = mq_sysctl_dir,
- },
- {}
+static const struct ctl_path mq_path[] = {
+ { .procname = "fs" },
+ { .procname = "mqueue" },
+ { }
};

struct ctl_table_header *mq_register_sysctl_table(void)
{
- return register_sysctl_table(mq_sysctl_root);
+ return register_sysctl_paths(mq_path, mq_table);
}
--
1.7.5.134.g1c08b

2011-05-08 22:41:15

by Lucian Adrian Grijincu

Subject: [v2 030/115] sysctl: sched: add sd_table_template

This is just a cleanup patch; it does not change any functionality.
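
The change replaces a dozen set_table_entry() calls with one copy of a constant template followed by per-domain .data fix-ups. A reduced user-space sketch of the same pattern is shown here, using malloc() and memcpy() where the patch uses kmemdup(). struct entry, struct domain and alloc_domain_table() are hypothetical names, and only two of the twelve entries are modelled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down analogue of struct ctl_table. */
struct entry {
	const char *procname;
	void *data;
	int maxlen;
};

/* Hypothetical analogue of struct sched_domain. */
struct domain {
	unsigned long min_interval;
	int busy_idx;
};

/* Shared template: everything except .data is the same for every domain. */
static const struct entry table_template[] = {
	{ .procname = "min_interval", .maxlen = sizeof(unsigned long) },
	{ .procname = "busy_idx",     .maxlen = sizeof(int) },
	{ 0 }	/* terminator */
};

/* Copy the template, then point each entry at this domain's fields. */
static struct entry *alloc_domain_table(struct domain *d)
{
	struct entry *table = malloc(sizeof(table_template));

	if (!table)
		return NULL;
	memcpy(table, table_template, sizeof(table_template));
	table[0].data = &d->min_interval;
	table[1].data = &d->busy_idx;
	return table;
}
```

Because the fix-ups address entries by index, the order of the template and the order of the assignments have to stay in sync.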

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sched.c | 144 ++++++++++++++++++++++++++++++++++++++++----------------
1 files changed, 103 insertions(+), 41 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 312f8b9..23a980c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6092,6 +6092,95 @@ static void migrate_tasks(unsigned int dead_cpu)

#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL)

+
+static struct ctl_table sd_table_template[] = {
+ {
+ .procname = "min_interval",
+ /* .data = &sd->min_interval, */
+ .maxlen = sizeof(long),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ {
+ .procname = "max_interval",
+ /* .data = &sd->max_interval, */
+ .maxlen = sizeof(long),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ {
+ .procname = "busy_idx",
+ /* .data = &sd->busy_idx, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "idle_idx",
+ /* .data = &sd->idle_idx, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "newidle_idx",
+ /* .data = &sd->newidle_idx, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "wake_idx",
+ /* .data = &sd->wake_idx, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "forkexec_idx",
+ /* .data = &sd->forkexec_idx, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "busy_factor",
+ /* .data = &sd->busy_factor, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "imbalance_pct",
+ /* .data = &sd->imbalance_pct, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "cache_nice_tries",
+ /* .data = &sd->cache_nice_tries, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "flags",
+ /* .data = &sd->flags, */
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ },
+ {
+ .procname = "name",
+ /* .data = sd->name, */
+ .maxlen = CORENAME_MAX_SIZE,
+ .mode = 0444,
+ .proc_handler = proc_dostring,
+ },
+ { }
+};
+
static struct ctl_table sd_ctl_dir[] = {
{
.procname = "sched_domain",
@@ -6138,52 +6227,25 @@ static void sd_free_ctl_entry(struct ctl_table **tablep)
*tablep = NULL;
}

-static void
-set_table_entry(struct ctl_table *entry,
- const char *procname, void *data, int maxlen,
- mode_t mode, proc_handler *proc_handler)
-{
- entry->procname = procname;
- entry->data = data;
- entry->maxlen = maxlen;
- entry->mode = mode;
- entry->proc_handler = proc_handler;
-}
-
static struct ctl_table *
sd_alloc_ctl_domain_table(struct sched_domain *sd)
{
- struct ctl_table *table = sd_alloc_ctl_entry(13);
-
+ struct ctl_table *table = kmemdup(&sd_table_template,
+ sizeof(sd_table_template), GFP_KERNEL);
if (table == NULL)
return NULL;
-
- set_table_entry(&table[0], "min_interval", &sd->min_interval,
- sizeof(long), 0644, proc_doulongvec_minmax);
- set_table_entry(&table[1], "max_interval", &sd->max_interval,
- sizeof(long), 0644, proc_doulongvec_minmax);
- set_table_entry(&table[2], "busy_idx", &sd->busy_idx,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[3], "idle_idx", &sd->idle_idx,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[4], "newidle_idx", &sd->newidle_idx,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[5], "wake_idx", &sd->wake_idx,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[6], "forkexec_idx", &sd->forkexec_idx,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[7], "busy_factor", &sd->busy_factor,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[9], "cache_nice_tries",
- &sd->cache_nice_tries,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[10], "flags", &sd->flags,
- sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[11], "name", sd->name,
- CORENAME_MAX_SIZE, 0444, proc_dostring);
- /* &table[12] is terminator */
+ table[ 0].data = &sd->min_interval;
+ table[ 1].data = &sd->max_interval;
+ table[ 2].data = &sd->busy_idx;
+ table[ 3].data = &sd->idle_idx;
+ table[ 4].data = &sd->newidle_idx;
+ table[ 5].data = &sd->wake_idx;
+ table[ 6].data = &sd->forkexec_idx;
+ table[ 7].data = &sd->busy_factor;
+ table[ 8].data = &sd->imbalance_pct;
+ table[ 9].data = &sd->cache_nice_tries;
+ table[10].data = &sd->flags;
+ table[11].data = sd->name;

return table;
}
--
1.7.5.134.g1c08b

2011-05-08 23:00:04

by Lucian Adrian Grijincu

Subject: [v2 031/115] sysctl: remove .child from kernel/sched_domain/cpuX/domainY/

Note: this patch still registers an empty kernel/sched_domain/cpuX/
directory for every cpu that has no domains in it.

This was the behaviour before this patch, so I kept it in the new
implementation. If the empty directories are not necessary, this can
be removed to simplify the code.
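
To make the note concrete: with one registration per domainY directory, a cpu without domains gets no directory at all unless an empty cpuX/ is registered explicitly. A small user-space sketch of the counting involved follows. It assumes a hypothetical array of per-cpu domain counts in place of for_each_possible_cpu() and for_each_domain(), and the "empty" counter is an illustrative addition, not something taken from the patch.

```c
#include <stddef.h>

/* Result of one pass over all cpus. */
struct sd_counts {
	int max;	/* largest number of domains on any one cpu */
	int total;	/* domains summed over all cpus */
	int empty;	/* cpus with no domains: would need an empty cpuX/ */
};

/* domains_per_cpu[i] is the (hypothetical) domain count of cpu i. */
static struct sd_counts count_domains(const int *domains_per_cpu, size_t ncpus)
{
	struct sd_counts c = { 0, 0, 0 };
	size_t i;

	for (i = 0; i < ncpus; i++) {
		int n = domains_per_cpu[i];

		if (n > c.max)
			c.max = n;
		c.total += n;
		if (n == 0)
			c.empty++;
	}
	return c;
}
```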

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sched.c | 266 ++++++++++++++++++++++++++++++++++++++------------------
1 files changed, 180 insertions(+), 86 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 23a980c..6e39b7c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6181,52 +6181,6 @@ static struct ctl_table sd_table_template[] = {
{ }
};

-static struct ctl_table sd_ctl_dir[] = {
- {
- .procname = "sched_domain",
- .mode = 0555,
- },
- {}
-};
-
-static struct ctl_table sd_ctl_root[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = sd_ctl_dir,
- },
- {}
-};
-
-static struct ctl_table *sd_alloc_ctl_entry(int n)
-{
- struct ctl_table *entry =
- kcalloc(n, sizeof(struct ctl_table), GFP_KERNEL);
-
- return entry;
-}
-
-static void sd_free_ctl_entry(struct ctl_table **tablep)
-{
- struct ctl_table *entry;
-
- /*
- * In the intermediate directories, both the child directory and
- * procname are dynamically allocated and could fail but the mode
- * will always be set. In the lowest directory the names are
- * static strings and all have proc handlers.
- */
- for (entry = *tablep; entry->mode; entry++) {
- if (entry->child)
- sd_free_ctl_entry(&entry->child);
- if (entry->proc_handler == NULL)
- kfree(entry->procname);
- }
-
- kfree(*tablep);
- *tablep = NULL;
-}
-
static struct ctl_table *
sd_alloc_ctl_domain_table(struct sched_domain *sd)
{
@@ -6250,64 +6204,204 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd)
return table;
}

-static ctl_table *sd_alloc_ctl_cpu_table(int cpu)
+/*
+ * Find out what is the maximum number of domains in a cpu, and the
+ * total number of domains across all cpus.
+ */
+static void count_sd_domains(int *p_max, int *p_total)
{
- struct ctl_table *entry, *table;
- struct sched_domain *sd;
- int domain_num = 0, i;
- char buf[32];
+ int cpu;
+ int max = 0;
+ int total = 0;

- for_each_domain(cpu, sd)
- domain_num++;
- entry = table = sd_alloc_ctl_entry(domain_num + 1);
- if (table == NULL)
- return NULL;
+ for_each_possible_cpu(cpu) {
+ struct sched_domain *sd;
+ int domain_num = 0;

- i = 0;
- for_each_domain(cpu, sd) {
- snprintf(buf, 32, "domain%d", i);
- entry->procname = kstrdup(buf, GFP_KERNEL);
- entry->mode = 0555;
- entry->child = sd_alloc_ctl_domain_table(sd);
- entry++;
- i++;
+ for_each_domain(cpu, sd)
+ domain_num++;
+
+ if (domain_num > max)
+ max = domain_num;
+ total += domain_num;
}
- return table;
+ *p_max = max;
+ *p_total = total;
}

-static struct ctl_table_header *sd_sysctl_header;
+
+/* enough space to hold a string "cpu%d" or "domain%d" */
+#define SD_NAME_LEN 32
+typedef char sd_name_buf[SD_NAME_LEN];
+
+static sd_name_buf *sd_cpu_names, *sd_domain_names;
+static int sd_domain_headers_num, sd_cpudir_headers_num;
+static struct ctl_table_header **sd_domain_headers, **sd_cpudir_headers;
+
static void register_sched_domain_sysctl(void)
{
- int i, cpu_num = num_possible_cpus();
- struct ctl_table *entry = sd_alloc_ctl_entry(cpu_num + 1);
- char buf[32];
+ int cpu, i;
+ int cpu_num, max_domain_num;
+
+ /* positions 2 and 3 in the array below */
+#define SD_PATH_CPU 2
+#define SD_PATH_DOM 3
+ struct ctl_path sd_path[] = {
+ { .procname = "kernel" },
+ { .procname = "sched_domain" },
+ { /* 'cpu0' */ },
+ { /* 'domain0' */ },
+ { },
+ };

- WARN_ON(sd_ctl_dir[0].child);
- sd_ctl_dir[0].child = entry;
+ sd_cpudir_headers_num = cpu_num = num_possible_cpus();
+ count_sd_domains(&max_domain_num, &sd_domain_headers_num);

- if (entry == NULL)
- return;
+ /*
+ * Allocate space for:
+ * - all cpu names (cpu0, cpu1,...) and all domain names (domain0,...)
+ * - the array of headers for cpu dirs kernel/sched_domain/cpuX/
+ * - the array of headers for domain dirs kernel/sched_domain/cpuX/domainY
+ *
+ * We only register the empty kernel/sched_domain/cpuX/ dirs
+ * to not break the ABI: if there were no domains defined, we
+ * would still have empty cpuX dir entries in
+ * kernel/sched_domain/.
+ *
+ * If this is not considered useful or part of the ABI, then
+ * we can drop the empty cpu dir entries.
+ */
+ sd_cpu_names = kmalloc(sizeof(sd_name_buf) * cpu_num, GFP_KERNEL);
+ if (sd_cpu_names == NULL)
+ goto fail_alloc_sd_cpu_names;

- for_each_possible_cpu(i) {
- snprintf(buf, 32, "cpu%d", i);
- entry->procname = kstrdup(buf, GFP_KERNEL);
- entry->mode = 0555;
- entry->child = sd_alloc_ctl_cpu_table(i);
- entry++;
+ sd_domain_names = kmalloc(sizeof(sd_name_buf) * max_domain_num, GFP_KERNEL);
+ if (sd_domain_names == NULL)
+ goto fail_alloc_sd_domain_names;
+
+ sd_cpudir_headers = kmalloc(sizeof(*sd_cpudir_headers) *
+ sd_cpudir_headers_num, GFP_KERNEL);
+ if (sd_cpudir_headers == NULL)
+ goto fail_alloc_sd_cpudir_headers;
+
+ sd_domain_headers = kmalloc(sizeof(*sd_domain_headers) *
+ sd_domain_headers_num, GFP_KERNEL);
+ if (sd_domain_headers == NULL)
+ goto fail_alloc_sd_domain_headers;
+
+ for_each_possible_cpu(cpu)
+ snprintf((char*)&sd_cpu_names[cpu], SD_NAME_LEN, "cpu%d", cpu);
+ for (i = 0; i < max_domain_num; i++)
+ snprintf((char*)&sd_domain_names[i], SD_NAME_LEN, "domain%d", i);
+
+ i = 0;
+ for_each_possible_cpu(cpu) {
+ struct ctl_table *empty = kzalloc(sizeof(*empty), GFP_KERNEL);
+ if (empty == NULL)
+ goto unregister_sd_cpudir_headers;
+ sd_path[SD_PATH_CPU].procname = sd_cpu_names[cpu];
+ sd_path[SD_PATH_DOM].procname = NULL; /* end of array sentinel */
+ sd_cpudir_headers[i] = register_sysctl_paths(sd_path, empty);
+ if (sd_cpudir_headers[i] == NULL) {
+ kfree(empty);
+ goto unregister_sd_cpudir_headers;
+ }
+ i++;
+ }
+
+ i = 0;
+ for_each_possible_cpu(cpu) {
+ struct sched_domain *sd;
+ int domain = 0;
+ for_each_domain(cpu, sd) {
+ struct ctl_table *table = sd_alloc_ctl_domain_table(sd);
+ if (table == NULL)
+ goto unregister_sd_domain_headers;
+ sd_path[SD_PATH_CPU].procname = sd_cpu_names[cpu];
+ sd_path[SD_PATH_DOM].procname = sd_domain_names[domain];
+ sd_domain_headers[i] = register_sysctl_paths(sd_path, table);
+ if (sd_domain_headers[i] == NULL) {
+ kfree(table);
+ goto unregister_sd_domain_headers;
+ }
+ i++;
+ domain++;
+ }
}

- WARN_ON(sd_sysctl_header);
- sd_sysctl_header = register_sysctl_table(sd_ctl_root);
+ return;
+
+unregister_sd_domain_headers:
+ i--; /* the current 'i' was being filled in, but failed */
+ for(; i >= 0; i--) {
+ struct ctl_table *table = sd_domain_headers[i]->ctl_table_arg;
+ unregister_sysctl_table(sd_domain_headers[i]);
+ kfree(table);
+ }
+ i = sd_cpudir_headers_num;
+unregister_sd_cpudir_headers:
+ i--;
+ for(; i >= 0; i--) {
+ struct ctl_table *table = sd_cpudir_headers[i]->ctl_table_arg;
+ unregister_sysctl_table(sd_cpudir_headers[i]);
+ kfree(table);
+ }
+
+ kfree(sd_domain_headers);
+fail_alloc_sd_domain_headers:
+ kfree(sd_cpudir_headers);
+fail_alloc_sd_cpudir_headers:
+ kfree(sd_domain_names);
+fail_alloc_sd_domain_names:
+ kfree(sd_cpu_names);
+fail_alloc_sd_cpu_names:
+ sd_domain_headers = NULL;
+ sd_cpudir_headers = NULL;
+ sd_domain_names = NULL;
+ sd_cpu_names = NULL;
+ sd_domain_headers_num = 0;
+ sd_cpudir_headers_num = 0;
}

/* may be called multiple times per register */
static void unregister_sched_domain_sysctl(void)
{
- if (sd_sysctl_header)
- unregister_sysctl_table(sd_sysctl_header);
- sd_sysctl_header = NULL;
- if (sd_ctl_dir[0].child)
- sd_free_ctl_entry(&sd_ctl_dir[0].child);
+ int i;
+
+ /* because this function may be called multiple times (not
+ * concurrently) for a single register_sched_domain_sysctl call,
+ * we skip unregistering if it was already done by a previous
+ * call. This is also why we make sure to NULLify all
+ * pointers: make sure nothing is double-freed. */
+ if (sd_domain_headers == NULL)
+ return;
+
+ /* unregister in the reverse order of registering, or we'll
+ * get a harmless warning saying that the parent of a header
+ * was unregistered before all its children. */
+ for(i = sd_domain_headers_num - 1; i >= 0; i--) {
+ struct ctl_table *table = sd_domain_headers[i]->ctl_table_arg;
+ unregister_sysctl_table(sd_domain_headers[i]);
+ kfree(table);
+ }
+
+ for(i = sd_cpudir_headers_num - 1; i >= 0; i--) {
+ struct ctl_table *table = sd_cpudir_headers[i]->ctl_table_arg;
+ unregister_sysctl_table(sd_cpudir_headers[i]);
+ kfree(table);
+ }
+
+ kfree(sd_domain_headers);
+ kfree(sd_cpudir_headers);
+ kfree(sd_domain_names);
+ kfree(sd_cpu_names);
+
+ sd_domain_headers = NULL;
+ sd_cpudir_headers = NULL;
+ sd_domain_names = NULL;
+ sd_cpu_names = NULL;
+ sd_cpudir_headers_num = 0;
+ sd_domain_headers_num = 0;
}
#else
static void register_sched_domain_sysctl(void)
--
1.7.5.134.g1c08b
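
Editorial note on patch 031: the patch reuses one ctl_path array for both the empty cpuX directories and the cpuX/domainY tables by writing a NULL procname into the domain slot, which ends the path one component early. The following is a minimal userspace sketch of that sentinel trick, not kernel code. It assumes a stand-in struct ctl_path with only a procname field, and the helpers sd_path_to_string() and sd_format_path() are hypothetical names introduced for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct ctl_path (assumption: only
 * the procname field matters for this illustration). */
struct ctl_path {
	const char *procname;
};

/* Slots for the cpu and domain components, as in the patch. */
enum { SD_PATH_CPU = 2, SD_PATH_DOM = 3 };

/* Join components with '/' until the first NULL procname.
 * Returns the number of components written into buf. */
static int sd_path_to_string(const struct ctl_path *path, char *buf, size_t len)
{
	int n = 0;
	size_t used = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (; path->procname; path++, n++) {
		int w = snprintf(buf + used, len - used, "%s%s",
				 n ? "/" : "", path->procname);
		if (w < 0 || (size_t)w >= len - used)
			break;	/* out of room: stop instead of overflowing */
		used += (size_t)w;
	}
	return n;
}

/* Format kernel/sched_domain/cpuX[/domainY]; domain < 0 means the
 * domain slot stays NULL and acts as an early end-of-path sentinel. */
static int sd_format_path(int cpu, int domain, char *buf, size_t len)
{
	char cpu_name[32], dom_name[32];
	struct ctl_path sd_path[] = {
		{ .procname = "kernel" },
		{ .procname = "sched_domain" },
		{ .procname = NULL },	/* 'cpuX' */
		{ .procname = NULL },	/* 'domainY', or early sentinel */
		{ .procname = NULL },	/* final sentinel */
	};

	snprintf(cpu_name, sizeof(cpu_name), "cpu%d", cpu);
	sd_path[SD_PATH_CPU].procname = cpu_name;
	if (domain >= 0) {
		snprintf(dom_name, sizeof(dom_name), "domain%d", domain);
		sd_path[SD_PATH_DOM].procname = dom_name;
	}
	return sd_path_to_string(sd_path, buf, len);
}
```

With a 64-byte buffer, sd_format_path(3, -1, ...) walks three components and stops at the NULL in the domain slot, while sd_format_path(3, 1, ...) walks four. The same array serves both registrations, which is why the patch resets the domain slot to NULL before each cpuX registration.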

2011-05-08 23:00:00

by Lucian Adrian Grijincu

Subject: [v2 032/115] sysctl: remove .child from kernel/ (utsname)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/utsname_sysctl.c | 14 +++++---------
1 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/kernel/utsname_sysctl.c b/kernel/utsname_sysctl.c
index a2cd77e..7606026 100644
--- a/kernel/utsname_sysctl.c
+++ b/kernel/utsname_sysctl.c
@@ -57,7 +57,7 @@ static int proc_do_uts_string(ctl_table *table, int write,
#define proc_do_uts_string NULL
#endif

-static struct ctl_table uts_kern_table[] = {
+static struct ctl_table uts_table[] = {
{
.procname = "ostype",
.data = init_uts_ns.name.sysname,
@@ -96,18 +96,14 @@ static struct ctl_table uts_kern_table[] = {
{}
};

-static struct ctl_table uts_root_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = uts_kern_table,
- },
- {}
+static const __initdata struct ctl_path uts_path[] = {
+ { .procname = "kernel" },
+ { },
};

static int __init utsname_sysctl_init(void)
{
- register_sysctl_table(uts_root_table);
+ register_sysctl_paths(uts_path, uts_table);
return 0;
}

--
1.7.5.134.g1c08b

2011-05-08 22:59:25

by Lucian Adrian Grijincu

Subject: [v2 033/115] sysctl: remove .child from sunrpc/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/sunrpc/sysctl.c | 19 +++++++------------
1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/net/sunrpc/sysctl.c b/net/sunrpc/sysctl.c
index e65dcc6..7450ab2 100644
--- a/net/sunrpc/sysctl.c
+++ b/net/sunrpc/sysctl.c
@@ -38,13 +38,17 @@ EXPORT_SYMBOL_GPL(nlm_debug);
#ifdef RPC_DEBUG

static struct ctl_table_header *sunrpc_table_header;
-static ctl_table sunrpc_table[];
+static ctl_table sunrpc_table[];
+static const struct ctl_path sunrpc_path[] = {
+ { .procname = "sunrpc" },
+ { }
+};

void
rpc_register_sysctl(void)
{
if (!sunrpc_table_header)
- sunrpc_table_header = register_sysctl_table(sunrpc_table);
+ sunrpc_table_header = register_sysctl_paths(sunrpc_path, sunrpc_table);
}

void
@@ -133,7 +137,7 @@ done:
}


-static ctl_table debug_table[] = {
+static ctl_table sunrpc_table[] = {
{
.procname = "rpc_debug",
.data = &rpc_debug,
@@ -171,13 +175,4 @@ static ctl_table debug_table[] = {
{ }
};

-static ctl_table sunrpc_table[] = {
- {
- .procname = "sunrpc",
- .mode = 0555,
- .child = debug_table
- },
- { }
-};
-
#endif
--
1.7.5.134.g1c08b

2011-05-08 22:59:20

by Lucian Adrian Grijincu

Subject: [v2 034/115] sysctl: remove .child from sunrpc/svc_rdma

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/sunrpc/xprtrdma/svc_rdma.c | 26 +++++++-------------------
1 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
index 09af4fa..d7c0a70 100644
--- a/net/sunrpc/xprtrdma/svc_rdma.c
+++ b/net/sunrpc/xprtrdma/svc_rdma.c
@@ -118,7 +118,7 @@ static int read_reset_stat(ctl_table *table, int write,
}

static struct ctl_table_header *svcrdma_table_header;
-static ctl_table svcrdma_parm_table[] = {
+static ctl_table svcrdma_table[] = {
{
.procname = "max_requests",
.data = &svcrdma_max_requests,
@@ -213,22 +213,10 @@ static ctl_table svcrdma_parm_table[] = {
{ },
};

-static ctl_table svcrdma_table[] = {
- {
- .procname = "svc_rdma",
- .mode = 0555,
- .child = svcrdma_parm_table
- },
- { },
-};
-
-static ctl_table svcrdma_root_table[] = {
- {
- .procname = "sunrpc",
- .mode = 0555,
- .child = svcrdma_table
- },
- { },
+static const struct ctl_path svcrdma_path[] = {
+ { .procname = "sunrpc" },
+ { .procname = "svc_rdma" },
+ { }
};

void svc_rdma_cleanup(void)
@@ -258,8 +246,8 @@ int svc_rdma_init(void)
return -ENOMEM;

if (!svcrdma_table_header)
- svcrdma_table_header =
- register_sysctl_table(svcrdma_root_table);
+ svcrdma_table_header = register_sysctl_paths(
+ svcrdma_path, svcrdma_table);

/* Create the temporary map cache */
svc_rdma_map_cachep = kmem_cache_create("svc_rdma_map_cache",
--
1.7.5.134.g1c08b
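
Editorial note on the ".child" conversions: each of these patches performs the same mechanical rewrite. A chain of single-entry directory tables linked through .child becomes a flat ctl_path array, and only the innermost table is passed to register_sysctl_paths(). The sketch below shows that mapping in userspace under stated assumptions: the struct ctl_table and struct ctl_path here are reduced stand-ins, and flatten_child_chain() is a hypothetical helper that does not exist in the kernel.

```c
#include <stddef.h>

/* Reduced stand-ins for the kernel structures (assumption: only the
 * fields needed to describe the directory chain are kept). */
struct ctl_table {
	const char *procname;
	struct ctl_table *child;
};

struct ctl_path {
	const char *procname;
};

/*
 * Walk a chain of lone directory entries, copying each directory name
 * into 'path' and terminating it with a NULL procname. Returns the first
 * table that is not a lone directory (the one to register at that path),
 * or NULL if 'path' cannot hold the names plus the terminator.
 */
static struct ctl_table *flatten_child_chain(struct ctl_table *table,
					     struct ctl_path *path,
					     size_t path_len)
{
	size_t depth = 0;

	if (path_len == 0)
		return NULL;

	/* lone directory: entry 0 has a child, entry 1 is the sentinel */
	while (table[0].procname && table[0].child && !table[1].procname) {
		if (depth + 1 >= path_len)
			return NULL;	/* no room for name + terminator */
		path[depth++].procname = table[0].procname;
		table = table[0].child;
	}
	path[depth].procname = NULL;
	return table;
}
```

Applied to the old svc_rdma layout (sunrpc, then svc_rdma, then the parameter table), the walk yields the path { "sunrpc", "svc_rdma", NULL } and returns the parameter table, which matches the svcrdma_path and svcrdma_table pair introduced in patch 034.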

2011-05-08 22:41:19

by Lucian Adrian Grijincu

Subject: [v2 035/115] sysctl: remove .child from sunrpc/ (xprtrdma)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/sunrpc/xprtrdma/transport.c | 14 +++++---------
1 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 0867070..9736c93 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -85,7 +85,7 @@ static unsigned int max_memreg = RPCRDMA_LAST - 1;

static struct ctl_table_header *sunrpc_table_header;

-static ctl_table xr_tunables_table[] = {
+static ctl_table rdma_table[] = {
{
.procname = "rdma_slot_table_entries",
.data = &xprt_rdma_slot_table_entries,
@@ -137,13 +137,9 @@ static ctl_table xr_tunables_table[] = {
{ },
};

-static ctl_table sunrpc_table[] = {
- {
- .procname = "sunrpc",
- .mode = 0555,
- .child = xr_tunables_table
- },
- { },
+static const struct ctl_path sunrpc_path[] = {
+ { .procname = "sunrpc" },
+ { }
};

#endif
@@ -771,7 +767,7 @@ static int __init xprt_rdma_init(void)

#ifdef RPC_DEBUG
if (!sunrpc_table_header)
- sunrpc_table_header = register_sysctl_table(sunrpc_table);
+ sunrpc_table_header = register_sysctl_paths(sunrpc_path, rdma_table);
#endif
return 0;
}
--
1.7.5.134.g1c08b

2011-05-08 22:58:59

by Lucian Adrian Grijincu

Subject: [v2 036/115] sysctl: remove .child from sunrpc/ (xprtsock)


Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/sunrpc/xprtsock.c | 16 ++++++----------
1 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index bf005d3..610a2fe 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -59,7 +59,7 @@ static unsigned int xs_tcp_fin_timeout __read_mostly = XS_TCP_LINGER_TO;

/*
* We can register our own files under /proc/sys/sunrpc by
- * calling register_sysctl_table() again. The files in that
+ * calling register_sysctl_paths() again. The files in that
* directory become the union of all files registered there.
*
* We simply need to make sure that we don't collide with
@@ -79,7 +79,7 @@ static struct ctl_table_header *sunrpc_table_header;
* FIXME: changing the UDP slot table size should also resize the UDP
* socket buffers for existing UDP transports
*/
-static ctl_table xs_tunables_table[] = {
+static ctl_table xprtsock_table[] = {
{
.procname = "udp_slot_table_entries",
.data = &xprt_udp_slot_table_entries,
@@ -126,13 +126,9 @@ static ctl_table xs_tunables_table[] = {
{ },
};

-static ctl_table sunrpc_table[] = {
- {
- .procname = "sunrpc",
- .mode = 0555,
- .child = xs_tunables_table
- },
- { },
+static const struct ctl_path sunrpc_path[] = {
+ { .procname = "sunrpc" },
+ { }
};

#endif
@@ -2470,7 +2466,7 @@ int init_socket_xprt(void)
{
#ifdef RPC_DEBUG
if (!sunrpc_table_header)
- sunrpc_table_header = register_sysctl_table(sunrpc_table);
+ sunrpc_table_header = register_sysctl_paths(sunrpc_path, xprtsock_table);
#endif

xprt_register_transport(&xs_udp_transport);
--
1.7.5.134.g1c08b

2011-05-08 22:58:33

by Lucian Adrian Grijincu

Subject: [v2 037/115] sysctl: remove .child from bus/isa/ (arm)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/arm/kernel/isa.c | 31 ++++++++++++-------------------
1 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/arch/arm/kernel/isa.c b/arch/arm/kernel/isa.c
index 3464859..0236609 100644
--- a/arch/arm/kernel/isa.c
+++ b/arch/arm/kernel/isa.c
@@ -20,44 +20,37 @@

static unsigned int isa_membase, isa_portbase, isa_portshift;

-static ctl_table ctl_isa_vars[4] = {
+static ctl_table isa_table[] = {
{
.procname = "membase",
.data = &isa_membase,
.maxlen = sizeof(isa_membase),
.mode = 0444,
.proc_handler = proc_dointvec,
- }, {
+ },
+ {
.procname = "portbase",
.data = &isa_portbase,
.maxlen = sizeof(isa_portbase),
.mode = 0444,
.proc_handler = proc_dointvec,
- }, {
+ },
+ {
.procname = "portshift",
.data = &isa_portshift,
.maxlen = sizeof(isa_portshift),
.mode = 0444,
.proc_handler = proc_dointvec,
- }, {}
+ },
+ { }
};

static struct ctl_table_header *isa_sysctl_header;

-static ctl_table ctl_isa[2] = {
- {
- .procname = "isa",
- .mode = 0555,
- .child = ctl_isa_vars,
- }, {}
-};
-
-static ctl_table ctl_bus[2] = {
- {
- .procname = "bus",
- .mode = 0555,
- .child = ctl_isa,
- }, {}
+static const __initdata struct ctl_path isa_path[] = {
+ { .procname = "bus" },
+ { .procname = "isa" },
+ { }
};

void __init
@@ -66,5 +59,5 @@ register_isa_ports(unsigned int membase, unsigned int portbase, unsigned int por
isa_membase = membase;
isa_portbase = portbase;
isa_portshift = portshift;
- isa_sysctl_header = register_sysctl_table(ctl_bus);
+ isa_sysctl_header = register_sysctl_paths(isa_path, isa_table);
}
--
1.7.5.134.g1c08b

2011-05-08 22:41:26

by Lucian Adrian Grijincu

Subject: [v2 038/115] sysctl: remove .child from reboot/warm (arm)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/arm/mach-bcmring/arch.c | 25 ++++++++++++-------------
1 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/arm/mach-bcmring/arch.c b/arch/arm/mach-bcmring/arch.c
index 73eb066..33c10fd 100644
--- a/arch/arm/mach-bcmring/arch.c
+++ b/arch/arm/mach-bcmring/arch.c
@@ -55,20 +55,18 @@ static struct ctl_table_header *bcmring_sysctl_header;

static struct ctl_table bcmring_sysctl_warm_reboot[] = {
{
- .procname = "warm",
- .data = &bcmring_arch_warm_reboot,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec},
- {}
+ .procname = "warm",
+ .data = &bcmring_arch_warm_reboot,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ { }
};

-static struct ctl_table bcmring_sysctl_reboot[] = {
- {
- .procname = "reboot",
- .mode = 0555,
- .child = bcmring_sysctl_warm_reboot},
- {}
+static const __initdata struct ctl_path bcmring_sysctl_path[] = {
+ { .procname = "reboot" },
+ { }
};

static struct resource nand_resource[] = {
@@ -117,7 +115,8 @@ static struct platform_device *devices[] __initdata = {
static void __init bcmring_init_machine(void)
{

- bcmring_sysctl_header = register_sysctl_table(bcmring_sysctl_reboot);
+ bcmring_sysctl_header = register_sysctl_paths(bcmring_sysctl_path,
+ bcmring_sysctl_warm_reboot);

/* Enable spread spectrum */
chipcHw_enableSpreadSpectrum();
--
1.7.5.134.g1c08b

2011-05-08 22:41:23

by Lucian Adrian Grijincu

Subject: [v2 039/115] sysctl: remove .child from lasat/ (mips)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/mips/lasat/sysctl.c | 13 ++++---------
1 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/mips/lasat/sysctl.c b/arch/mips/lasat/sysctl.c
index d87ffd0..a6191f0 100644
--- a/arch/mips/lasat/sysctl.c
+++ b/arch/mips/lasat/sysctl.c
@@ -262,21 +262,16 @@ static ctl_table lasat_table[] = {
{}
};

-static ctl_table lasat_root_table[] = {
- {
- .procname = "lasat",
- .mode = 0555,
- .child = lasat_table
- },
- {}
+static const __initdata struct ctl_path lasat_path[] = {
+ { .procname = "lasat" },
+ { }
};

static int __init lasat_register_sysctl(void)
{
struct ctl_table_header *lasat_table_header;

- lasat_table_header =
- register_sysctl_table(lasat_root_table);
+ lasat_table_header = register_sysctl_paths(lasat_path, lasat_table);
if (!lasat_table_header) {
printk(KERN_ERR "Unable to register LASAT sysctl\n");
return -ENOMEM;
--
1.7.5.134.g1c08b

2011-05-08 22:58:07

by Lucian Adrian Grijincu

Subject: [v2 040/115] sysctl: remove .child from appldata/ (s390)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/s390/appldata/appldata_base.c | 42 ++++++++++++++++++------------------
1 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/s390/appldata/appldata_base.c b/arch/s390/appldata/appldata_base.c
index 5c91995..0f336a8 100644
--- a/arch/s390/appldata/appldata_base.c
+++ b/arch/s390/appldata/appldata_base.c
@@ -49,7 +49,6 @@ static struct platform_device *appldata_pdev;
/*
* /proc entries (sysctl)
*/
-static const char appldata_proc_name[APPLDATA_PROC_NAME_LENGTH] = "appldata";
static int appldata_timer_handler(ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos);
static int appldata_interval_handler(ctl_table *ctl, int write,
@@ -71,14 +70,9 @@ static struct ctl_table appldata_table[] = {
{ },
};

-static struct ctl_table appldata_dir_table[] = {
- {
- .procname = appldata_proc_name,
- .maxlen = 0,
- .mode = S_IRUGO | S_IXUGO,
- .child = appldata_table,
- },
- { },
+static const struct ctl_path appldata_path[] = {
+ { .procname = "appldata" },
+ { }
};

/*
@@ -424,6 +418,18 @@ out:


/************************* module-ops management *****************************/
+
+static const struct ctl_table appldata_ops_template[2] = {
+ {
+ .procname = NULL, /* ops->name */
+ .data = NULL, /* ops */
+ .maxlen = 0,
+ .mode = S_IRUGO | S_IWUSR,
+ .proc_handler = appldata_generic_handler,
+ },
+ { }
+};
+
/*
* appldata_register_ops()
*
@@ -434,7 +440,8 @@ int appldata_register_ops(struct appldata_ops *ops)
if (ops->size > APPLDATA_MAX_REC_SIZE)
return -EINVAL;

- ops->ctl_table = kzalloc(4 * sizeof(struct ctl_table), GFP_KERNEL);
+ ops->ctl_table = kmemdup(&appldata_ops_template,
+ sizeof(appldata_ops_template), GFP_KERNEL);
if (!ops->ctl_table)
return -ENOMEM;

@@ -442,17 +449,10 @@ int appldata_register_ops(struct appldata_ops *ops)
list_add(&ops->list, &appldata_ops_list);
mutex_unlock(&appldata_ops_mutex);

- ops->ctl_table[0].procname = appldata_proc_name;
- ops->ctl_table[0].maxlen = 0;
- ops->ctl_table[0].mode = S_IRUGO | S_IXUGO;
- ops->ctl_table[0].child = &ops->ctl_table[2];
-
- ops->ctl_table[2].procname = ops->name;
- ops->ctl_table[2].mode = S_IRUGO | S_IWUSR;
- ops->ctl_table[2].proc_handler = appldata_generic_handler;
- ops->ctl_table[2].data = ops;
+ ops->ctl_table[0].procname = ops->name;
+ ops->ctl_table[0].data = ops;

- ops->sysctl_header = register_sysctl_table(ops->ctl_table);
+ ops->sysctl_header = register_sysctl_paths(appldata_path, ops->ctl_table);
if (!ops->sysctl_header)
goto out;
return 0;
@@ -649,7 +649,7 @@ static int __init appldata_init(void)
/* Register cpu hotplug notifier */
register_hotcpu_notifier(&appldata_nb);

- appldata_sysctl_header = register_sysctl_table(appldata_dir_table);
+ appldata_sysctl_header = register_sysctl_paths(appldata_path, appldata_table);
return 0;

out_device:
--
1.7.5.134.g1c08b

2011-05-08 22:41:31

by Lucian Adrian Grijincu

Subject: [v2 041/115] sysctl: remove .child from s390dbf/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/s390/kernel/debug.c | 13 ++++---------
1 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/s390/kernel/debug.c b/arch/s390/kernel/debug.c
index 5ad6bc0..384f67b 100644
--- a/arch/s390/kernel/debug.c
+++ b/arch/s390/kernel/debug.c
@@ -902,7 +902,7 @@ static struct ctl_table s390dbf_table[] = {
.mode = S_IRUGO | S_IWUSR,
.proc_handler = proc_dointvec,
},
- {
+ {
.procname = "debug_active",
.data = &debug_active,
.maxlen = sizeof(int),
@@ -912,13 +912,8 @@ static struct ctl_table s390dbf_table[] = {
{ }
};

-static struct ctl_table s390dbf_dir_table[] = {
- {
- .procname = "s390dbf",
- .maxlen = 0,
- .mode = S_IRUGO | S_IXUGO,
- .child = s390dbf_table,
- },
+static const __initdata struct ctl_path s390dbf_path[] = {
+ { .procname = "s390dbf" },
{ }
};

@@ -1071,7 +1066,7 @@ __init debug_init(void)
{
int rc = 0;

- s390dbf_sysctl_header = register_sysctl_table(s390dbf_dir_table);
+ s390dbf_sysctl_header = register_sysctl_paths(s390dbf_path, s390dbf_table);
mutex_lock(&debug_mutex);
debug_debugfs_root_entry = debugfs_create_dir(DEBUG_DIR_ROOT,NULL);
initialized = 1;
--
1.7.5.134.g1c08b

2011-05-08 22:57:36

by Lucian Adrian Grijincu

Subject: [v2 042/115] sysctl: remove .child from vm/ (s390)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/s390/mm/cmm.c | 11 +++--------
1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/s390/mm/cmm.c b/arch/s390/mm/cmm.c
index c66ffd8..0ef5bbf 100644
--- a/arch/s390/mm/cmm.c
+++ b/arch/s390/mm/cmm.c
@@ -348,13 +348,8 @@ static struct ctl_table cmm_table[] = {
{ }
};

-static struct ctl_table cmm_dir_table[] = {
- {
- .procname = "vm",
- .maxlen = 0,
- .mode = 0555,
- .child = cmm_table,
- },
+static const __initdata struct ctl_path cmm_path[] = {
+ { .procname = "vm" },
{ }
};

@@ -434,7 +429,7 @@ static int __init cmm_init(void)
{
int rc = -ENOMEM;

- cmm_sysctl_header = register_sysctl_table(cmm_dir_table);
+ cmm_sysctl_header = register_sysctl_paths(cmm_path, cmm_table);
if (!cmm_sysctl_header)
goto out_sysctl;
#ifdef CONFIG_CMM_IUCV
--
1.7.5.134.g1c08b

2011-05-08 22:57:31

by Lucian Adrian Grijincu

Subject: [v2 043/115] sysctl: remove .child from kernel/perfmon/ (ia64)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/ia64/kernel/perfmon.c | 23 +++++++----------------
1 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 89accc6..96743dd 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -552,22 +552,13 @@ static ctl_table pfm_ctl_table[]={
},
{}
};
-static ctl_table pfm_sysctl_dir[] = {
- {
- .procname = "perfmon",
- .mode = 0555,
- .child = pfm_ctl_table,
- },
- {}
-};
-static ctl_table pfm_sysctl_root[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = pfm_sysctl_dir,
- },
- {}
+
+static const __initdata struct ctl_path pfm_path[] = {
+ { .procname = "kernel" },
+ { .procname = "perfmon" },
+ { }
};
+
static struct ctl_table_header *pfm_sysctl_header;

static int pfm_context_unload(pfm_context_t *ctx, void *arg, int count, struct pt_regs *regs);
@@ -6687,7 +6678,7 @@ pfm_init(void)
/*
* create /proc/sys/kernel/perfmon (for debugging purposes)
*/
- pfm_sysctl_header = register_sysctl_table(pfm_sysctl_root);
+ pfm_sysctl_header = register_sysctl_paths(pfm_path, pfm_ctl_table);

/*
* initialize all our spinlocks
--
1.7.5.134.g1c08b

2011-05-08 22:57:07

by Lucian Adrian Grijincu

Subject: [v2 044/115] sysctl: remove .child from kernel/ (ia64/kdump)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/ia64/kernel/crash.c | 13 +++++--------
1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/ia64/kernel/crash.c b/arch/ia64/kernel/crash.c
index b942f40..e54aea5 100644
--- a/arch/ia64/kernel/crash.c
+++ b/arch/ia64/kernel/crash.c
@@ -255,17 +255,14 @@ static ctl_table kdump_ctl_table[] = {
{ }
};

-static ctl_table sys_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = kdump_ctl_table,
- },
+static const __initdata struct ctl_path kdump_path[] = {
+ { .procname = "kernel" },
{ }
};
+
#endif

-static int
+static __init int
machine_crash_setup(void)
{
/* be notified before default_monarch_init_process */
@@ -277,7 +274,7 @@ machine_crash_setup(void)
if((ret = register_die_notifier(&kdump_init_notifier_nb)) != 0)
return ret;
#ifdef CONFIG_SYSCTL
- register_sysctl_table(sys_table);
+ register_sysctl_paths(kdump_path, kdump_ctl_table);
#endif
return 0;
}
--
1.7.5.134.g1c08b

2011-05-08 22:56:38

by Lucian Adrian Grijincu

Subject: [v2 045/115] sysctl: remove .child from kernel/powersave-nap (powerpc)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/powerpc/kernel/idle.c | 13 +++++--------
1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 39a2baa..88d03c5 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -118,19 +118,16 @@ static ctl_table powersave_nap_ctl_table[]={
},
{}
};
-static ctl_table powersave_nap_sysctl_root[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = powersave_nap_ctl_table,
- },
- {}
+
+static const __initdata struct ctl_path powersave_nap_path[] = {
+ { .procname = "kernel" },
+ { }
};

static int __init
register_powersave_nap_sysctl(void)
{
- register_sysctl_table(powersave_nap_sysctl_root);
+ register_sysctl_paths(powersave_nap_path, powersave_nap_ctl_table);

return 0;
}
--
1.7.5.134.g1c08b

2011-05-08 22:56:35

by Lucian Adrian Grijincu

Subject: [v2 046/115] sysctl: remove .child from pm/ (frv)

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/frv/kernel/pm.c | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/frv/kernel/pm.c b/arch/frv/kernel/pm.c
index 5fa3889..bcef945 100644
--- a/arch/frv/kernel/pm.c
+++ b/arch/frv/kernel/pm.c
@@ -329,13 +329,9 @@ static struct ctl_table pm_table[] =
{ }
};

-static struct ctl_table pm_dir_table[] =
+static const __initdata struct ctl_path pm_path[] =
{
- {
- .procname = "pm",
- .mode = 0555,
- .child = pm_table,
- },
+ { .procname = "pm" },
{ }
};

@@ -344,7 +340,7 @@ static struct ctl_table pm_dir_table[] =
*/
static int __init pm_init(void)
{
- register_sysctl_table(pm_dir_table);
+ register_sysctl_paths(pm_path, pm_table);
return 0;
}

--
1.7.5.134.g1c08b

2011-05-08 22:56:15

by Lucian Adrian Grijincu

Subject: [v2 047/115] sysctl: remove .child from frv/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/frv/kernel/sysctl.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/frv/kernel/sysctl.c b/arch/frv/kernel/sysctl.c
index 6c155d6..e5c20a2 100644
--- a/arch/frv/kernel/sysctl.c
+++ b/arch/frv/kernel/sysctl.c
@@ -199,14 +199,10 @@ static struct ctl_table frv_table[] =
* Use a temporary sysctl number. Horrid, but will be cleaned up in 2.6
* when all the PM interfaces exist nicely.
*/
-static struct ctl_table frv_dir_table[] =
+static const __initdata struct ctl_path frv_path[] =
{
- {
- .procname = "frv",
- .mode = 0555,
- .child = frv_table
- },
- {}
+ { .procname = "frv" },
+ { }
};

/*
@@ -214,7 +210,7 @@ static struct ctl_table frv_dir_table[] =
*/
static int __init frv_sysctl_init(void)
{
- register_sysctl_table(frv_dir_table);
+ register_sysctl_paths(frv_path, frv_table);
return 0;
}

--
1.7.5.134.g1c08b

2011-05-08 22:55:57

by Lucian Adrian Grijincu

Subject: [v2 048/115] sysctl: remove .child from sh64/unaligned_fixup/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
arch/sh/kernel/traps_64.c | 21 +++++----------------
1 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/arch/sh/kernel/traps_64.c b/arch/sh/kernel/traps_64.c
index 6713ca9..8b355b4 100644
--- a/arch/sh/kernel/traps_64.c
+++ b/arch/sh/kernel/traps_64.c
@@ -908,27 +908,16 @@ static ctl_table unaligned_table[] = {
{}
};

-static ctl_table unaligned_root[] = {
- {
- .procname = "unaligned_fixup",
- .mode = 0555,
- .child = unaligned_table
- },
- {}
+static const __initdata struct ctl_path unaligned_path[] = {
+ { .procname = "sh64" },
+ { .procname = "unaligned_fixup" },
+ { }
};

-static ctl_table sh64_root[] = {
- {
- .procname = "sh64",
- .mode = 0555,
- .child = unaligned_root
- },
- {}
-};
static struct ctl_table_header *sysctl_header;
static int __init init_sysctl(void)
{
- sysctl_header = register_sysctl_table(sh64_root);
+ sysctl_header = register_sysctl_paths(unaligned_path, unaligned_table);
return 0;
}

--
1.7.5.134.g1c08b

2011-05-08 22:55:35

by Lucian Adrian Grijincu

Subject: [v2 049/115] sysctl: delete unused register_sysctl_table function

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 3 +--
kernel/sysctl.c | 26 ++------------------------
2 files changed, 3 insertions(+), 26 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 11684d9..470e06a 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -985,7 +985,7 @@ extern int proc_do_large_bitmap(struct ctl_table *, int,
void __user *, size_t *, loff_t *);

/*
- * Register a set of sysctl names by calling register_sysctl_table
+ * Register a set of sysctl names by calling __register_sysctl_paths
* with an initialised array of struct ctl_table's. An entry with
* NULL procname terminates the table. table->de will be
* set up by the registration and need not be initialised in advance.
@@ -1065,7 +1065,6 @@ void register_sysctl_root(struct ctl_table_root *root);
struct ctl_table_header *__register_sysctl_paths(
struct ctl_table_root *root, struct nsproxy *namespaces,
const struct ctl_path *path, struct ctl_table *table);
-struct ctl_table_header *register_sysctl_table(struct ctl_table * table);
struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table);

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c0bb324..b813724 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1905,7 +1905,7 @@ struct ctl_table_header *__register_sysctl_paths(
}

/**
- * register_sysctl_table_path - register a sysctl table hierarchy
+ * register_sysctl_paths - register a sysctl table hierarchy
* @path: The path to the directory the sysctl table is in.
* @table: the top-level table structure
*
@@ -1922,24 +1922,8 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
}

/**
- * register_sysctl_table - register a sysctl table hierarchy
- * @table: the top-level table structure
- *
- * Register a sysctl table hierarchy. @table should be a filled in ctl_table
- * array. A completely 0 filled entry terminates the table.
- *
- * See register_sysctl_paths for more details.
- */
-struct ctl_table_header *register_sysctl_table(struct ctl_table *table)
-{
- static const struct ctl_path null_path[] = { {} };
-
- return register_sysctl_paths(null_path, table);
-}
-
-/**
* unregister_sysctl_table - unregister a sysctl table hierarchy
- * @header: the header returned from register_sysctl_table
+ * @header: the header returned from __register_sysctl_paths
*
* Unregisters the sysctl table and all children. proc entries may not
* actually be removed until they are no longer used by anyone.
@@ -1987,11 +1971,6 @@ void setup_sysctl_set(struct ctl_table_set *p,
}

#else /* !CONFIG_SYSCTL */
-struct ctl_table_header *register_sysctl_table(struct ctl_table * table)
-{
- return NULL;
-}
-
struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table)
{
@@ -2977,6 +2956,5 @@ EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
EXPORT_SYMBOL(proc_dostring);
EXPORT_SYMBOL(proc_doulongvec_minmax);
EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
-EXPORT_SYMBOL(register_sysctl_table);
EXPORT_SYMBOL(register_sysctl_paths);
EXPORT_SYMBOL(unregister_sysctl_table);
--
1.7.5.134.g1c08b

2011-05-08 22:41:44

by Lucian Adrian Grijincu

Subject: [v2 050/115] sysctl: remove .child from ax25 table

Only compile tested!

I'm sorry, but I could not manage to add an ax25 interface.

Some notable changes: before this patch, each time a device went up
or down we would unregister everything under /proc/sys/net/ax25/ and
then reregister an updated table with all devices in it (BTW, the
table was allocated with GFP_ATOMIC!).

Now each device registers its own table (e.g.
/proc/sys/net/ax25/ax0/) when it goes up and unregisters it when it
goes down. I'm assuming ax25 devices cannot be renamed, but if that's
possible, this can be fixed by making a private copy of the device
name for sysctl, and unregistering/reregistering the table on device
rename (see net/ipv4/devinet.c).

Also added an empty /proc/sys/net/ax25/ root directory. Without it,
the first device added would have been the one to create the
/proc/sys/net/ax25/ sysctl path and all other devices would have
attached to it. If that first device were removed before the others,
we would have gotten a harmless warning from sysctl telling us that
we're unregistering the parent before the children.
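
The ownership problem described above can be sketched with a small
userspace model. This is only an illustration, not kernel code: the
toy_dir structure, toy_attach(), toy_detach() and the two demo
functions are hypothetical names, and the model assumes that the first
header attached to a shared directory becomes its owner and that
removing the owner while other headers remain is what triggers the
warning.

```c
#include <assert.h>

/* Toy model of one shared sysctl directory (think /proc/sys/net/ax25/).
 * Hypothetical structure, not the kernel's ctl_table_header. */
struct toy_dir {
	int owner;	/* id of the header that created the dir, -1 if none */
	int nr_users;	/* number of headers currently attached */
};

/* Attach header 'id'; the first one to arrive becomes the owner. */
static void toy_attach(struct toy_dir *dir, int id)
{
	if (dir->nr_users == 0)
		dir->owner = id;
	dir->nr_users++;
}

/* Detach header 'id'. Returns 1 if the owner left while other headers
 * were still attached (the "parent before children" case), else 0. */
static int toy_detach(struct toy_dir *dir, int id)
{
	int warn = 0;

	assert(dir->nr_users > 0);
	dir->nr_users--;
	if (dir->owner == id) {
		warn = dir->nr_users > 0;
		dir->owner = -1;
	}
	return warn;
}

/* No empty root: ax0 (id 1) creates the directory and leaves first. */
int toy_demo_without_root(void)
{
	struct toy_dir dir = { .owner = -1, .nr_users = 0 };
	int warnings = 0;

	toy_attach(&dir, 1);			/* ax0 goes up */
	toy_attach(&dir, 2);			/* ax1 goes up */
	warnings += toy_detach(&dir, 1);	/* ax0 goes down first */
	warnings += toy_detach(&dir, 2);
	return warnings;
}

/* Empty root (id 0) registered at init and removed last at exit. */
int toy_demo_with_root(void)
{
	struct toy_dir dir = { .owner = -1, .nr_users = 0 };
	int warnings = 0;

	toy_attach(&dir, 0);			/* empty root directory */
	toy_attach(&dir, 1);
	toy_attach(&dir, 2);
	warnings += toy_detach(&dir, 1);
	warnings += toy_detach(&dir, 2);
	warnings += toy_detach(&dir, 0);	/* module exit */
	return warnings;
}
```

In this model toy_demo_without_root() counts one warning and
toy_demo_with_root() counts none, which is the reason for registering
the empty root in ax25_init() and unregistering it in ax25_exit().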

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/net/ax25.h | 10 +++---
net/ax25/af_ax25.c | 23 ++++++++++++-
net/ax25/ax25_dev.c | 10 +-----
net/ax25/sysctl_net_ax25.c | 76 ++++++++++++++-----------------------------
4 files changed, 53 insertions(+), 66 deletions(-)

diff --git a/include/net/ax25.h b/include/net/ax25.h
index 206d222..79c2d2d 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -215,7 +215,7 @@ typedef struct ax25_dev {
struct ax25_dev *next;
struct net_device *dev;
struct net_device *forward;
- struct ctl_table *systable;
+ struct ctl_table_header *ax25_sysheader;
int values[AX25_MAX_VALUES];
#if defined(CONFIG_AX25_DAMA_SLAVE) || defined(CONFIG_AX25_DAMA_MASTER)
ax25_dama_info dama;
@@ -441,11 +441,11 @@ extern void ax25_uid_free(void);

/* sysctl_net_ax25.c */
#ifdef CONFIG_SYSCTL
-extern void ax25_register_sysctl(void);
-extern void ax25_unregister_sysctl(void);
+extern void ax25_register_sysctl(struct ax25_dev *dev);
+extern void ax25_unregister_sysctl(struct ax25_dev *dev);
#else
-static inline void ax25_register_sysctl(void) {};
-static inline void ax25_unregister_sysctl(void) {};
+static inline void ax25_register_sysctl(struct ax25_dev *dev) {};
+static inline void ax25_unregister_sysctl(struct ax25_dev *dev) {};
#endif /* CONFIG_SYSCTL */

#endif
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 6da5dae..965662d 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1989,6 +1989,18 @@ static struct notifier_block ax25_dev_notifier = {
.notifier_call =ax25_device_event,
};

+
+#ifdef CONFIG_SYSCTL
+static const __initdata struct ctl_path ax25_path[] = {
+ { .procname = "net" },
+ { .procname = "ax25" },
+ { }
+};
+static struct ctl_table empty;
+static struct ctl_table_header *ax25_root_header;
+#endif /* CONFIG_SYSCTL */
+
+
static int __init ax25_init(void)
{
int rc = proto_register(&ax25_proto, 0);
@@ -1999,7 +2011,11 @@ static int __init ax25_init(void)
sock_register(&ax25_family_ops);
dev_add_pack(&ax25_packet_type);
register_netdevice_notifier(&ax25_dev_notifier);
- ax25_register_sysctl();
+
+ /* XXX: no error checking done in initializer */
+ #ifdef CONFIG_SYSCTL
+ ax25_root_header = register_sysctl_paths(ax25_path, &empty);
+ #endif

proc_net_fops_create(&init_net, "ax25_route", S_IRUGO, &ax25_route_fops);
proc_net_fops_create(&init_net, "ax25", S_IRUGO, &ax25_info_fops);
@@ -2024,7 +2040,10 @@ static void __exit ax25_exit(void)
ax25_uid_free();
ax25_dev_free();

- ax25_unregister_sysctl();
+ #ifdef CONFIG_SYSCTL
+ unregister_sysctl_table(ax25_root_header);
+ #endif
+
unregister_netdevice_notifier(&ax25_dev_notifier);

dev_remove_pack(&ax25_packet_type);
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index c1cb982..6ff1853 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -60,8 +60,6 @@ void ax25_dev_device_up(struct net_device *dev)
return;
}

- ax25_unregister_sysctl();
-
dev->ax25_ptr = ax25_dev;
ax25_dev->dev = dev;
dev_hold(dev);
@@ -91,7 +89,7 @@ void ax25_dev_device_up(struct net_device *dev)
ax25_dev_list = ax25_dev;
spin_unlock_bh(&ax25_dev_lock);

- ax25_register_sysctl();
+ ax25_register_sysctl(ax25_dev);
}

void ax25_dev_device_down(struct net_device *dev)
@@ -101,7 +99,7 @@ void ax25_dev_device_down(struct net_device *dev)
if ((ax25_dev = ax25_dev_ax25dev(dev)) == NULL)
return;

- ax25_unregister_sysctl();
+ ax25_unregister_sysctl(ax25_dev);

spin_lock_bh(&ax25_dev_lock);

@@ -121,7 +119,6 @@ void ax25_dev_device_down(struct net_device *dev)
spin_unlock_bh(&ax25_dev_lock);
dev_put(dev);
kfree(ax25_dev);
- ax25_register_sysctl();
return;
}

@@ -131,7 +128,6 @@ void ax25_dev_device_down(struct net_device *dev)
spin_unlock_bh(&ax25_dev_lock);
dev_put(dev);
kfree(ax25_dev);
- ax25_register_sysctl();
return;
}

@@ -139,8 +135,6 @@ void ax25_dev_device_down(struct net_device *dev)
}
spin_unlock_bh(&ax25_dev_lock);
dev->ax25_ptr = NULL;
-
- ax25_register_sysctl();
}

int ax25_fwd_ioctl(unsigned int cmd, struct ax25_fwd_struct *fwd)
diff --git a/net/ax25/sysctl_net_ax25.c b/net/ax25/sysctl_net_ax25.c
index ebe0ef3..b1181bc 100644
--- a/net/ax25/sysctl_net_ax25.c
+++ b/net/ax25/sysctl_net_ax25.c
@@ -29,17 +29,6 @@ static int min_proto[1], max_proto[] = { AX25_PROTO_MAX };
static int min_ds_timeout[1], max_ds_timeout[] = {65535000};
#endif

-static struct ctl_table_header *ax25_table_header;
-
-static ctl_table *ax25_table;
-static int ax25_table_size;
-
-static struct ctl_path ax25_path[] = {
- { .procname = "net", },
- { .procname = "ax25", },
- { }
-};
-
static const ctl_table ax25_param_table[] = {
{
.procname = "ip_default_mode",
@@ -159,52 +148,37 @@ static const ctl_table ax25_param_table[] = {
{ } /* that's all, folks! */
};

-void ax25_register_sysctl(void)
+void ax25_register_sysctl(struct ax25_dev *ax25_dev)
{
- ax25_dev *ax25_dev;
- int n, k;
-
- spin_lock_bh(&ax25_dev_lock);
- for (ax25_table_size = sizeof(ctl_table), ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next)
- ax25_table_size += sizeof(ctl_table);
-
- if ((ax25_table = kzalloc(ax25_table_size, GFP_ATOMIC)) == NULL) {
- spin_unlock_bh(&ax25_dev_lock);
+ struct ctl_table *ax25_table;
+ int i;
+
+ /* Assuming the name does not change while this sysctl
+ * is registered. If ax25 supports device renaming
+ * (SIOCSIFNAME), sysctl will need its own copy of
+ * the name */
+ struct ctl_path ax25_path[] = {
+ { .procname = "net" },
+ { .procname = "ax25" },
+ { .procname = ax25_dev->dev->name },
+ { }
+ };
+
+
+ ax25_table = kmemdup(ax25_param_table, sizeof(ax25_param_table), GFP_KERNEL);
+ if (!ax25_table)
return;
- }
-
- for (n = 0, ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next) {
- struct ctl_table *child = kmemdup(ax25_param_table,
- sizeof(ax25_param_table),
- GFP_ATOMIC);
- if (!child) {
- while (n--)
- kfree(ax25_table[n].child);
- kfree(ax25_table);
- spin_unlock_bh(&ax25_dev_lock);
- return;
- }
- ax25_table[n].child = ax25_dev->systable = child;
- ax25_table[n].procname = ax25_dev->dev->name;
- ax25_table[n].mode = 0555;
-

- for (k = 0; k < AX25_MAX_VALUES; k++)
- child[k].data = &ax25_dev->values[k];
+ for (i = 0; i < AX25_MAX_VALUES; i++)
+ ax25_table[i].data = &ax25_dev->values[i];

- n++;
- }
- spin_unlock_bh(&ax25_dev_lock);
-
- ax25_table_header = register_sysctl_paths(ax25_path, ax25_table);
+ ax25_dev->ax25_sysheader = register_sysctl_paths(ax25_path, ax25_table);
}

-void ax25_unregister_sysctl(void)
+void ax25_unregister_sysctl(struct ax25_dev *ax25_dev)
{
- ctl_table *p;
- unregister_sysctl_table(ax25_table_header);
-
- for (p = ax25_table; p->procname; p++)
- kfree(p->child);
+ struct ctl_table *ax25_table = ax25_dev->ax25_sysheader->ctl_table_arg;
+ unregister_sysctl_table(ax25_dev->ax25_sysheader);
+ ax25_dev->ax25_sysheader = NULL;
kfree(ax25_table);
}
--
1.7.5.134.g1c08b

2011-05-08 22:41:41

by Lucian Adrian Grijincu

Subject: [v2 051/115] sysctl: remove .child from net/ipv4/route and net/ipv4/neigh tables

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv4/route.c | 15 ++++-----------
1 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 99e6e4b..46c7b3d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3153,18 +3153,10 @@ static ctl_table ipv4_route_table[] = {

static struct ctl_table empty[1];

-static struct ctl_table ipv4_skeleton[] =
-{
- { .procname = "route",
- .mode = 0555, .child = ipv4_route_table},
- { .procname = "neigh",
- .mode = 0555, .child = empty},
- { }
-};
-
-static __net_initdata struct ctl_path ipv4_path[] = {
+static __net_initdata struct ctl_path ipv4_neigh_path[] = {
{ .procname = "net", },
{ .procname = "ipv4", },
+ { .procname = "neigh", },
{ },
};

@@ -3317,6 +3309,7 @@ int __init ip_rt_init(void)
*/
void __init ip_static_sysctl_init(void)
{
- register_sysctl_paths(ipv4_path, ipv4_skeleton);
+ register_sysctl_paths(ipv4_route_path, ipv4_route_table);
+ register_sysctl_paths(ipv4_neigh_path, empty);
}
#endif
--
1.7.5.134.g1c08b

2011-05-08 22:55:11

by Lucian Adrian Grijincu

Subject: [v2 052/115] sysctl: remove .child from net/ipv6/neigh table

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv6/sysctl_net_ipv6.c | 18 +++++++-----------
1 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 6dcf5e7..a0d9916 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -17,16 +17,6 @@

static struct ctl_table empty[1];

-static ctl_table ipv6_static_skeleton[] = {
- {
- .procname = "neigh",
- .maxlen = 0,
- .mode = 0555,
- .child = empty,
- },
- { }
-};
-
static ctl_table ipv6_table_template[] = {
{
.procname = "route",
@@ -160,11 +150,17 @@ void ipv6_sysctl_unregister(void)
unregister_pernet_subsys(&ipv6_sysctl_net_ops);
}

+static const struct ctl_path net_ipv6_neigh_path[] = {
+ { .procname = "net", },
+ { .procname = "ipv6", },
+ { .procname = "neigh", },
+ { },
+};
static struct ctl_table_header *ip6_base;

int ipv6_static_sysctl_register(void)
{
- ip6_base = register_sysctl_paths(net_ipv6_ctl_path, ipv6_static_skeleton);
+ ip6_base = register_sysctl_paths(net_ipv6_neigh_path, empty);
if (ip6_base == NULL)
return -ENOMEM;
return 0;
--
1.7.5.134.g1c08b

2011-05-08 22:54:50

by Lucian Adrian Grijincu

Subject: [v2 053/115] sysctl: remove .child from net/ipv6/route, net/ipv6/icmp, net/ipv6 tables

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/net/netns/ipv6.h | 4 +-
net/ipv6/sysctl_net_ipv6.c | 101 +++++++++++++++++++++++++-------------------
2 files changed, 60 insertions(+), 45 deletions(-)

diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 81abfcb..2d9c6f1 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -12,7 +12,9 @@ struct ctl_table_header;

struct netns_sysctl_ipv6 {
#ifdef CONFIG_SYSCTL
- struct ctl_table_header *table;
+ struct ctl_table_header *bindv6only_hdr;
+ struct ctl_table_header *route6_hdr;
+ struct ctl_table_header *icmp6_hdr;
struct ctl_table_header *frags_hdr;
#endif
int bindv6only;
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index a0d9916..1d2d8c7 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -17,19 +17,7 @@

static struct ctl_table empty[1];

-static ctl_table ipv6_table_template[] = {
- {
- .procname = "route",
- .maxlen = 0,
- .mode = 0555,
- .child = ipv6_route_table_template
- },
- {
- .procname = "icmp",
- .maxlen = 0,
- .mode = 0555,
- .child = ipv6_icmp_table_template
- },
+static ctl_table ipv6_bindv6only_template[] = {
{
.procname = "bindv6only",
.data = &init_net.ipv6.sysctl.bindv6only,
@@ -58,64 +46,89 @@ struct ctl_path net_ipv6_ctl_path[] = {
};
EXPORT_SYMBOL_GPL(net_ipv6_ctl_path);

+static const struct ctl_path net_ipv6_route_path[] = {
+ { .procname = "net", },
+ { .procname = "ipv6", },
+ { .procname = "route", },
+ { },
+};
+
+static const struct ctl_path net_ipv6_icmp_path[] = {
+ { .procname = "net", },
+ { .procname = "ipv6", },
+ { .procname = "icmp", },
+ { },
+};
+
static int __net_init ipv6_sysctl_net_init(struct net *net)
{
- struct ctl_table *ipv6_table;
+ struct ctl_table *ipv6_bindv6only_table;
struct ctl_table *ipv6_route_table;
struct ctl_table *ipv6_icmp_table;
- int err;

- err = -ENOMEM;
- ipv6_table = kmemdup(ipv6_table_template, sizeof(ipv6_table_template),
- GFP_KERNEL);
- if (!ipv6_table)
- goto out;
+ ipv6_bindv6only_table = kmemdup(ipv6_bindv6only_template,
+ sizeof(ipv6_bindv6only_template), GFP_KERNEL);
+ if (!ipv6_bindv6only_table)
+ goto fail_alloc_ipv6_bindv6only_table;
+ ipv6_bindv6only_table[0].data = &net->ipv6.sysctl.bindv6only;

ipv6_route_table = ipv6_route_sysctl_init(net);
if (!ipv6_route_table)
- goto out_ipv6_table;
- ipv6_table[0].child = ipv6_route_table;
+ goto fail_alloc_ipv6_route_table;

ipv6_icmp_table = ipv6_icmp_sysctl_init(net);
if (!ipv6_icmp_table)
- goto out_ipv6_route_table;
- ipv6_table[1].child = ipv6_icmp_table;
+ goto fail_alloc_ipv6_icmp_table;

- ipv6_table[2].data = &net->ipv6.sysctl.bindv6only;

- net->ipv6.sysctl.table = register_net_sysctl_table(net, net_ipv6_ctl_path,
- ipv6_table);
- if (!net->ipv6.sysctl.table)
- goto out_ipv6_icmp_table;
+ net->ipv6.sysctl.bindv6only_hdr = register_net_sysctl_table(
+ net, net_ipv6_ctl_path, ipv6_bindv6only_table);
+ if (!net->ipv6.sysctl.bindv6only_hdr)
+ goto fail_reg_bindv6only_hdr;

- err = 0;
-out:
- return err;
+ net->ipv6.sysctl.route6_hdr = register_net_sysctl_table(
+ net, net_ipv6_route_path, ipv6_route_table);
+ if (!net->ipv6.sysctl.route6_hdr)
+ goto fail_reg_route6_hdr;
+
+ net->ipv6.sysctl.icmp6_hdr = register_net_sysctl_table(
+ net, net_ipv6_icmp_path, ipv6_icmp_table);
+ if (!net->ipv6.sysctl.icmp6_hdr)
+ goto fail_reg_icmp6_hdr;

-out_ipv6_icmp_table:
+ return 0;
+
+fail_reg_icmp6_hdr:
+ unregister_net_sysctl_table(net->ipv6.sysctl.route6_hdr);
+fail_reg_route6_hdr:
+ unregister_net_sysctl_table(net->ipv6.sysctl.bindv6only_hdr);
+fail_reg_bindv6only_hdr:
kfree(ipv6_icmp_table);
-out_ipv6_route_table:
+fail_alloc_ipv6_icmp_table:
kfree(ipv6_route_table);
-out_ipv6_table:
- kfree(ipv6_table);
- goto out;
+fail_alloc_ipv6_route_table:
+ kfree(ipv6_bindv6only_table);
+fail_alloc_ipv6_bindv6only_table:
+ return -ENOMEM;
}

static void __net_exit ipv6_sysctl_net_exit(struct net *net)
{
- struct ctl_table *ipv6_table;
+ struct ctl_table *ipv6_bindv6only_table;
struct ctl_table *ipv6_route_table;
struct ctl_table *ipv6_icmp_table;

- ipv6_table = net->ipv6.sysctl.table->ctl_table_arg;
- ipv6_route_table = ipv6_table[0].child;
- ipv6_icmp_table = ipv6_table[1].child;
+ ipv6_bindv6only_table = net->ipv6.sysctl.bindv6only_hdr->ctl_table_arg;
+ ipv6_route_table = net->ipv6.sysctl.route6_hdr->ctl_table_arg;
+ ipv6_icmp_table = net->ipv6.sysctl.icmp6_hdr->ctl_table_arg;

- unregister_net_sysctl_table(net->ipv6.sysctl.table);
+ unregister_net_sysctl_table(net->ipv6.sysctl.icmp6_hdr);
+ unregister_net_sysctl_table(net->ipv6.sysctl.route6_hdr);
+ unregister_net_sysctl_table(net->ipv6.sysctl.bindv6only_hdr);

- kfree(ipv6_table);
- kfree(ipv6_route_table);
kfree(ipv6_icmp_table);
+ kfree(ipv6_route_table);
+ kfree(ipv6_bindv6only_table);
}

static struct pernet_operations ipv6_sysctl_net_ops = {
--
1.7.5.134.g1c08b

2011-05-08 22:41:48

by Lucian Adrian Grijincu

Subject: [v2 054/115] sysctl: remove .child from net/llc tables

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/llc/sysctl_net_llc.c | 55 +++++++++++++++++++++++----------------------
1 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/net/llc/sysctl_net_llc.c b/net/llc/sysctl_net_llc.c
index e2ebe35..8977307 100644
--- a/net/llc/sysctl_net_llc.c
+++ b/net/llc/sysctl_net_llc.c
@@ -56,48 +56,49 @@ static struct ctl_table llc_station_table[] = {
{ },
};

-static struct ctl_table llc2_dir_timeout_table[] = {
- {
- .procname = "timeout",
- .mode = 0555,
- .child = llc2_timeout_table,
- },
- { },
-};

-static struct ctl_table llc_table[] = {
- {
- .procname = "llc2",
- .mode = 0555,
- .child = llc2_dir_timeout_table,
- },
- {
- .procname = "station",
- .mode = 0555,
- .child = llc_station_table,
- },
- { },
+static const __initdata struct ctl_path llc2_timeout_path[] = {
+ { .procname = "net", },
+ { .procname = "llc", },
+ { .procname = "llc2", },
+ { .procname = "timeout", },
+ { }
};

-static struct ctl_path llc_path[] = {
+static const __initdata struct ctl_path llc_station_path[] = {
{ .procname = "net", },
{ .procname = "llc", },
+ { .procname = "station", },
{ }
};

-static struct ctl_table_header *llc_table_header;
+static struct ctl_table_header *llc_station_hdr;
+static struct ctl_table_header *llc2_timeout_hdr;

int __init llc_sysctl_init(void)
{
- llc_table_header = register_sysctl_paths(llc_path, llc_table);
+ llc_station_hdr = register_sysctl_paths(llc_station_path, llc_station_table);
+ if (!llc_station_hdr)
+ return -ENOMEM;

- return llc_table_header ? 0 : -ENOMEM;
+ llc2_timeout_hdr = register_sysctl_paths(llc2_timeout_path, llc2_timeout_table);
+ if (!llc2_timeout_hdr) {
+ unregister_sysctl_table(llc_station_hdr);
+ llc_station_hdr = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
}

void llc_sysctl_exit(void)
{
- if (llc_table_header) {
- unregister_sysctl_table(llc_table_header);
- llc_table_header = NULL;
+ if (llc2_timeout_hdr) {
+ unregister_sysctl_table(llc2_timeout_hdr);
+ llc2_timeout_hdr = NULL;
+ }
+ if (llc_station_hdr) {
+ unregister_sysctl_table(llc_station_hdr);
+ llc_station_hdr = NULL;
}
}
--
1.7.5.134.g1c08b

2011-05-08 22:41:51

by Lucian Adrian Grijincu

Subject: [v2 055/115] sysctl: call sysctl_init before the first sysctl registration

In the next patch key_init() will be changed to register a sysctl
table. In preparation, we call sysctl_init() before it.

Also, rename net/sysctl_net.c's sysctl_init so the two don't clash.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 2 ++
init/main.c | 1 +
kernel/sysctl.c | 4 +---
net/sysctl_net.c | 4 ++--
4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 470e06a..095df3a 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -943,6 +943,8 @@ struct ctl_table_set {
int (*is_seen)(struct ctl_table_set *);
};

+extern __init int sysctl_init(void);
+
extern void setup_sysctl_set(struct ctl_table_set *p,
struct ctl_table_set *parent,
int (*is_seen)(struct ctl_table_set *));
diff --git a/init/main.c b/init/main.c
index 4a9479e..d52a89a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -599,6 +599,7 @@ asmlinkage void __init start_kernel(void)
fork_init(totalram_pages);
proc_caches_init();
buffer_init();
+ sysctl_init();
key_init();
security_init();
dbg_late_init();
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b813724..6167daa 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1716,7 +1716,7 @@ static void sysctl_set_parent(struct ctl_table *parent, struct ctl_table *table)
}
}

-static __init int sysctl_init(void)
+__init int sysctl_init(void)
{
sysctl_set_parent(NULL, root_table);
#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
@@ -1725,8 +1725,6 @@ static __init int sysctl_init(void)
return 0;
}

-core_initcall(sysctl_init);
-
static struct ctl_table *is_branch_in(struct ctl_table *branch,
struct ctl_table *table)
{
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index ca84212..1197d9c 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -89,7 +89,7 @@ static struct pernet_operations sysctl_pernet_ops = {
.exit = sysctl_net_exit,
};

-static __init int sysctl_init(void)
+static __init int net_sysctl_init(void)
{
int ret;
ret = register_pernet_subsys(&sysctl_pernet_ops);
@@ -101,7 +101,7 @@ static __init int sysctl_init(void)
out:
return ret;
}
-subsys_initcall(sysctl_init);
+subsys_initcall(net_sysctl_init);

struct ctl_table_header *register_net_sysctl_table(struct net *net,
const struct ctl_path *path, struct ctl_table *table)
--
1.7.5.134.g1c08b

2011-05-08 22:54:22

by Lucian Adrian Grijincu

Subject: [v2 056/115] sysctl: no-child: manually register kernel/random

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/char/random.c | 27 ++++++++++++++++++++++++++-
kernel/sysctl.c | 6 ------
2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d4ddeba..8893c4b 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -959,8 +959,15 @@ static void init_std_data(struct entropy_store *r)
mix_pool_bytes(r, utsname(), sizeof(*(utsname())));
}

+static int __init register_random_sysctls(void);
+
static int rand_initialize(void)
{
+ int rc;
+ rc = register_random_sysctls();
+ if (rc)
+ return rc;
+
init_std_data(&input_pool);
init_std_data(&blocking_pool);
init_std_data(&nonblocking_pool);
@@ -1250,7 +1257,7 @@ static int proc_do_uuid(ctl_table *table, int write,
}

static int sysctl_poolsize = INPUT_POOL_WORDS * 32;
-ctl_table random_table[] = {
+static struct ctl_table random_table[] = {
{
.procname = "poolsize",
.data = &sysctl_poolsize,
@@ -1298,6 +1305,24 @@ ctl_table random_table[] = {
},
{ }
};
+
+static const __initdata struct ctl_path random_path[] = {
+ { .procname = "kernel" },
+ { .procname = "random" },
+ { }
+};
+
+static struct ctl_table_header *random_header;
+
+static int __init register_random_sysctls(void)
+{
+ random_header = register_sysctl_paths(random_path, random_table);
+ if (!random_header)
+ return -ENOMEM;
+ return 0;
+}
+#else /* CONFIG_SYSCTL */
+static int __init register_random_sysctls(void) { return 0; }
#endif /* CONFIG_SYSCTL */

/********************************************************************
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6167daa..b020156 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -216,7 +216,6 @@ static struct ctl_table vm_table[];
static struct ctl_table fs_table[];
static struct ctl_table debug_table[];
static struct ctl_table dev_table[];
-extern struct ctl_table random_table[];
#ifdef CONFIG_EPOLL
extern struct ctl_table epoll_table[];
#endif
@@ -611,11 +610,6 @@ static struct ctl_table kern_table[] = {
.proc_handler = proc_dointvec,
},
{
- .procname = "random",
- .mode = 0555,
- .child = random_table,
- },
- {
.procname = "overflowuid",
.data = &overflowuid,
.maxlen = sizeof(int),
--
1.7.5.134.g1c08b

2011-05-08 22:54:03

by Lucian Adrian Grijincu

Subject: [v2 057/115] sysctl: no-child: manually register kernel/keys

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/key.h | 4 +++-
kernel/sysctl.c | 7 -------
security/keys/key.c | 1 +
security/keys/sysctl.c | 18 +++++++++++++++++-
4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/linux/key.h b/include/linux/key.h
index b2bb017..9b3df18 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -281,7 +281,9 @@ static inline key_serial_t key_serial(struct key *key)
rwsem_is_locked(&((struct key *)(KEY))->sem)))

#ifdef CONFIG_SYSCTL
-extern ctl_table key_sysctls[];
+extern int __init key_register_sysctls(void);
+#else
+static inline int key_register_sysctls(void) { return 0; }
#endif

extern void key_replace_session_keyring(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b020156..33d5e2e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -905,13 +905,6 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = proc_dostring,
},
-#ifdef CONFIG_KEYS
- {
- .procname = "keys",
- .mode = 0555,
- .child = key_sysctls,
- },
-#endif
#ifdef CONFIG_RCU_TORTURE_TEST
{
.procname = "rcutorture_runnable",
diff --git a/security/keys/key.c b/security/keys/key.c
index f7f9d93..33903c2 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -1099,6 +1099,7 @@ EXPORT_SYMBOL(unregister_key_type);
*/
void __init key_init(void)
{
+ key_register_sysctls();
/* allocate a slab in which we can store keys */
key_jar = kmem_cache_create("key_jar", sizeof(struct key),
0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
diff --git a/security/keys/sysctl.c b/security/keys/sysctl.c
index ee32d18..e079223 100644
--- a/security/keys/sysctl.c
+++ b/security/keys/sysctl.c
@@ -15,7 +15,7 @@

static const int zero, one = 1, max = INT_MAX;

-ctl_table key_sysctls[] = {
+static struct ctl_table key_table[] = {
{
.procname = "maxkeys",
.data = &key_quota_maxkeys,
@@ -63,3 +63,19 @@ ctl_table key_sysctls[] = {
},
{ }
};
+
+static const __initdata struct ctl_path key_path[] = {
+ { .procname = "kernel" },
+ { .procname = "keys" },
+ { }
+};
+
+static struct ctl_table_header *key_header;
+
+int __init key_register_sysctls(void)
+{
+ key_header = register_sysctl_paths(key_path, key_table);
+ if (key_header == NULL)
+ return -ENOMEM;
+ return 0;
+}
--
1.7.5.134.g1c08b

2011-05-08 22:53:47

by Lucian Adrian Grijincu

Subject: [v2 058/115] sysctl: no-child: manually register fs/inotify

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/notify/inotify/inotify_user.c | 22 +++++++++++++++++++---
include/linux/inotify.h | 2 --
kernel/sysctl.c | 7 -------
3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 8445fbc..ba618c2 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -51,13 +51,13 @@ static int inotify_max_user_watches __read_mostly;
static struct kmem_cache *inotify_inode_mark_cachep __read_mostly;
struct kmem_cache *event_priv_cachep __read_mostly;

-#ifdef CONFIG_SYSCTL
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_MMU)

#include <linux/sysctl.h>

static int zero;

-ctl_table inotify_table[] = {
+static struct ctl_table inotify_table[] = {
{
.procname = "max_user_instances",
.data = &inotify_max_user_instances,
@@ -84,7 +84,22 @@ ctl_table inotify_table[] = {
},
{ }
};
-#endif /* CONFIG_SYSCTL */
+static const __initdata struct ctl_path inotify_path[] = {
+ { .procname = "fs" },
+ { .procname = "inotify" },
+ { }
+};
+static struct ctl_table_header *inotify_header;
+static int __init register_inotify_sysctls(void)
+{
+ inotify_header = register_sysctl_paths(inotify_path, inotify_table);
+ if (inotify_header == NULL)
+ return -ENOMEM;
+ return 0;
+}
+#else /* CONFIG_SYSCTL && CONFIG_MMU */
+static int __init register_inotify_sysctls(void) { return 0; }
+#endif /* CONFIG_SYSCTL && CONFIG_MMU */

static inline __u32 inotify_arg_to_mask(u32 arg)
{
@@ -862,6 +877,7 @@ static int __init inotify_user_setup(void)
inotify_max_user_instances = 128;
inotify_max_user_watches = 8192;

+ register_inotify_sysctls();
return 0;
}
module_init(inotify_user_setup);
diff --git a/include/linux/inotify.h b/include/linux/inotify.h
index d33041e..89b3bfe 100644
--- a/include/linux/inotify.h
+++ b/include/linux/inotify.h
@@ -71,8 +71,6 @@ struct inotify_event {
#define IN_NONBLOCK O_NONBLOCK

#ifdef __KERNEL__
-#include <linux/sysctl.h>
-extern struct ctl_table inotify_table[]; /* for sysctl */

#define ALL_INOTIFY_BITS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | \
IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 33d5e2e..9520e2b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1439,13 +1439,6 @@ static struct ctl_table fs_table[] = {
.proc_handler = proc_doulongvec_minmax,
},
#endif /* CONFIG_AIO */
-#ifdef CONFIG_INOTIFY_USER
- {
- .procname = "inotify",
- .mode = 0555,
- .child = inotify_table,
- },
-#endif
#ifdef CONFIG_EPOLL
{
.procname = "epoll",
--
1.7.5.134.g1c08b

2011-05-08 22:53:24

by Lucian Adrian Grijincu

Subject: [v2 059/115] sysctl: no-child: manually register fs/epoll

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/eventpoll.c | 22 +++++++++++++++++++---
include/linux/poll.h | 2 --
kernel/sysctl.c | 10 ----------
3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f9cfd16..2dbcd0c 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -246,14 +246,14 @@ static struct kmem_cache *epi_cache __read_mostly;
/* Slab cache used to allocate "struct eppoll_entry" */
static struct kmem_cache *pwq_cache __read_mostly;

-#ifdef CONFIG_SYSCTL
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_MMU)

#include <linux/sysctl.h>

static long zero;
static long long_max = LONG_MAX;

-ctl_table epoll_table[] = {
+static struct ctl_table epoll_table[] = {
{
.procname = "max_user_watches",
.data = &max_user_watches,
@@ -265,7 +265,22 @@ ctl_table epoll_table[] = {
},
{ }
};
-#endif /* CONFIG_SYSCTL */
+static const __initdata struct ctl_path epoll_path[] = {
+ { .procname = "fs" },
+ { .procname = "epoll" },
+ { }
+};
+static struct ctl_table_header *epoll_header;
+static int __init register_epoll_sysctls(void)
+{
+ epoll_header = register_sysctl_paths(epoll_path, epoll_table);
+ if (epoll_header == NULL)
+ return -ENOMEM;
+ return 0;
+}
+#else /* CONFIG_SYSCTL && CONFIG_MMU */
+static int __init register_epoll_sysctls(void) { return 0; }
+#endif /* CONFIG_SYSCTL && CONFIG_MMU */


/* Setup the structure that is used as key for the RB tree */
@@ -1586,6 +1601,7 @@ static int __init eventpoll_init(void)
pwq_cache = kmem_cache_create("eventpoll_pwq",
sizeof(struct eppoll_entry), 0, SLAB_PANIC, NULL);

+ register_epoll_sysctls();
return 0;
}
fs_initcall(eventpoll_init);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index cf40010..314331c 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -10,10 +10,8 @@
#include <linux/wait.h>
#include <linux/string.h>
#include <linux/fs.h>
-#include <linux/sysctl.h>
#include <asm/uaccess.h>

-extern struct ctl_table epoll_table[]; /* for sysctl */
/* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
additional memory. */
#define MAX_STACK_ALLOC 832
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9520e2b..1797c01 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -216,9 +216,6 @@ static struct ctl_table vm_table[];
static struct ctl_table fs_table[];
static struct ctl_table debug_table[];
static struct ctl_table dev_table[];
-#ifdef CONFIG_EPOLL
-extern struct ctl_table epoll_table[];
-#endif

#ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
int sysctl_legacy_va_layout;
@@ -1439,13 +1436,6 @@ static struct ctl_table fs_table[] = {
.proc_handler = proc_doulongvec_minmax,
},
#endif /* CONFIG_AIO */
-#ifdef CONFIG_EPOLL
- {
- .procname = "epoll",
- .mode = 0555,
- .child = epoll_table,
- },
-#endif
#endif
{
.procname = "suid_dumpable",
--
1.7.5.134.g1c08b

2011-05-08 22:41:55

by Lucian Adrian Grijincu

Subject: [v2 060/115] sysctl: no-child: manually register root tables

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sysctl.c | 121 +++++++++++++++++++++++++++++++++++++-----------------
1 files changed, 83 insertions(+), 38 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 1797c01..edacbdc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -211,12 +211,6 @@ static struct ctl_table_root sysctl_table_root = {
.default_set.list = LIST_HEAD_INIT(root_table_header.ctl_entry),
};

-static struct ctl_table kern_table[];
-static struct ctl_table vm_table[];
-static struct ctl_table fs_table[];
-static struct ctl_table debug_table[];
-static struct ctl_table dev_table[];
-
#ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
int sysctl_legacy_va_layout;
#endif
@@ -224,31 +218,6 @@ int sysctl_legacy_va_layout;
/* The default sysctl tables: */

static struct ctl_table root_table[] = {
- {
- .procname = "kernel",
- .mode = 0555,
- .child = kern_table,
- },
- {
- .procname = "vm",
- .mode = 0555,
- .child = vm_table,
- },
- {
- .procname = "fs",
- .mode = 0555,
- .child = fs_table,
- },
- {
- .procname = "debug",
- .mode = 0555,
- .child = debug_table,
- },
- {
- .procname = "dev",
- .mode = 0555,
- .child = dev_table,
- },
{ }
};

@@ -266,6 +235,11 @@ static int min_extfrag_threshold;
static int max_extfrag_threshold = 1000;
#endif

+static const __initdata struct ctl_path kern_path [] = {
+ { .procname = "kernel" },
+ { },
+};
+
static struct ctl_table kern_table[] = {
{
.procname = "sched_child_runs_first",
@@ -955,6 +929,11 @@ static struct ctl_table kern_table[] = {
{ }
};

+static const __initdata struct ctl_path vm_path [] = {
+ { .procname = "vm" },
+ { },
+};
+
static struct ctl_table vm_table[] = {
{
.procname = "overcommit_memory",
@@ -1324,11 +1303,23 @@ static struct ctl_table vm_table[] = {
};

#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
+
+static const __initdata struct ctl_path binfmt_misc_path [] = {
+ { .procname = "fs" },
+ { .procname = "binfmt_misc" },
+ { },
+};
+
static struct ctl_table binfmt_misc_table[] = {
{ }
};
#endif

+static const __initdata struct ctl_path fs_path [] = {
+ { .procname = "fs" },
+ { },
+};
+
static struct ctl_table fs_table[] = {
{
.procname = "inode-nr",
@@ -1446,13 +1437,6 @@ static struct ctl_table fs_table[] = {
.extra1 = &zero,
.extra2 = &two,
},
-#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
- {
- .procname = "binfmt_misc",
- .mode = 0555,
- .child = binfmt_misc_table,
- },
-#endif
{
.procname = "pipe-max-size",
.data = &pipe_max_size,
@@ -1464,6 +1448,11 @@ static struct ctl_table fs_table[] = {
{ }
};

+static const __initdata struct ctl_path debug_path [] = {
+ { .procname = "debug" },
+ { },
+};
+
static struct ctl_table debug_table[] = {
#if defined(CONFIG_X86) || defined(CONFIG_PPC) || defined(CONFIG_SPARC) || \
defined(CONFIG_S390)
@@ -1489,6 +1478,11 @@ static struct ctl_table debug_table[] = {
{ }
};

+static const __initdata struct ctl_path dev_path [] = {
+ { .procname = "dev" },
+ { },
+};
+
static struct ctl_table dev_table[] = {
{ }
};
@@ -1688,11 +1682,62 @@ static void sysctl_set_parent(struct ctl_table *parent, struct ctl_table *table)

__init int sysctl_init(void)
{
+ struct ctl_table_header *kern_header, *vm_header, *fs_header,
+ *debug_header, *dev_header;
+#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
+ struct ctl_table_header *binfmt_misc_header;
+#endif
+
sysctl_set_parent(NULL, root_table);
+
+ kern_header = register_sysctl_paths(kern_path, kern_table);
+ if (kern_header == NULL)
+ goto fail_register_kern;
+
+ vm_header = register_sysctl_paths(vm_path, vm_table);
+ if (vm_header == NULL)
+ goto fail_register_vm;
+
+ fs_header = register_sysctl_paths(fs_path, fs_table);
+ if (fs_header == NULL)
+ goto fail_register_fs;
+
+ debug_header = register_sysctl_paths(debug_path, debug_table);
+ if (debug_header == NULL)
+ goto fail_register_debug;
+
+ dev_header = register_sysctl_paths(dev_path, dev_table);
+ if (dev_header == NULL)
+ goto fail_register_dev;
+
+#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
+ binfmt_misc_header = register_sysctl_paths(binfmt_misc_path, binfmt_misc_table);
+ if (binfmt_misc_header == NULL)
+ goto fail_register_binfmt_misc;
+#endif
+
+
#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
sysctl_check_table(current->nsproxy, root_table);
#endif
return 0;
+
+
+#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
+fail_register_binfmt_misc:
+ unregister_sysctl_table(dev_header);
+#endif
+
+fail_register_dev:
+ unregister_sysctl_table(debug_header);
+fail_register_debug:
+ unregister_sysctl_table(fs_header);
+fail_register_fs:
+ unregister_sysctl_table(vm_header);
+fail_register_vm:
+ unregister_sysctl_table(kern_header);
+fail_register_kern:
+ return -ENOMEM;
}

static struct ctl_table *is_branch_in(struct ctl_table *branch,
--
1.7.5.134.g1c08b

2011-05-08 22:52:47

by Lucian Adrian Grijincu

Subject: [v2 061/115] sysctl: faster reimplementation of sysctl_check_table

Determining the parent of a node at depth d
- previous implementation: O(d)
- current implementation: O(1)

Printing the path to a node at depth d
- previous implementation: O(d^2)
- current implementation: O(d)

This comes at a small cost: we use an array ('parents') holding as many
pointers as there can be sysctl levels (currently CTL_MAXNAME=10).

The 'parents' array holds the same values as the ctl_table->parent
field, because the function that updates ->parent (sysctl_set_parent)
is only ever called with a NULL parent (for root nodes) or as
sysctl_set_parent(table, table->child).

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sysctl_check.c | 118 ++++++++++++++++++++++++++-----------------------
1 files changed, 62 insertions(+), 56 deletions(-)

diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 4e4932a..cc26490 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -6,58 +6,34 @@
#include <net/ip_vs.h>


-static int sysctl_depth(struct ctl_table *table)
-{
- struct ctl_table *tmp;
- int depth;
-
- depth = 0;
- for (tmp = table; tmp->parent; tmp = tmp->parent)
- depth++;
-
- return depth;
-}
-
-static struct ctl_table *sysctl_parent(struct ctl_table *table, int n)
+static void sysctl_print_path(struct ctl_table *table,
+ struct ctl_table **parents, int depth)
{
+ struct ctl_table *p;
int i;
-
- for (i = 0; table && i < n; i++)
- table = table->parent;
-
- return table;
-}
-
-
-static void sysctl_print_path(struct ctl_table *table)
-{
- struct ctl_table *tmp;
- int depth, i;
- depth = sysctl_depth(table);
if (table->procname) {
- for (i = depth; i >= 0; i--) {
- tmp = sysctl_parent(table, i);
- printk("/%s", tmp->procname?tmp->procname:"");
+ for (i = 0; i < depth; i++) {
+ p = parents[i];
+ printk("/%s", p->procname ? p->procname : "");
}
+ printk("/%s", table->procname);
}
printk(" ");
}

static struct ctl_table *sysctl_check_lookup(struct nsproxy *namespaces,
- struct ctl_table *table)
+ struct ctl_table *table, struct ctl_table **parents, int depth)
{
struct ctl_table_header *head;
struct ctl_table *ref, *test;
- int depth, cur_depth;
-
- depth = sysctl_depth(table);
+ int cur_depth;

for (head = __sysctl_head_next(namespaces, NULL); head;
head = __sysctl_head_next(namespaces, head)) {
cur_depth = depth;
ref = head->ctl_table;
repeat:
- test = sysctl_parent(table, cur_depth);
+ test = parents[depth - cur_depth];
for (; ref->procname; ref++) {
int match = 0;
if (cur_depth && !ref->child)
@@ -83,11 +59,12 @@ out:
return ref;
}

-static void set_fail(const char **fail, struct ctl_table *table, const char *str)
+static void set_fail(const char **fail, struct ctl_table *table,
+ const char *str, struct ctl_table **parents, int depth)
{
if (*fail) {
printk(KERN_ERR "sysctl table check failed: ");
- sysctl_print_path(table);
+ sysctl_print_path(table, parents, depth);
printk(" %s\n", *fail);
dump_stack();
}
@@ -95,38 +72,51 @@ static void set_fail(const char **fail, struct ctl_table *table, const char *str
}

static void sysctl_check_leaf(struct nsproxy *namespaces,
- struct ctl_table *table, const char **fail)
+ struct ctl_table *table, const char **fail,
+ struct ctl_table **parents, int depth)
{
struct ctl_table *ref;

- ref = sysctl_check_lookup(namespaces, table);
+ ref = sysctl_check_lookup(namespaces, table, parents, depth);
if (ref && (ref != table))
- set_fail(fail, table, "Sysctl already exists");
+ set_fail(fail, table, "Sysctl already exists", parents, depth);
}

-int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table)
+
+
+#define SET_FAIL(str) set_fail(&fail, table, str, parents, depth)
+
+static int __sysctl_check_table(struct nsproxy *namespaces,
+ struct ctl_table *table, struct ctl_table **parents, int depth)
{
+ const char *fail = NULL;
int error = 0;
+
+ if (depth >= CTL_MAXNAME) {
+ SET_FAIL("Sysctl tree too deep");
+ return -EINVAL;
+ }
+
for (; table->procname; table++) {
- const char *fail = NULL;
+ fail = NULL;

if (table->parent) {
if (!table->parent->procname)
- set_fail(&fail, table, "Parent without procname");
+ SET_FAIL("Parent without procname");
}
if (table->child) {
if (table->data)
- set_fail(&fail, table, "Directory with data?");
+ SET_FAIL("Directory with data?");
if (table->maxlen)
- set_fail(&fail, table, "Directory with maxlen?");
+ SET_FAIL("Directory with maxlen?");
if ((table->mode & (S_IRUGO|S_IXUGO)) != table->mode)
- set_fail(&fail, table, "Writable sysctl directory");
+ SET_FAIL("Writable sysctl directory");
if (table->proc_handler)
- set_fail(&fail, table, "Directory with proc_handler");
+ SET_FAIL("Directory with proc_handler");
if (table->extra1)
- set_fail(&fail, table, "Directory with extra1");
+ SET_FAIL("Directory with extra1");
if (table->extra2)
- set_fail(&fail, table, "Directory with extra2");
+ SET_FAIL("Directory with extra2");
} else {
if ((table->proc_handler == proc_dostring) ||
(table->proc_handler == proc_dointvec) ||
@@ -137,24 +127,40 @@ int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table)
(table->proc_handler == proc_doulongvec_minmax) ||
(table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
if (!table->data)
- set_fail(&fail, table, "No data");
+ SET_FAIL("No data");
if (!table->maxlen)
- set_fail(&fail, table, "No maxlen");
+ SET_FAIL("No maxlen");
}
#ifdef CONFIG_PROC_SYSCTL
if (!table->proc_handler)
- set_fail(&fail, table, "No proc_handler");
+ SET_FAIL("No proc_handler");
#endif
- sysctl_check_leaf(namespaces, table, &fail);
+ parents[depth] = table;
+ sysctl_check_leaf(namespaces, table, &fail,
+ parents, depth);
}
if (table->mode > 0777)
- set_fail(&fail, table, "bogus .mode");
+ SET_FAIL("bogus .mode");
if (fail) {
- set_fail(&fail, table, NULL);
+ SET_FAIL(NULL);
error = -EINVAL;
}
- if (table->child)
- error |= sysctl_check_table(namespaces, table->child);
+ if (table->child) {
+ parents[depth] = table;
+ error |= __sysctl_check_table(namespaces, table->child,
+ parents, depth + 1);
+ }
}
return error;
}
+
+
+int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table)
+{
+ struct ctl_table *parents[CTL_MAXNAME];
+ /* Keep track of parents as we go down into the tree:
+ * - the node at depth 'd' will have the parent at parents[d-1].
+ * - the root node (depth=0) has no parent in this array.
+ */
+ return __sysctl_check_table(namespaces, table, parents, 0);
+}
--
1.7.5.134.g1c08b
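
The idea behind patch 061 can be sketched in userspace. This is not the
kernel code: 'struct node', format_path(), walk(), MAX_DEPTH and the
sample tree are hypothetical stand-ins chosen for illustration. The walk
records each directory in parents[depth] before descending, so a node at
depth d finds its parent at parents[d-1] in O(1) and its full path can
be built in O(d).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 10	/* stands in for CTL_MAXNAME */

/* Hypothetical, stripped-down stand-in for struct ctl_table. */
struct node {
	const char *name;	/* NULL terminates an array of siblings */
	struct node *child;	/* NULL for leaves */
};

/* Build "/a/b/leaf" by reading parents[0..depth-1] once: O(depth). */
static void format_path(char *buf, size_t len, const struct node *n,
			struct node **parents, int depth)
{
	size_t off = 0;
	int i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < depth && off < len; i++)
		off += (size_t)snprintf(buf + off, len - off, "/%s",
					parents[i]->name);
	if (off < len)
		snprintf(buf + off, len - off, "/%s", n->name);
}

/*
 * Depth-first walk. Returns the number of leaves, or -1 if the tree is
 * deeper than MAX_DEPTH. If a leaf is named 'target', its full path is
 * written to 'out'.
 */
static int walk(struct node *table, struct node **parents, int depth,
		const char *target, char *out, size_t len)
{
	int leaves = 0;

	if (depth >= MAX_DEPTH)
		return -1;

	for (; table->name; table++) {
		if (table->child) {
			int sub;

			/* remember this directory for everything below it */
			parents[depth] = table;
			sub = walk(table->child, parents, depth + 1,
				   target, out, len);
			if (sub < 0)
				return -1;
			leaves += sub;
		} else {
			leaves++;
			if (strcmp(table->name, target) == 0)
				format_path(out, len, table, parents, depth);
		}
	}
	return leaves;
}

/* Sample tree: /fs/epoll/max_user_watches and /fs/inode-nr */
static struct node epoll_dir[] = {
	{ "max_user_watches", NULL }, { NULL, NULL }
};
static struct node fs_dir[] = {
	{ "epoll", epoll_dir }, { "inode-nr", NULL }, { NULL, NULL }
};
static struct node root[] = {
	{ "fs", fs_dir }, { NULL, NULL }
};
```

A caller only needs one array of MAX_DEPTH pointers, started at depth 0,
much like the sysctl_check_table() wrapper in the patch. Siblings at the
same depth simply overwrite the same slot.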

2011-05-08 22:52:13

by Lucian Adrian Grijincu

Subject: [v2 062/115] sysctl: remove useless ctl_table->parent field

The 'parent' field was added for selinux in:
commit d912b0cc1a617d7c590d57b7ea971d50c7f02503
[PATCH] sysctl: add a parent entry to ctl_table and set the parent entry

and then was used for sysctl_check_table.

Both users have since moved to other implementations, so the field is
no longer needed.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 1 -
kernel/sysctl.c | 12 ------------
kernel/sysctl_check.c | 5 +++--
3 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 095df3a..1c41dbd 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1021,7 +1021,6 @@ struct ctl_table
int maxlen;
mode_t mode;
struct ctl_table *child;
- struct ctl_table *parent; /* Automatically set */
proc_handler *proc_handler; /* Callback for text formatting */
void *extra1;
void *extra2;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index edacbdc..0450d3d 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1671,15 +1671,6 @@ int sysctl_perm(struct ctl_table_root *root, struct ctl_table *table, int op)
return test_perm(mode, op);
}

-static void sysctl_set_parent(struct ctl_table *parent, struct ctl_table *table)
-{
- for (; table->procname; table++) {
- table->parent = parent;
- if (table->child)
- sysctl_set_parent(table, table->child);
- }
-}
-
__init int sysctl_init(void)
{
struct ctl_table_header *kern_header, *vm_header, *fs_header,
@@ -1688,8 +1679,6 @@ __init int sysctl_init(void)
struct ctl_table_header *binfmt_misc_header;
#endif

- sysctl_set_parent(NULL, root_table);
-
kern_header = register_sysctl_paths(kern_path, kern_table);
if (kern_header == NULL)
goto fail_register_kern;
@@ -1889,7 +1878,6 @@ struct ctl_table_header *__register_sysctl_paths(
header->used = 0;
header->unregistering = NULL;
header->root = root;
- sysctl_set_parent(NULL, header->ctl_table);
header->count = 1;
#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
if (sysctl_check_table(namespaces, header->ctl_table)) {
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index cc26490..52f4810 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -100,8 +100,9 @@ static int __sysctl_check_table(struct nsproxy *namespaces,
for (; table->procname; table++) {
fail = NULL;

- if (table->parent) {
- if (!table->parent->procname)
+
+ if (depth != 0) { /* has parent */
+ if (!parents[depth - 1]->procname)
SET_FAIL("Parent without procname");
}
if (table->child) {
--
1.7.5.134.g1c08b

2011-05-08 22:52:10

by Lucian Adrian Grijincu

Subject: [v2 063/115] sysctl: simplify find_in_table

The if (!p->procname) check can never be true inside the loop, because
the loop condition already guarantees that p->procname is non-NULL.
Also fold the final memcmp test into a direct return.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 10 ++--------
1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index f50133c..d1640bc 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -52,18 +52,12 @@ static struct ctl_table *find_in_table(struct ctl_table *p, struct qstr *name)
int len;
for ( ; p->procname; p++) {

- if (!p->procname)
- continue;
-
len = strlen(p->procname);
if (len != name->len)
continue;

- if (memcmp(p->procname, name->name, len) != 0)
- continue;
-
- /* I have a match */
- return p;
+ if (memcmp(p->procname, name->name, len) == 0)
+ return p;
}
return NULL;
}
--
1.7.5.134.g1c08b
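
For readers without the tree at hand, the simplified lookup from patch
063 can be modelled in userspace roughly as follows. 'struct entry',
'struct name', find_entry() and sample_table are hypothetical stand-ins
for struct ctl_table, struct qstr, find_in_table() and a real table.
Like a qstr, the name being looked up is described by pointer and
length and is not assumed to be NUL-terminated at 'len'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct ctl_table and struct qstr. */
struct entry {
	const char *procname;	/* NULL terminates the table */
};

struct name {
	const char *name;
	size_t len;
};

/* Return the entry whose procname equals name, or NULL if none does. */
static const struct entry *find_entry(const struct entry *p,
				      const struct name *name)
{
	assert(p && name);

	/* the loop condition guarantees p->procname != NULL in the body */
	for (; p->procname; p++) {
		if (strlen(p->procname) != name->len)
			continue;

		if (memcmp(p->procname, name->name, name->len) == 0)
			return p;
	}
	return NULL;
}

static const struct entry sample_table[] = {
	{ "inode-nr" }, { "pipe-max-size" }, { "suid_dumpable" }, { NULL },
};
```

Comparing lengths first keeps a prefix such as "pipe" from matching
"pipe-max-size".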

2011-05-08 22:51:36

by Lucian Adrian Grijincu

Subject: [v2 064/115] sysctl: sysctl_head_grab defaults to root header on NULL

The only caller that could pass NULL to sysctl_head_grab is
grab_header, because for the root sysctl directory ('/proc/sys/')
PROC_I(inode)->sysctl is NULL.

In that case grab_header used to return root_table_header indirectly
through a call to sysctl_head_next(NULL). Now sysctl_head_grab itself
defaults to the root header.

The BUG() has not been triggered until now, so we can assume no one
else is passing NULL here.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 5 +----
kernel/sysctl.c | 2 +-
2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index d1640bc..93962b0 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -64,10 +64,7 @@ static struct ctl_table *find_in_table(struct ctl_table *p, struct qstr *name)

static struct ctl_table_header *grab_header(struct inode *inode)
{
- if (PROC_I(inode)->sysctl)
- return sysctl_head_grab(PROC_I(inode)->sysctl);
- else
- return sysctl_head_next(NULL);
+ return sysctl_head_grab(PROC_I(inode)->sysctl);
}

static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 0450d3d..8b56695 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1554,7 +1554,7 @@ void sysctl_head_put(struct ctl_table_header *head)
struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *head)
{
if (!head)
- BUG();
+ head = &root_table_header;
spin_lock(&sysctl_lock);
if (!use_table(head))
head = ERR_PTR(-ENOENT);
--
1.7.5.134.g1c08b

2011-05-08 22:51:34

by Lucian Adrian Grijincu

Subject: [v2 065/115] sysctl: delete useless grab_header function

There are lots of header grabbing/getting functions around. We'll
start changing them later on and this one will just make conversions
harder. It doesn't help much, so kill it!

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 15 +++++----------
1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 93962b0..64665e0 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -62,15 +62,10 @@ static struct ctl_table *find_in_table(struct ctl_table *p, struct qstr *name)
return NULL;
}

-static struct ctl_table_header *grab_header(struct inode *inode)
-{
- return sysctl_head_grab(PROC_I(inode)->sysctl);
-}
-
static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
struct nameidata *nd)
{
- struct ctl_table_header *head = grab_header(dir);
+ struct ctl_table_header *head = sysctl_head_grab(PROC_I(dir)->sysctl);
struct ctl_table *table = PROC_I(dir)->sysctl_entry;
struct ctl_table_header *h = NULL;
struct qstr *name = &dentry->d_name;
@@ -123,7 +118,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
size_t count, loff_t *ppos, int write)
{
struct inode *inode = filp->f_path.dentry->d_inode;
- struct ctl_table_header *head = grab_header(inode);
+ struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
struct ctl_table *table = PROC_I(inode)->sysctl_entry;
ssize_t error;
size_t res;
@@ -234,7 +229,7 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
{
struct dentry *dentry = filp->f_path.dentry;
struct inode *inode = dentry->d_inode;
- struct ctl_table_header *head = grab_header(inode);
+ struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
struct ctl_table *table = PROC_I(inode)->sysctl_entry;
struct ctl_table_header *h = NULL;
unsigned long pos;
@@ -302,7 +297,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode))
return -EACCES;

- head = grab_header(inode);
+ head = sysctl_head_grab(PROC_I(inode)->sysctl);
if (IS_ERR(head))
return PTR_ERR(head);

@@ -343,7 +338,7 @@ static int proc_sys_setattr(struct dentry *dentry, struct iattr *attr)
static int proc_sys_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
{
struct inode *inode = dentry->d_inode;
- struct ctl_table_header *head = grab_header(inode);
+ struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
struct ctl_table *table = PROC_I(inode)->sysctl_entry;

if (IS_ERR(head))
--
1.7.5.134.g1c08b

2011-05-08 22:41:58

by Lucian Adrian Grijincu

Subject: [v2 066/115] sysctl: rename ->used to ->ctl_use_refs

In a later patch I will split the 'count' counter. We need to have a
clear distinction between the three counters to be able to understand
the code.

This counter tracks the number of references to the header from places
that can tinker with its internals (e.g. ctl_table, ctl_entry,
attached_to, attached_by, etc.).

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 4 +++-
kernel/sysctl.c | 9 ++++-----
2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 1c41dbd..fe13067 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1043,7 +1043,9 @@ struct ctl_table_header
struct {
struct ctl_table *ctl_table;
struct list_head ctl_entry;
- int used;
+ /* references to this header from contexts that
+ * can access fields of this header */
+ int ctl_use_refs;
int count;
};
struct rcu_head rcu;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8b56695..ab242b4 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1494,14 +1494,14 @@ static int use_table(struct ctl_table_header *p)
{
if (unlikely(p->unregistering))
return 0;
- p->used++;
+ p->ctl_use_refs++;
return 1;
}

/* called under sysctl_lock */
static void unuse_table(struct ctl_table_header *p)
{
- if (!--p->used)
+ if (!--p->ctl_use_refs)
if (unlikely(p->unregistering))
complete(p->unregistering);
}
@@ -1510,10 +1510,10 @@ static void unuse_table(struct ctl_table_header *p)
static void start_unregistering(struct ctl_table_header *p)
{
/*
- * if p->used is 0, nobody will ever touch that entry again;
+ * if p->ctl_use_refs is 0, nobody will ever touch that entry again;
* we'll eliminate all paths to it before dropping sysctl_lock
*/
- if (unlikely(p->used)) {
+ if (unlikely(p->ctl_use_refs)) {
struct completion wait;
init_completion(&wait);
p->unregistering = &wait;
@@ -1875,7 +1875,6 @@ struct ctl_table_header *__register_sysctl_paths(
header->ctl_table_arg = table;

INIT_LIST_HEAD(&header->ctl_entry);
- header->used = 0;
header->unregistering = NULL;
header->root = root;
header->count = 1;
--
1.7.5.134.g1c08b
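
The role of ctl_use_refs can be illustrated with a small single-threaded
model. This is a sketch under assumptions, not the kernel code: 'struct
header', its 'drained' flag, use_header(), unuse_header() and the
simplified start_unregistering() are hypothetical, and all locking is
left out (the kernel does this under sysctl_lock and waits on a
completion). Returning NULL here stands in for ERR_PTR(-ENOENT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fields touched by use_table()/unuse_table(). */
struct header {
	int use_refs;		/* models ctl_use_refs */
	bool unregistering;	/* models a non-NULL ->unregistering */
	bool drained;		/* models complete(p->unregistering) */
};

static struct header root_header;	/* models root_table_header */

/* Take a use reference; refuse once unregistration has started. */
static struct header *use_header(struct header *h)
{
	if (!h)
		h = &root_header;	/* NULL means the root directory */
	if (h->unregistering)
		return NULL;		/* kernel: ERR_PTR(-ENOENT) */
	h->use_refs++;
	return h;
}

/* Drop a use reference; the last user wakes a pending unregister. */
static void unuse_header(struct header *h)
{
	if (!h)
		return;
	assert(h->use_refs > 0);
	if (--h->use_refs == 0 && h->unregistering)
		h->drained = true;
}

/* Begin unregistering; returns true if no users remain. */
static bool start_unregistering(struct header *h)
{
	h->unregistering = true;
	if (h->use_refs == 0)
		h->drained = true;
	return h->drained;
}
```

Once unregistration has started, no new user can take a reference, so
the count can only fall. That is why the last unuse can safely signal
that the header is no longer in use.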

2011-05-08 22:51:15

by Lucian Adrian Grijincu

Subject: [v2 067/115] sysctl: rename sysctl_head_grab/finish to sysctl_use_header/unuse

The function names are clearer and they reflect the reference counter
that is being inc/decremented. No functional change, just aesthetics.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 24 ++++++++++++------------
include/linux/sysctl.h | 4 ++--
kernel/sysctl.c | 40 ++++++++++++++++++++--------------------
kernel/sysctl_check.c | 2 +-
4 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 64665e0..b4cde14 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -65,7 +65,7 @@ static struct ctl_table *find_in_table(struct ctl_table *p, struct qstr *name)
static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
struct nameidata *nd)
{
- struct ctl_table_header *head = sysctl_head_grab(PROC_I(dir)->sysctl);
+ struct ctl_table_header *head = sysctl_use_header(PROC_I(dir)->sysctl);
struct ctl_table *table = PROC_I(dir)->sysctl_entry;
struct ctl_table_header *h = NULL;
struct qstr *name = &dentry->d_name;
@@ -100,7 +100,7 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
err = ERR_PTR(-ENOMEM);
inode = proc_sys_make_inode(dir->i_sb, h ? h : head, p);
if (h)
- sysctl_head_finish(h);
+ sysctl_unuse_header(h);

if (!inode)
goto out;
@@ -110,7 +110,7 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
d_add(dentry, inode);

out:
- sysctl_head_finish(head);
+ sysctl_unuse_header(head);
return err;
}

@@ -118,7 +118,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
size_t count, loff_t *ppos, int write)
{
struct inode *inode = filp->f_path.dentry->d_inode;
- struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
+ struct ctl_table_header *head = sysctl_use_header(PROC_I(inode)->sysctl);
struct ctl_table *table = PROC_I(inode)->sysctl_entry;
ssize_t error;
size_t res;
@@ -145,7 +145,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
if (!error)
error = res;
out:
- sysctl_head_finish(head);
+ sysctl_unuse_header(head);

return error;
}
@@ -229,7 +229,7 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
{
struct dentry *dentry = filp->f_path.dentry;
struct inode *inode = dentry->d_inode;
- struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
+ struct ctl_table_header *head = sysctl_use_header(PROC_I(inode)->sysctl);
struct ctl_table *table = PROC_I(inode)->sysctl_entry;
struct ctl_table_header *h = NULL;
unsigned long pos;
@@ -270,13 +270,13 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
continue;
ret = scan(h, h->attached_by, &pos, filp, dirent, filldir);
if (ret) {
- sysctl_head_finish(h);
+ sysctl_unuse_header(h);
break;
}
}
ret = 1;
out:
- sysctl_head_finish(head);
+ sysctl_unuse_header(head);
return ret;
}

@@ -297,7 +297,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
if ((mask & MAY_EXEC) && S_ISREG(inode->i_mode))
return -EACCES;

- head = sysctl_head_grab(PROC_I(inode)->sysctl);
+ head = sysctl_use_header(PROC_I(inode)->sysctl);
if (IS_ERR(head))
return PTR_ERR(head);

@@ -307,7 +307,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
else /* Use the permissions on the sysctl table entry */
error = sysctl_perm(head->root, table, mask);

- sysctl_head_finish(head);
+ sysctl_unuse_header(head);
return error;
}

@@ -338,7 +338,7 @@ static int proc_sys_setattr(struct dentry *dentry, struct iattr *attr)
static int proc_sys_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
{
struct inode *inode = dentry->d_inode;
- struct ctl_table_header *head = sysctl_head_grab(PROC_I(inode)->sysctl);
+ struct ctl_table_header *head = sysctl_use_header(PROC_I(inode)->sysctl);
struct ctl_table *table = PROC_I(inode)->sysctl_entry;

if (IS_ERR(head))
@@ -348,7 +348,7 @@ static int proc_sys_getattr(struct vfsmount *mnt, struct dentry *dentry, struct
if (table)
stat->mode = (stat->mode & S_IFMT) | table->mode;

- sysctl_head_finish(head);
+ sysctl_unuse_header(head);
return 0;
}

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index fe13067..3ff0a9e 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -954,11 +954,11 @@ struct ctl_table_header;
extern void sysctl_head_get(struct ctl_table_header *);
extern void sysctl_head_put(struct ctl_table_header *);
extern int sysctl_is_seen(struct ctl_table_header *);
-extern struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *);
+extern struct ctl_table_header *sysctl_use_header(struct ctl_table_header *);
extern struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev);
extern struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
struct ctl_table_header *prev);
-extern void sysctl_head_finish(struct ctl_table_header *prev);
+extern void sysctl_unuse_header(struct ctl_table_header *prev);
extern int sysctl_perm(struct ctl_table_root *root,
struct ctl_table *table, int op);

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ab242b4..5d52e7a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1506,6 +1506,26 @@ static void unuse_table(struct ctl_table_header *p)
complete(p->unregistering);
}

+struct ctl_table_header *sysctl_use_header(struct ctl_table_header *head)
+{
+ if (!head)
+ head = &root_table_header;
+ spin_lock(&sysctl_lock);
+ if (!use_table(head))
+ head = ERR_PTR(-ENOENT);
+ spin_unlock(&sysctl_lock);
+ return head;
+}
+
+void sysctl_unuse_header(struct ctl_table_header *head)
+{
+ if (!head)
+ return;
+ spin_lock(&sysctl_lock);
+ unuse_table(head);
+ spin_unlock(&sysctl_lock);
+}
+
/* called under sysctl_lock, will reacquire if has to wait */
static void start_unregistering(struct ctl_table_header *p)
{
@@ -1551,26 +1571,6 @@ void sysctl_head_put(struct ctl_table_header *head)
spin_unlock(&sysctl_lock);
}

-struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *head)
-{
- if (!head)
- head = &root_table_header;
- spin_lock(&sysctl_lock);
- if (!use_table(head))
- head = ERR_PTR(-ENOENT);
- spin_unlock(&sysctl_lock);
- return head;
-}
-
-void sysctl_head_finish(struct ctl_table_header *head)
-{
- if (!head)
- return;
- spin_lock(&sysctl_lock);
- unuse_table(head);
- spin_unlock(&sysctl_lock);
-}
-
static struct ctl_table_set *
lookup_header_set(struct ctl_table_root *root, struct nsproxy *namespaces)
{
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 52f4810..a3a58b8 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -55,7 +55,7 @@ repeat:
}
ref = NULL;
out:
- sysctl_head_finish(head);
+ sysctl_unuse_header(head);
return ref;
}

--
1.7.5.134.g1c08b

2011-05-08 22:50:56

by Lucian Adrian Grijincu

Subject: [v2 068/115] sysctl: rename sysctl_head_next to sysctl_use_next_header

The new name makes it clear that this function increments ctl_use_refs
and that sysctl_unuse_header must be called on the returned header.
No functional change.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 4 ++--
include/linux/sysctl.h | 4 ++--
kernel/sysctl.c | 6 +++---
kernel/sysctl_check.c | 4 ++--
4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index b4cde14..068d39c 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -85,7 +85,7 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,

p = find_in_table(table, name);
if (!p) {
- for (h = sysctl_head_next(NULL); h; h = sysctl_head_next(h)) {
+ for (h = sysctl_use_next_header(NULL); h; h = sysctl_use_next_header(h)) {
if (h->attached_to != table)
continue;
p = find_in_table(h->attached_by, name);
@@ -265,7 +265,7 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
if (ret)
goto out;

- for (h = sysctl_head_next(NULL); h; h = sysctl_head_next(h)) {
+ for (h = sysctl_use_next_header(NULL); h; h = sysctl_use_next_header(h)) {
if (h->attached_to != table)
continue;
ret = scan(h, h->attached_by, &pos, filp, dirent, filldir);
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 3ff0a9e..4ed5235 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -955,8 +955,8 @@ extern void sysctl_head_get(struct ctl_table_header *);
extern void sysctl_head_put(struct ctl_table_header *);
extern int sysctl_is_seen(struct ctl_table_header *);
extern struct ctl_table_header *sysctl_use_header(struct ctl_table_header *);
-extern struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev);
-extern struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
+extern struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *prev);
+extern struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
struct ctl_table_header *prev);
extern void sysctl_unuse_header(struct ctl_table_header *prev);
extern int sysctl_perm(struct ctl_table_root *root,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5d52e7a..e4ec23e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1587,7 +1587,7 @@ lookup_header_list(struct ctl_table_root *root, struct nsproxy *namespaces)
return &set->list;
}

-struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
+struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
struct ctl_table_header *prev)
{
struct ctl_table_root *root;
@@ -1631,9 +1631,9 @@ out:
return NULL;
}

-struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev)
+struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *prev)
{
- return __sysctl_head_next(current->nsproxy, prev);
+ return __sysctl_use_next_header(current->nsproxy, prev);
}

void register_sysctl_root(struct ctl_table_root *root)
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index a3a58b8..44c31f0 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -28,8 +28,8 @@ static struct ctl_table *sysctl_check_lookup(struct nsproxy *namespaces,
struct ctl_table *ref, *test;
int cur_depth;

- for (head = __sysctl_head_next(namespaces, NULL); head;
- head = __sysctl_head_next(namespaces, head)) {
+ for (head = __sysctl_use_next_header(namespaces, NULL); head;
+ head = __sysctl_use_next_header(namespaces, head)) {
cur_depth = depth;
ref = head->ctl_table;
repeat:
--
1.7.5.134.g1c08b

2011-05-08 22:50:29

by Lucian Adrian Grijincu

Subject: [v2 069/115] sysctl: split ->count into ctl_procfs_refs and ctl_header_refs

This is not necessary at this point, but will be later when we replace
the sysctl implementation with one that uses ctl_table_header objects
as directories.
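
The rule this split introduces can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: struct hdr, hdr_procfs_put and hdr_header_put are hypothetical names, locking and RCU are left out, and the freed flag stands in for call_rcu(&head->rcu, free_head). The point is that a header may only be freed once both counters have dropped to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct ctl_table_header. */
struct hdr {
	int procfs_refs;  /* mirrors ctl_procfs_refs */
	int header_refs;  /* mirrors ctl_header_refs */
	bool freed;       /* stands in for call_rcu(..., free_head) */
};

/* Mark the header freed once nothing references it in either way. */
static void hdr_maybe_free(struct hdr *h)
{
	if (h->procfs_refs == 0 && h->header_refs == 0)
		h->freed = true;
}

/* Drop a reference held by a procfs inode. */
static void hdr_procfs_put(struct hdr *h)
{
	assert(h->procfs_refs > 0);
	h->procfs_refs--;
	hdr_maybe_free(h);
}

/* Drop a reference held by a child header or by the registrant. */
static void hdr_header_put(struct hdr *h)
{
	assert(h->header_refs > 0);
	h->header_refs--;
	hdr_maybe_free(h);
}
```

With one reference of each kind, dropping only the header reference leaves the object alive; it goes away when the last procfs reference is dropped as well.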

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 8 +++++++-
kernel/sysctl.c | 21 ++++++++++++---------
2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 4ed5235..0f41beb 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1046,7 +1046,13 @@ struct ctl_table_header
/* references to this header from contexts that
* can access fields of this header */
int ctl_use_refs;
- int count;
+ /* references to this header from procfs inodes.
+ * procfs embeds a pointer to the header in proc_inode */
+ int ctl_procfs_refs;
+ /* counts references to this header from other
+ * headers (through ->parent) plus the reference
+ * returned by __register_sysctl_paths */
+ int ctl_header_refs;
};
struct rcu_head rcu;
};
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e4ec23e..48a1ffd 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -200,7 +200,7 @@ static int sysrq_sysctl_handler(ctl_table *table, int write,
static struct ctl_table root_table[];
static struct ctl_table_root sysctl_table_root;
static struct ctl_table_header root_table_header = {
- {{.count = 1,
+ {{.ctl_header_refs = 1,
.ctl_table = root_table,
.ctl_entry = LIST_HEAD_INIT(sysctl_table_root.default_set.list),}},
.root = &sysctl_table_root,
@@ -1554,7 +1554,7 @@ static void start_unregistering(struct ctl_table_header *p)
void sysctl_head_get(struct ctl_table_header *head)
{
spin_lock(&sysctl_lock);
- head->count++;
+ head->ctl_procfs_refs++;
spin_unlock(&sysctl_lock);
}

@@ -1566,7 +1566,8 @@ static void free_head(struct rcu_head *rcu)
void sysctl_head_put(struct ctl_table_header *head)
{
spin_lock(&sysctl_lock);
- if (!--head->count)
+ head->ctl_procfs_refs--;
+ if ((head->ctl_procfs_refs == 0) && (head->ctl_header_refs == 0))
call_rcu(&head->rcu, free_head);
spin_unlock(&sysctl_lock);
}
@@ -1877,7 +1878,7 @@ struct ctl_table_header *__register_sysctl_paths(
INIT_LIST_HEAD(&header->ctl_entry);
header->unregistering = NULL;
header->root = root;
- header->count = 1;
+ header->ctl_header_refs = 1;
#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
if (sysctl_check_table(namespaces, header->ctl_table)) {
kfree(header);
@@ -1897,7 +1898,7 @@ struct ctl_table_header *__register_sysctl_paths(
try_attach(p, header);
}
}
- header->parent->count++;
+ header->parent->ctl_header_refs++;
list_add_tail(&header->ctl_entry, &header->set->list);
spin_unlock(&sysctl_lock);

@@ -1937,12 +1938,14 @@ void unregister_sysctl_table(struct ctl_table_header * header)

spin_lock(&sysctl_lock);
start_unregistering(header);
- if (!--header->parent->count) {
+ if (!--header->parent->ctl_header_refs) {
WARN_ON(1);
- call_rcu(&header->parent->rcu, free_head);
+ if (!header->parent->ctl_procfs_refs)
+ call_rcu(&header->parent->rcu, free_head);
}
- if (!--header->count)
- call_rcu(&header->rcu, free_head);
+ if (!--header->ctl_header_refs)
+ if (!header->ctl_procfs_refs)
+ call_rcu(&header->rcu, free_head);
spin_unlock(&sysctl_lock);
}

--
1.7.5.134.g1c08b

2011-05-08 22:50:08

by Lucian Adrian Grijincu

Subject: [v2 070/115] sysctl: rename sysctl_head_get/put to sysctl_proc_inode_get/put

Clarify the purpose of those references. No functional changes.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/inode.c | 2 +-
fs/proc/proc_sysctl.c | 2 +-
include/linux/sysctl.h | 7 +++++--
kernel/sysctl.c | 6 +++---
4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index d15aa1b..08166df 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -42,7 +42,7 @@ static void proc_evict_inode(struct inode *inode)
head = PROC_I(inode)->sysctl;
if (head) {
rcu_assign_pointer(PROC_I(inode)->sysctl, NULL);
- sysctl_head_put(head);
+ sysctl_proc_inode_put(head);
}
}

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 068d39c..125b679 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -26,7 +26,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,

inode->i_ino = get_next_ino();

- sysctl_head_get(head);
+ sysctl_proc_inode_get(head);
ei = PROC_I(inode);
ei->sysctl = head;
ei->sysctl_entry = table;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 0f41beb..e265880 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -951,8 +951,11 @@ extern void setup_sysctl_set(struct ctl_table_set *p,

struct ctl_table_header;

-extern void sysctl_head_get(struct ctl_table_header *);
-extern void sysctl_head_put(struct ctl_table_header *);
+/* get/put a reference to this header that
+ * will be/was embedded in a procfs proc_inode */
+extern void sysctl_proc_inode_get(struct ctl_table_header *);
+extern void sysctl_proc_inode_put(struct ctl_table_header *);
+
extern int sysctl_is_seen(struct ctl_table_header *);
extern struct ctl_table_header *sysctl_use_header(struct ctl_table_header *);
extern struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *prev);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 48a1ffd..caafbb8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1551,7 +1551,7 @@ static void start_unregistering(struct ctl_table_header *p)
list_del_init(&p->ctl_entry);
}

-void sysctl_head_get(struct ctl_table_header *head)
+void sysctl_proc_inode_get(struct ctl_table_header *head)
{
spin_lock(&sysctl_lock);
head->ctl_procfs_refs++;
@@ -1563,7 +1563,7 @@ static void free_head(struct rcu_head *rcu)
kfree(container_of(rcu, struct ctl_table_header, rcu));
}

-void sysctl_head_put(struct ctl_table_header *head)
+void sysctl_proc_inode_put(struct ctl_table_header *head)
{
spin_lock(&sysctl_lock);
head->ctl_procfs_refs--;
@@ -1990,7 +1990,7 @@ void setup_sysctl_set(struct ctl_table_set *p,
{
}

-void sysctl_head_put(struct ctl_table_header *head)
+void sysctl_proc_inode_put(struct ctl_table_header *head)
{
}

--
1.7.5.134.g1c08b

2011-05-08 22:49:49

by Lucian Adrian Grijincu

Subject: [v2 071/115] sysctl: rename (un)use_table to __sysctl_(un)use_header

The former names were not semantically correct, as the use/unuse was
related to the header, not the table. Also this makes it clearer that
sysctl_use_header and __sysctl_use_header are related (one takes the
spin lock inside and the other doesn't).
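
The naming convention can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: the lock is a toy flag rather than a spin lock, failure is reported as NULL instead of ERR_PTR(-ENOENT), and the lock-free variant is called use_header_locked because a leading double underscore is reserved in userspace C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ctl_table_header. */
struct hdr {
	int use_refs;        /* mirrors ctl_use_refs */
	bool unregistering;  /* mirrors a non-NULL ->unregistering */
};

/* Toy lock: a flag is enough to show who is responsible for locking. */
static bool lock_held;
static void toy_lock(void)   { assert(!lock_held); lock_held = true; }
static void toy_unlock(void) { assert(lock_held);  lock_held = false; }

/* Caller must hold the lock (the role of __sysctl_use_header).
 * Returns head on success, NULL if the header is going away. */
static struct hdr *use_header_locked(struct hdr *head)
{
	assert(lock_held);
	if (head->unregistering)
		return NULL;
	head->use_refs++;
	return head;
}

/* Takes the lock itself (the role of sysctl_use_header). */
static struct hdr *use_header(struct hdr *head)
{
	toy_lock();
	head = use_header_locked(head);
	toy_unlock();
	return head;
}
```

Returning the header (or an error value) from the inner helper lets the outer function simply forward the result, which is what the diff below does.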

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sysctl.c | 21 ++++++++++-----------
1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index caafbb8..1281827 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1490,16 +1490,16 @@ static struct ctl_table dev_table[] = {
static DEFINE_SPINLOCK(sysctl_lock);

/* called under sysctl_lock */
-static int use_table(struct ctl_table_header *p)
+static struct ctl_table_header *__sysctl_use_header(struct ctl_table_header *head)
{
- if (unlikely(p->unregistering))
- return 0;
- p->ctl_use_refs++;
- return 1;
+ if (unlikely(head->unregistering))
+ return ERR_PTR(-ENOENT);
+ head->ctl_use_refs++;
+ return head;
}

/* called under sysctl_lock */
-static void unuse_table(struct ctl_table_header *p)
+static void __sysctl_unuse_header(struct ctl_table_header *p)
{
if (!--p->ctl_use_refs)
if (unlikely(p->unregistering))
@@ -1511,8 +1511,7 @@ struct ctl_table_header *sysctl_use_header(struct ctl_table_header *head)
if (!head)
head = &root_table_header;
spin_lock(&sysctl_lock);
- if (!use_table(head))
- head = ERR_PTR(-ENOENT);
+ head = __sysctl_use_header(head);
spin_unlock(&sysctl_lock);
return head;
}
@@ -1522,7 +1521,7 @@ void sysctl_unuse_header(struct ctl_table_header *head)
if (!head)
return;
spin_lock(&sysctl_lock);
- unuse_table(head);
+ __sysctl_unuse_header(head);
spin_unlock(&sysctl_lock);
}

@@ -1600,14 +1599,14 @@ struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
if (prev) {
head = prev;
tmp = &prev->ctl_entry;
- unuse_table(prev);
+ __sysctl_unuse_header(prev);
goto next;
}
tmp = &root_table_header.ctl_entry;
for (;;) {
head = list_entry(tmp, struct ctl_table_header, ctl_entry);

- if (!use_table(head))
+ if (IS_ERR(__sysctl_use_header(head)))
goto next;
spin_unlock(&sysctl_lock);
return head;
--
1.7.5.134.g1c08b

2011-05-08 22:42:04

by Lucian Adrian Grijincu

Subject: [v2 072/115] sysctl: simplify ->permissions hook

The @root parameter was not used at all.

The @namespaces parameter was used to transmit current->nsproxy. We
can access current->nsproxy directly in the ->permissions function, no
need to send it.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 3 +--
kernel/sysctl.c | 2 +-
net/sysctl_net.c | 9 +++------
3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index e265880..1af4ed5 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1034,8 +1034,7 @@ struct ctl_table_root {
struct ctl_table_set default_set;
struct ctl_table_set *(*lookup)(struct ctl_table_root *root,
struct nsproxy *namespaces);
- int (*permissions)(struct ctl_table_root *root,
- struct nsproxy *namespaces, struct ctl_table *table);
+ int (*permissions)(struct ctl_table *table);
};

/* struct ctl_table_header is used to maintain dynamic lists of
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 1281827..6e4e32b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1664,7 +1664,7 @@ int sysctl_perm(struct ctl_table_root *root, struct ctl_table *table, int op)
int mode;

if (root->permissions)
- mode = root->permissions(root, current->nsproxy, table);
+ mode = root->permissions(table);
else
mode = table->mode;

diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 1197d9c..1c0cb57 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -41,9 +41,7 @@ static int is_seen(struct ctl_table_set *set)
}

/* Return standard mode bits for table entry. */
-static int net_ctl_permissions(struct ctl_table_root *root,
- struct nsproxy *nsproxy,
- struct ctl_table *table)
+static int net_ctl_permissions(struct ctl_table *table)
{
/* Allow network administrator to have same access as root. */
if (capable(CAP_NET_ADMIN)) {
@@ -58,10 +56,9 @@ static struct ctl_table_root net_sysctl_root = {
.permissions = net_ctl_permissions,
};

-static int net_ctl_ro_header_perms(struct ctl_table_root *root,
- struct nsproxy *namespaces, struct ctl_table *table)
+static int net_ctl_ro_header_perms(ctl_table *table)
{
- if (net_eq(namespaces->net_ns, &init_net))
+ if (net_eq(current->nsproxy->net_ns, &init_net))
return table->mode;
else
return table->mode & ~0222;
--
1.7.5.134.g1c08b

2011-05-08 22:49:17

by Lucian Adrian Grijincu

Subject: [v2 073/115] sysctl: group root-specific operations

No functional change, just moved stuff around.

->lookup was not moved to _ops because we'll get rid of it later.

This makes ctl_table_set occupy less space (the pointer to is_seen),
which means N*sizeof(void*) saved for N network namespaces, but I
don't think that will impress anyone.
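
The pattern of one shared, constant ops table with optional hooks can be sketched as follows. This is a simplified userspace illustration: struct entry, struct group_ops, group_is_seen and group_mode are hypothetical names, and the set argument is reduced to an opaque pointer. An unset hook falls back to the default behaviour (always visible, mode taken from the entry).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ctl_table. */
struct entry {
	int mode;
};

/* Hypothetical stand-in for struct ctl_table_group_ops. */
struct group_ops {
	int (*is_seen)(void *set);
	int (*permissions)(struct entry *e);
};

/* No hook set means "always visible". */
static int group_is_seen(const struct group_ops *ops, void *set)
{
	assert(ops != NULL);
	return ops->is_seen ? ops->is_seen(set) : 1;
}

/* No hook set means "use the mode stored in the entry". */
static int group_mode(const struct group_ops *ops, struct entry *e)
{
	assert(ops != NULL);
	return ops->permissions ? ops->permissions(e) : e->mode;
}

/* Example hook: strip all write bits. */
static int ro_permissions(struct entry *e)
{
	return e->mode & ~0222;
}

static const struct group_ops default_ops = { NULL, NULL };
static const struct group_ops ro_ops = { .permissions = ro_permissions };
```

Because the ops tables are const and shared, each set no longer needs its own function pointer, which is where the per-namespace saving mentioned above comes from.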

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 4 ++--
include/linux/sysctl.h | 26 +++++++++++++++++++-------
kernel/sysctl.c | 25 ++++++++++++++-----------
net/sysctl_net.c | 20 ++++++++++++++------
4 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 125b679..55c9bd1 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -131,7 +131,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
* and won't be until we finish.
*/
error = -EPERM;
- if (sysctl_perm(head->root, table, write ? MAY_WRITE : MAY_READ))
+ if (sysctl_perm(head->root->ctl_ops, table, write ? MAY_WRITE : MAY_READ))
goto out;

/* if that can happen at all, it should be -EINVAL, not -EISDIR */
@@ -305,7 +305,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
if (!table) /* global root - r-xr-xr-x */
error = mask & MAY_WRITE ? -EACCES : 0;
else /* Use the permissions on the sysctl table entry */
- error = sysctl_perm(head->root, table, mask);
+ error = sysctl_perm(head->root->ctl_ops, table, mask);

sysctl_unuse_header(head);
return error;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 1af4ed5..8209d75 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -934,22 +934,21 @@ enum

/* For the /proc/sys support */
struct ctl_table;
+struct ctl_table_header;
+struct ctl_table_group_ops;
struct nsproxy;
struct ctl_table_root;

struct ctl_table_set {
struct list_head list;
struct ctl_table_set *parent;
- int (*is_seen)(struct ctl_table_set *);
};

extern __init int sysctl_init(void);

extern void setup_sysctl_set(struct ctl_table_set *p,
- struct ctl_table_set *parent,
- int (*is_seen)(struct ctl_table_set *));
+ struct ctl_table_set *parent);

-struct ctl_table_header;

/* get/put a reference to this header that
* will be/was embedded in a procfs proc_inode */
@@ -962,8 +961,8 @@ extern struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *
extern struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
struct ctl_table_header *prev);
extern void sysctl_unuse_header(struct ctl_table_header *prev);
-extern int sysctl_perm(struct ctl_table_root *root,
- struct ctl_table *table, int op);
+extern int sysctl_perm(const struct ctl_table_group_ops *ops,
+ struct ctl_table *table, int op);

typedef struct ctl_table ctl_table;

@@ -1029,12 +1028,25 @@ struct ctl_table
void *extra2;
};

+struct ctl_table_group_ops {
+ /* some sysctl entries are visible only in some situations.
+ * E.g.: /proc/sys/net/ipv4/conf/eth0/ is only visible in the
+ * netns in which that eth0 interface lives.
+ *
+ * If this hook is not set, then all the sysctl entries in
+ * this set are always visible. */
+ int (*is_seen)(struct ctl_table_set *set);
+
+ /* hook to alter permissions for some sysctl nodes at runtime */
+ int (*permissions)(struct ctl_table *table);
+};
+
struct ctl_table_root {
struct list_head root_list;
struct ctl_table_set default_set;
struct ctl_table_set *(*lookup)(struct ctl_table_root *root,
struct nsproxy *namespaces);
- int (*permissions)(struct ctl_table *table);
+ const struct ctl_table_group_ops *ctl_ops;
};

/* struct ctl_table_header is used to maintain dynamic lists of
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6e4e32b..0f00b87 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -197,6 +197,9 @@ static int sysrq_sysctl_handler(ctl_table *table, int write,

#endif

+/* uses default ops */
+static const struct ctl_table_group_ops root_table_group_ops = { };
+
static struct ctl_table root_table[];
static struct ctl_table_root sysctl_table_root;
static struct ctl_table_header root_table_header = {
@@ -206,7 +209,9 @@ static struct ctl_table_header root_table_header = {
.root = &sysctl_table_root,
.set = &sysctl_table_root.default_set,
};
+
static struct ctl_table_root sysctl_table_root = {
+ .ctl_ops = &root_table_group_ops,
.root_list = LIST_HEAD_INIT(sysctl_table_root.root_list),
.default_set.list = LIST_HEAD_INIT(root_table_header.ctl_entry),
};
@@ -1659,12 +1664,13 @@ static int test_perm(int mode, int op)
return -EACCES;
}

-int sysctl_perm(struct ctl_table_root *root, struct ctl_table *table, int op)
+int sysctl_perm(const struct ctl_table_group_ops *ops,
+ struct ctl_table *table, int op)
{
int mode;

- if (root->permissions)
- mode = root->permissions(table);
+ if (ops->permissions)
+ mode = ops->permissions(table);
else
mode = table->mode;

@@ -1950,26 +1956,24 @@ void unregister_sysctl_table(struct ctl_table_header * header)

int sysctl_is_seen(struct ctl_table_header *p)
{
- struct ctl_table_set *set = p->set;
+ const struct ctl_table_group_ops *ops = p->root->ctl_ops;
int res;
spin_lock(&sysctl_lock);
if (p->unregistering)
res = 0;
- else if (!set->is_seen)
+ else if (!ops->is_seen)
res = 1;
else
- res = set->is_seen(set);
+ res = ops->is_seen(p->set);
spin_unlock(&sysctl_lock);
return res;
}

void setup_sysctl_set(struct ctl_table_set *p,
- struct ctl_table_set *parent,
- int (*is_seen)(struct ctl_table_set *))
+ struct ctl_table_set *parent)
{
INIT_LIST_HEAD(&p->list);
p->parent = parent ? parent : &sysctl_table_root.default_set;
- p->is_seen = is_seen;
}

#else /* !CONFIG_SYSCTL */
@@ -1984,8 +1988,7 @@ void unregister_sysctl_table(struct ctl_table_header * table)
}

void setup_sysctl_set(struct ctl_table_set *p,
- struct ctl_table_set *parent,
- int (*is_seen)(struct ctl_table_set *))
+ struct ctl_table_set *parent)
{
}

diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 1c0cb57..c0d7140 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -51,12 +51,17 @@ static int net_ctl_permissions(struct ctl_table *table)
return table->mode;
}

+static const struct ctl_table_group_ops net_sysctl_group_ops = {
+ .is_seen = is_seen,
+ .permissions = net_ctl_permissions,
+};
+
static struct ctl_table_root net_sysctl_root = {
.lookup = net_ctl_header_lookup,
- .permissions = net_ctl_permissions,
+ .ctl_ops = &net_sysctl_group_ops,
};

-static int net_ctl_ro_header_perms(ctl_table *table)
+static int net_ctl_ro_header_permissions(ctl_table *table)
{
if (net_eq(current->nsproxy->net_ns, &init_net))
return table->mode;
@@ -64,15 +69,18 @@ static int net_ctl_ro_header_perms(ctl_table *table)
return table->mode & ~0222;
}

+static const struct ctl_table_group_ops net_sysctl_ro_group_ops = {
+ .permissions = net_ctl_ro_header_permissions,
+};
+
static struct ctl_table_root net_sysctl_ro_root = {
- .permissions = net_ctl_ro_header_perms,
+ .ctl_ops = &net_sysctl_ro_group_ops,
};

static int __net_init sysctl_net_init(struct net *net)
{
setup_sysctl_set(&net->sysctls,
- &net_sysctl_ro_root.default_set,
- is_seen);
+ &net_sysctl_ro_root.default_set);
return 0;
}

@@ -93,7 +101,7 @@ static __init int net_sysctl_init(void)
if (ret)
goto out;
register_sysctl_root(&net_sysctl_root);
- setup_sysctl_set(&net_sysctl_ro_root.default_set, NULL, NULL);
+ setup_sysctl_set(&net_sysctl_ro_root.default_set, NULL);
register_sysctl_root(&net_sysctl_ro_root);
out:
return ret;
--
1.7.5.134.g1c08b

2011-05-08 22:48:58

by Lucian Adrian Grijincu

Subject: [v2 074/115] sysctl: introduce ctl_table_group

ctl_table_group will eventually replace ctl_table_root and
ctl_table_set. For now it only takes over the ctl_ops field from
ctl_table_root.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 4 ++--
include/linux/sysctl.h | 16 ++++++++++++----
kernel/sysctl.c | 18 ++++++++++++------
net/sysctl_net.c | 13 +++++++++----
4 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 55c9bd1..375d145 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -131,7 +131,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
* and won't be until we finish.
*/
error = -EPERM;
- if (sysctl_perm(head->root->ctl_ops, table, write ? MAY_WRITE : MAY_READ))
+ if (sysctl_perm(head->ctl_group, table, write ? MAY_WRITE : MAY_READ))
goto out;

/* if that can happen at all, it should be -EINVAL, not -EISDIR */
@@ -305,7 +305,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
if (!table) /* global root - r-xr-xr-x */
error = mask & MAY_WRITE ? -EACCES : 0;
else /* Use the permissions on the sysctl table entry */
- error = sysctl_perm(head->root->ctl_ops, table, mask);
+ error = sysctl_perm(head->ctl_group, table, mask);

sysctl_unuse_header(head);
return error;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 8209d75..a12ab12 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -935,6 +935,7 @@ enum
/* For the /proc/sys support */
struct ctl_table;
struct ctl_table_header;
+struct ctl_table_group;
struct ctl_table_group_ops;
struct nsproxy;
struct ctl_table_root;
@@ -961,7 +962,7 @@ extern struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *
extern struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
struct ctl_table_header *prev);
extern void sysctl_unuse_header(struct ctl_table_header *prev);
-extern int sysctl_perm(const struct ctl_table_group_ops *ops,
+extern int sysctl_perm(struct ctl_table_group *group,
struct ctl_table *table, int op);

typedef struct ctl_table ctl_table;
@@ -1041,12 +1042,15 @@ struct ctl_table_group_ops {
int (*permissions)(struct ctl_table *table);
};

+struct ctl_table_group {
+ const struct ctl_table_group_ops *ctl_ops;
+};
+
struct ctl_table_root {
struct list_head root_list;
struct ctl_table_set default_set;
struct ctl_table_set *(*lookup)(struct ctl_table_root *root,
struct nsproxy *namespaces);
- const struct ctl_table_group_ops *ctl_ops;
};

/* struct ctl_table_header is used to maintain dynamic lists of
@@ -1073,6 +1077,7 @@ struct ctl_table_header
struct completion *unregistering;
struct ctl_table *ctl_table_arg;
struct ctl_table_root *root;
+ struct ctl_table_group *ctl_group;
struct ctl_table_set *set;
struct ctl_table *attached_by;
struct ctl_table *attached_to;
@@ -1086,8 +1091,11 @@ struct ctl_path {

void register_sysctl_root(struct ctl_table_root *root);
struct ctl_table_header *__register_sysctl_paths(
- struct ctl_table_root *root, struct nsproxy *namespaces,
- const struct ctl_path *path, struct ctl_table *table);
+ struct ctl_table_root *root,
+ struct ctl_table_group *group,
+ struct nsproxy *namespaces,
+ const struct ctl_path *path,
+ struct ctl_table *table);
struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table);

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 0f00b87..8dde087 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -200,6 +200,10 @@ static int sysrq_sysctl_handler(ctl_table *table, int write,
/* uses default ops */
static const struct ctl_table_group_ops root_table_group_ops = { };

+static struct ctl_table_group root_table_group = {
+ .ctl_ops = &root_table_group_ops,
+};
+
static struct ctl_table root_table[];
static struct ctl_table_root sysctl_table_root;
static struct ctl_table_header root_table_header = {
@@ -207,11 +211,11 @@ static struct ctl_table_header root_table_header = {
.ctl_table = root_table,
.ctl_entry = LIST_HEAD_INIT(sysctl_table_root.default_set.list),}},
.root = &sysctl_table_root,
+ .ctl_group = &root_table_group,
.set = &sysctl_table_root.default_set,
};

static struct ctl_table_root sysctl_table_root = {
- .ctl_ops = &root_table_group_ops,
.root_list = LIST_HEAD_INIT(sysctl_table_root.root_list),
.default_set.list = LIST_HEAD_INIT(root_table_header.ctl_entry),
};
@@ -1664,10 +1668,10 @@ static int test_perm(int mode, int op)
return -EACCES;
}

-int sysctl_perm(const struct ctl_table_group_ops *ops,
- struct ctl_table *table, int op)
+int sysctl_perm(struct ctl_table_group *group, struct ctl_table *table, int op)
{
int mode;
+ const struct ctl_table_group_ops *ops = group->ctl_ops;

if (ops->permissions)
mode = ops->permissions(table);
@@ -1838,6 +1842,7 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
*/
struct ctl_table_header *__register_sysctl_paths(
struct ctl_table_root *root,
+ struct ctl_table_group *group,
struct nsproxy *namespaces,
const struct ctl_path *path, struct ctl_table *table)
{
@@ -1883,6 +1888,7 @@ struct ctl_table_header *__register_sysctl_paths(
INIT_LIST_HEAD(&header->ctl_entry);
header->unregistering = NULL;
header->root = root;
+ header->ctl_group = group;
header->ctl_header_refs = 1;
#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
if (sysctl_check_table(namespaces, header->ctl_table)) {
@@ -1923,8 +1929,8 @@ struct ctl_table_header *__register_sysctl_paths(
struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table)
{
- return __register_sysctl_paths(&sysctl_table_root, current->nsproxy,
- path, table);
+ return __register_sysctl_paths(&sysctl_table_root, &root_table_group,
+ current->nsproxy, path, table);
}

/**
@@ -1956,7 +1962,7 @@ void unregister_sysctl_table(struct ctl_table_header * header)

int sysctl_is_seen(struct ctl_table_header *p)
{
- const struct ctl_table_group_ops *ops = p->root->ctl_ops;
+ const struct ctl_table_group_ops *ops = p->ctl_group->ctl_ops;
int res;
spin_lock(&sysctl_lock);
if (p->unregistering)
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index c0d7140..5009d4e 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -56,9 +56,12 @@ static const struct ctl_table_group_ops net_sysctl_group_ops = {
.permissions = net_ctl_permissions,
};

+static struct ctl_table_group net_sysctl_group = {
+ .ctl_ops = &net_sysctl_group_ops,
+};
+
static struct ctl_table_root net_sysctl_root = {
.lookup = net_ctl_header_lookup,
- .ctl_ops = &net_sysctl_group_ops,
};

static int net_ctl_ro_header_permissions(ctl_table *table)
@@ -73,10 +76,12 @@ static const struct ctl_table_group_ops net_sysctl_ro_group_ops = {
.permissions = net_ctl_ro_header_permissions,
};

-static struct ctl_table_root net_sysctl_ro_root = {
+static struct ctl_table_group net_sysctl_ro_group = {
.ctl_ops = &net_sysctl_ro_group_ops,
};

+static struct ctl_table_root net_sysctl_ro_root = { };
+
static int __net_init sysctl_net_init(struct net *net)
{
setup_sysctl_set(&net->sysctls,
@@ -114,7 +119,7 @@ struct ctl_table_header *register_net_sysctl_table(struct net *net,
struct nsproxy namespaces;
namespaces = *current->nsproxy;
namespaces.net_ns = net;
- return __register_sysctl_paths(&net_sysctl_root,
+ return __register_sysctl_paths(&net_sysctl_root, &net_sysctl_group,
&namespaces, path, table);
}
EXPORT_SYMBOL_GPL(register_net_sysctl_table);
@@ -122,7 +127,7 @@ EXPORT_SYMBOL_GPL(register_net_sysctl_table);
struct ctl_table_header *register_net_sysctl_rotable(const
struct ctl_path *path, struct ctl_table *table)
{
- return __register_sysctl_paths(&net_sysctl_ro_root,
+ return __register_sysctl_paths(&net_sysctl_ro_root, &net_sysctl_ro_group,
&init_nsproxy, path, table);
}
EXPORT_SYMBOL_GPL(register_net_sysctl_rotable);
--
1.7.5.134.g1c08b

2011-05-08 22:48:41

by Lucian Adrian Grijincu

Subject: [v2 075/115] sysctl: move removal from list out of start_unregistering

Later on we'll switch from a global list protected by the sysctl_lock
spin lock to rwsem-protected per-header lists.

At that point we'll need to hold the parent header's rwlock to remove
the header from the list, not the sysctl_lock spin lock.

As start_unregistering is called under the sysctl_lock, we move the
list removal out.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sysctl.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8dde087..a863b56 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1552,11 +1552,6 @@ static void start_unregistering(struct ctl_table_header *p)
/* anything non-NULL; we'll never dereference it */
p->unregistering = ERR_PTR(-EINVAL);
}
- /*
- * do not remove from the list until nobody holds it; walking the
- * list in do_sysctl() relies on that.
- */
- list_del_init(&p->ctl_entry);
}

void sysctl_proc_inode_get(struct ctl_table_header *head)
@@ -1949,6 +1944,13 @@ void unregister_sysctl_table(struct ctl_table_header * header)

spin_lock(&sysctl_lock);
start_unregistering(header);
+
+ /* after start_unregistering has finished no one holds a
+ * ctl_use_refs or is able to acquire one => no one is going
+ * to access internal fields of this object, so we can remove
+ * it from the list and schedule it for deletion. */
+ list_del_init(&header->ctl_entry);
+
if (!--header->parent->ctl_header_refs) {
WARN_ON(1);
if (!header->parent->ctl_procfs_refs)
--
1.7.5.134.g1c08b

2011-05-08 22:48:09

by Lucian Adrian Grijincu

Subject: [v2 076/115] sysctl: faster tree-based sysctl implementation

The old implementation used inefficient algorithms both at
lookup/readdir times and at registration. This patch introduces an
improved algorithm: lower memory consumption, better time complexity
for lookup/readdir/registration. Locking is a bit heavier in this
algorithm (in this patch: reader locks for lookup/readdir, writer
locks for register/unregister; in a later patch in this series: RCU +
spin-lock). I'll address this locking issue later in this commit.

I will briefly describe the previous algorithm and the new one, and brag
at the end with an endless list of improvements and new limitations.

= Old algorithm =

== Description ==
We created a ctl_table_header for each registered sysctl table. The
header maintains sysctl internal data and reference counts, and serves
as a token to unregister the table.

All headers were put in a list in the order of registration without
regard to the position of the tables in the sysctl tree. Headers were
also 'attached' one to another to (somewhat) speed up lookup/readdir.

Attachment meant looking at every other already registered header and
comparing the paths to the tables. A newly registered header would be
attached to the first header with which it shared most of its
path.

e.g. paths registered: /, /a/b/c, /a/b/c/d, /a/x, /a/x/y, /a/z
tree:
/
+ /a/b/c
| + /a/b/c/d
+ /a/x
| + /a/x/y
+ /a/z

== Time complexity ==

- registering N tables takes O(N^2) steps (see above)

- lookup: if the item searched for is not found in the current header,
iterate the list of headers until you find another header that's
attached to the current position in the header's table. Lookups for
elements that are in a header registered under the current position,
or for nonexistent elements, take O(N) steps each.

- readdir: after searching the current header's table in the current
position, always do an O(N) search for a header attached to the
current table position.

== Memory ==

Each header was allocated some data and a variable-length path.
O(1) with kzalloc/kfree.

= New algorithm =

== Description ==

Reuses the 'ctl_table_header' concept but with two distinct meanings:
- as a wrapper of a table registered by the user
- as a directory entry.

Registering the paths from the above example gives this tree:
paths: /, /a/b/c, /a/b/c/d, /a/x, /a/x/y, /a/z
tree:
/: .subdirs = a
  a: .subdirs = b x z
    b: .subdirs = c
      c: .subdirs = d
        d:
    x: .subdirs = y
      y:
    z:

Each directory gets a header. Each header has a parent (except root)
and two lists:
- ctl_subdirs: list of sub-directories - other headers
- ctl_tables: list of headers that wrap a ctl_table array

Because the directory structure is now maintained as ctl_table_header
objects, we needed to remove the .child from ctl_tables (this explains
the previous patches). A ctl_table array represents a list of files.
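
The directory part of this structure can be sketched in userspace C. This is a minimal illustration under assumptions, not the code from this patch: struct dir, find_subdir and register_path are hypothetical names, a singly linked sibling list stands in for ctl_subdirs, table headers (ctl_tables) are left out, and there is no locking or unwinding on allocation failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical directory node; the patch uses ctl_table_header for this. */
struct dir {
	char name[16];
	int refs;             /* registrations that pass through this dir */
	struct dir *parent;
	struct dir *subdirs;  /* first child (stands in for ctl_subdirs) */
	struct dir *next;     /* next sibling */
};

/* Linear search of one directory's children: O(len(subdirs)). */
static struct dir *find_subdir(struct dir *d, const char *name)
{
	struct dir *c;

	for (c = d->subdirs; c; c = c->next)
		if (strcmp(c->name, name) == 0)
			return c;
	return NULL;
}

/* Walk "a/b/c" from root, creating missing directories and taking a
 * reference on every directory along the way. Returns the last
 * directory, or NULL on allocation failure (no unwinding here). */
static struct dir *register_path(struct dir *root, const char *path)
{
	char buf[64], *tok;
	struct dir *cur = root, *c;

	assert(strlen(path) < sizeof(buf));
	strcpy(buf, path);
	for (tok = strtok(buf, "/"); tok; tok = strtok(NULL, "/")) {
		c = find_subdir(cur, tok);
		if (!c) {
			c = calloc(1, sizeof(*c));
			if (!c)
				return NULL;
			strncpy(c->name, tok, sizeof(c->name) - 1);
			c->parent = cur;
			c->next = cur->subdirs;
			cur->subdirs = c;
		}
		c->refs++;
		cur = c;
	}
	return cur;
}
```

Registering the example paths a/b/c, a/b/c/d, a/x, a/x/y and a/z against one root yields a single "a" directory that all five registrations share, with b, x and z as its children.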

== Time complexity ==

- registration of N headers. Registration means adding new directories
at each level or incrementing an existing directory's refcount.

- O(N * log N) - if the paths to the headers are evenly distributed

- O(N^2) - if most of the headers registered are children of the
same parent directory (searching the list of subdirs takes O(N)).
There are cases where this happens (e.g. registering sysctl
entries for net devices under /proc/sys/net/ipv4|6/conf/device).

A few later patches will add an optimisation, to fix locations
that might trigger the O(N^2) issue.

- lookup: O(len(subdirs) + sum(len(tarr) for each tarr in ctl_tables))
- could be made better:
- sort ctl_subdirs (for binary search)
- replace ctl_subdirs with a hash-table (increase memory footprint)
- sort ctl_table entries at registration time (for binary search).
Could be done, but I'm too lazy to do it now.

- readdir: O(len(subdirs) + sum(len(tarr) for each tarr in ctl_tables))
- can't get any better than this :)
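
The lookup and readdir costs above follow from the shape of the walk.
A userspace sketch, assuming hypothetical names (struct node, lookup(),
count_entries()) and NULL-terminated arrays of file names in place of
ctl_table arrays; refcounting and locking are omitted:

```c
#include <stddef.h>
#include <string.h>

struct node {
	const char *dirname;       /* directory headers: own name */
	const char *const *files;  /* table headers: NULL-terminated names */
	struct node *subdirs;      /* child directories */
	struct node *tables;       /* headers wrapping file tables */
	struct node *next;         /* sibling link in either list */
};

enum hit { HIT_NONE, HIT_DIR, HIT_FILE };

/* Look 'name' up inside 'dir': sub-directories first, then every file
 * of every table. Cost: len(subdirs) + sum of the table lengths. */
static enum hit lookup(const struct node *dir, const char *name,
		       const struct node **found)
{
	const struct node *h;
	const char *const *f;

	for (h = dir->subdirs; h; h = h->next) {
		if (strcmp(h->dirname, name) == 0) {
			*found = h;
			return HIT_DIR;
		}
	}
	for (h = dir->tables; h; h = h->next) {
		for (f = h->files; *f; f++) {
			if (strcmp(*f, name) == 0) {
				*found = h;
				return HIT_FILE;
			}
		}
	}
	*found = NULL;
	return HIT_NONE;
}

/* readdir has to visit exactly the same entries, hence the same cost. */
static size_t count_entries(const struct node *dir)
{
	const struct node *h;
	const char *const *f;
	size_t n = 0;

	for (h = dir->subdirs; h; h = h->next)
		n++;
	for (h = dir->tables; h; h = h->next)
		for (f = h->files; *f; f++)
			n++;
	return n;
}
```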

== Memory complexity ==

Although we create more ctl_table_header objects (one for each directory
and one for each table, and because we deleted .child from ctl_table
more tables are registered than before this patch), we no longer need
to store a full path (from the top to the table) as was done in the old
solution => a small O(N) memory gain with respect to the old algorithm.

= Limitations =

== ctl_table does not have .child => some code uglification ==

Registering tables with multiple directories and files cannot be done
in a single operation: there must be at least one table registered for
each directory. This makes code that registers sysctls uglier (see the
earlier patches that remove .child from sched_domain and the root
table). Other places, e.g. the parport sysctls, look much better now
without .child: I can now read and understand that code.

== Handling of netns specific paths is weirder ==

The algorithm descriptions above are simplifications. In reality
the code needs to handle directories and files that must be visible in
some netns only. E.g. the /proc/sys/net/ipv4/conf/DEVICENAME/
directory and its files must be visible only in the netns of that
device.

The old algorithm used a secondary list that indexed all netns
specific headers. All algorithms remained the same, except
that besides searching the global list, the algorithm would also look
into the current netns' list of headers. This scales perfectly with
respect to the number of network namespaces.

The new algorithm does something similar, but a bit more complicated.
We also use netns specific lists of directories/tables and store them
in a special directory ctl_table_header (which I dubbed the
"netns-correspondent" of another directory - I'm not very pleased with
the name either).

When registering a net-ns specific table, we will create a
"netns-correspondent" to the last directory that is not net-ns
specific in that path.

E.g.: we're registering a netns specific table for 'lo':
common path: /proc/sys/net/ipv4/
netns path: /proc/sys/net/ipv4/conf/lo/

We'll create an (unnamed) netns correspondent for 'ipv4' which will
have 'conf' as its subdir.

E.g.: We're registering a netns specific file in /proc/sys/net/core/somaxconn
common path: /proc/sys/net/core/
netns path: /proc/sys/net/core/

We'll create an (unnamed) netns correspondent for 'core' with the
table containing 'somaxconn' in ctl_tables.

All net-ns correspondents of one netns are held in a single list, and
each netns gets its own list. This keeps the algorithm complexity
independent of the number of network namespaces (as was the old one).

However, now only a smaller part of directories are members of this
list, improving register/lookup/readdir time complexity.
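
A userspace sketch of how a lookup falls back to the netns
correspondent. Assumptions: struct hdr, struct netns_group,
find_netns_corresp() and lookup_dir() are hypothetical stand-ins, only
directories are modelled, and the use counts and locks of the real code
are left out.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for ctl_table_header, directories only. */
struct hdr {
	const char *dirname;   /* NULL marks a netns correspondent */
	struct hdr *parent;    /* correspondent: the common dir it extends */
	struct hdr *subdirs;   /* child directories */
	struct hdr *next;      /* sibling link, or link in corresp_list */
};

/* One per network namespace, like group->corresp_list. */
struct netns_group {
	struct hdr *corresp_list;
};

static struct hdr *find_subdir(const struct hdr *dir, const char *name)
{
	struct hdr *h;

	for (h = dir->subdirs; h; h = h->next)
		if (strcmp(h->dirname, name) == 0)
			return h;
	return NULL;
}

/* The correspondent of 'dir' is the list member whose parent is 'dir'. */
static struct hdr *find_netns_corresp(const struct netns_group *g,
				      const struct hdr *dir)
{
	struct hdr *h;

	for (h = g->corresp_list; h; h = h->next)
		if (h->parent == dir)
			return h;
	return NULL;
}

/* Common entries win; the current netns is consulted only on a miss. */
static struct hdr *lookup_dir(const struct netns_group *g,
			      const struct hdr *dir, const char *name)
{
	struct hdr *h = find_subdir(dir, name);

	if (!h) {
		const struct hdr *corresp = find_netns_corresp(g, dir);

		if (corresp)
			h = find_subdir(corresp, name);
	}
	return h;
}
```

Only the list of the current netns is walked, so a second namespace
with its own correspondents does not add to the cost of this lookup.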

There is one ugly limitation that stems from this approach.
E.g.: register these files in this order:
- register common /dir1/file-common1
- register netns specific /dir1/dir2/file-netns
- register common /dir1/dir2/file-common2

We'll have this tree:
'dir1' { .subdirs = ['dir2'], .tables = ['file-common1'] }
 ^                   |
 |                   -> { .subdirs = [], .tables = ['file-common2'] }
 |
 | (unnamed netns-corresp for dir1)
 -> { .subdirs = ['dir2'] }
                   |
                   -> { .subdirs = [], .tables = ['file-netns'] }

readdir: when we list the contents of 'dir1' we'll see it has two
sub-directories named 'dir2' each with a file in it.

lookup: lookup of /dir1/dir2/file-netns will not work because we find
'dir2' as a subdir of 'dir1' and stick with it and never look
into the netns correspondent of 'dir1'.

This can be fixed in two ways:

- A) by making sure to never register a netns specific directory and
after that register that directory as a common one. From what I can
tell there isn't such a problem in the kernel at the moment, but I
did not study the source in detail.

- B) by increasing the complexity of the code:

- readdir: looking at both lists and comparing if we have already
listed a directory as common, so we don't list twice.
-> For imbalanced trees this can make readdir O(N^2) :(

- register: the netns 'dir2' from the example above needs to be
connected to the common 'dir2' when 'dir2' is
registered. I'm not even going to think of how time
complexity/ugliness is going to explode here.
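
Option A amounts to a check at registration time. The sketch below is
only an illustration of that rule, not the check used in this series:
may_register_common_dir() is a hypothetical name, and it inspects the
correspondent of a single netns where a real check would have to cover
every netns.

```c
#include <stddef.h>
#include <string.h>

struct hdr {
	const char *dirname;
	struct hdr *subdirs;   /* child directories */
	struct hdr *next;      /* sibling link */
};

/* Does 'dir' (which may be NULL) have a subdirectory called 'name'? */
static int has_subdir(const struct hdr *dir, const char *name)
{
	const struct hdr *h;

	for (h = dir ? dir->subdirs : NULL; h; h = h->next)
		if (strcmp(h->dirname, name) == 0)
			return 1;
	return 0;
}

/* Return 1 if 'name' may be registered as a common subdirectory, or 0
 * if the netns correspondent of the parent directory already holds a
 * directory with that name, which the common one would then hide. */
static int may_register_common_dir(const struct hdr *netns_corresp,
				   const char *name)
{
	return !has_subdir(netns_corresp, name);
}
```

In the example above, registering the common /dir1/dir2 after the netns
specific /dir1/dir2 would be refused, while a common /dir1/dir3 would
still be accepted.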

= Change summary =

* include/linux/sysctl.h
- removed _set and _root, replaced with _group

- netns correspondent directories are held in each netns's
group->corresp_list

- reused the header structure to represent directories which don't
use ctl_table_arg, but store the directory name directly.

- each directory header also gets two lists: subdirs and tables

* fs/proc/proc_sysctl.c
- a proc inode has ->sysctl_entry set only for files, not
directories as these store the dirname directly

- lookup:
- take the dirs read-lock and iterate through subdirs and tables
- if nothing is found, try the dir's netns-correspondent

- scan: list every subdir and file that was not listed before

- readdir: scan the current dir and its netns correspondent

* kernel/sysctl.c
- inlines the code of use_table/unuse_table as they are not used
elsewhere (they used to be called from __register, but are not any more)

- adds routines to get/set the netns-correspondent

- adds routines to protect the subdirs/tables lists (rwsem)

- __register_sysctl_paths:
- preallocate ctl_table_header for every dir in 'path'
- increase the ctl_header_refs of every existing directory
- if the group needs a netns-correspondent it is created for the
last existing directory that is part of the non-netns specific
path.
- all the non-existing directories are added as children of their
parent's subdir lists.

- unregister:
- wait until no one uses the header
- for normal directories and table-wrapper headers take the
parent's write lock to be able to delete something from one of
its lists (ctl_subdirs or ctl_tables).
- netns-correspondent headers must take the netns group list lock
before deleting.
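
Since registration takes a reference on every directory along the
path, unregistering has to give them back. A userspace sketch of that
bookkeeping, assuming (this is not shown in the summary above) that a
directory is unlinked from its parent once its count drops to zero;
put_dirs() and unlink_subdir() are hypothetical names, and the waiting
and locking steps listed above are omitted.

```c
#include <stddef.h>

struct dir_header {
	struct dir_header *parent;
	struct dir_header *subdirs;  /* head of the child list */
	struct dir_header *next;     /* sibling link */
	int refs;
};

/* Remove 'dir' from its parent's child list, if it is on it. */
static void unlink_subdir(struct dir_header *parent, struct dir_header *dir)
{
	struct dir_header **pp;

	for (pp = &parent->subdirs; *pp; pp = &(*pp)->next) {
		if (*pp == dir) {
			*pp = dir->next;
			dir->next = NULL;
			return;
		}
	}
}

/* Drop one reference on 'dir' and on each of its ancestors below the
 * root. Directories whose count reaches zero are unlinked from their
 * parent. Returns the number of directories unlinked; freeing them is
 * left to the caller in this sketch. */
static int put_dirs(struct dir_header *dir)
{
	int removed = 0;

	while (dir && dir->parent) {
		struct dir_header *parent = dir->parent;

		if (--dir->refs == 0) {
			unlink_subdir(parent, dir);
			removed++;
		}
		dir = parent;
	}
	return removed;
}
```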

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 159 ++++++++-----
include/linux/sysctl.h | 120 +++++------
include/net/net_namespace.h | 2 +-
kernel/sysctl.c | 533 ++++++++++++++++++++++++++----------------
kernel/sysctl_check.c | 168 +--------------
net/sysctl_net.c | 41 +---
6 files changed, 499 insertions(+), 524 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 375d145..9337149 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -32,13 +32,14 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
ei->sysctl_entry = table;

inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
- inode->i_mode = table->mode;
- if (!table->child) {
- inode->i_mode |= S_IFREG;
+
+ /* directories have table==NULL (thus ei->sysctl_entry is NULL too) */
+ if (table) {
+ inode->i_mode = S_IFREG | table->mode;
inode->i_op = &proc_sys_inode_operations;
inode->i_fop = &proc_sys_file_operations;
} else {
- inode->i_mode |= S_IFDIR;
+ inode->i_mode = S_IFDIR | S_IRUGO | S_IWUSR;
inode->i_nlink = 0;
inode->i_op = &proc_sys_dir_operations;
inode->i_fop = &proc_sys_dir_file_operations;
@@ -66,42 +67,65 @@ static struct dentry *proc_sys_lookup(struct inode *dir, struct dentry *dentry,
struct nameidata *nd)
{
struct ctl_table_header *head = sysctl_use_header(PROC_I(dir)->sysctl);
- struct ctl_table *table = PROC_I(dir)->sysctl_entry;
- struct ctl_table_header *h = NULL;
struct qstr *name = &dentry->d_name;
- struct ctl_table *p;
+ struct ctl_table_header *h = NULL, *found_head = NULL;
+ struct ctl_table *table = NULL;
struct inode *inode;
struct dentry *err = ERR_PTR(-ENOENT);

+
if (IS_ERR(head))
return ERR_CAST(head);

- if (table && !table->child) {
- WARN_ON(1);
- goto out;
+retry:
+ sysctl_read_lock_head(head);
+
+ /* first check whether a subdirectory has the searched-for name */
+ list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+ if (IS_ERR(sysctl_use_header(h)))
+ continue;
+
+ if (strcmp(name->name, h->ctl_dirname) == 0) {
+ found_head = h;
+ goto search_finished;
+ }
+ sysctl_unuse_header(h);
}

- table = table ? table->child : head->ctl_table;
+ /* no subdir with that name, look for the file in the ctl_tables */
+ list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+ if (IS_ERR(sysctl_use_header(h)))
+ continue;

- p = find_in_table(table, name);
- if (!p) {
- for (h = sysctl_use_next_header(NULL); h; h = sysctl_use_next_header(h)) {
- if (h->attached_to != table)
- continue;
- p = find_in_table(h->attached_by, name);
- if (p)
- break;
+ table = find_in_table(h->ctl_table_arg, name);
+ if (table) {
+ found_head = h;
+ goto search_finished;
}
+ sysctl_unuse_header(h);
}

- if (!p)
+search_finished:
+ sysctl_read_unlock_head(head);
+
+ if (!found_head) {
+ struct ctl_table_header *netns_corresp;
+ /* the item was not found in the dir's sub-directories
+ * or tables. See if this dir has a netns
+ * correspondent and restart the lookup in there. */
+ netns_corresp = sysctl_use_netns_corresp(head);
+ if (netns_corresp) {
+ sysctl_unuse_header(head);
+ head = netns_corresp;
+ goto retry;
+ }
+ }
+ if (!found_head)
goto out;

err = ERR_PTR(-ENOMEM);
- inode = proc_sys_make_inode(dir->i_sb, h ? h : head, p);
- if (h)
- sysctl_unuse_header(h);
-
+ inode = proc_sys_make_inode(dir->i_sb, found_head, table);
+ sysctl_unuse_header(found_head);
if (!inode)
goto out;

@@ -174,8 +198,8 @@ static int proc_sys_fill_cache(struct file *filp, void *dirent,
ino_t ino = 0;
unsigned type = DT_UNKNOWN;

- qname.name = table->procname;
- qname.len = strlen(table->procname);
+ qname.name = table ? table->procname : head->ctl_dirname;
+ qname.len = strlen(qname.name);
qname.hash = full_name_hash(qname.name, qname.len);

child = d_lookup(dir, &qname);
@@ -201,28 +225,56 @@ static int proc_sys_fill_cache(struct file *filp, void *dirent,
return !!filldir(dirent, qname.name, qname.len, filp->f_pos, ino, type);
}

-static int scan(struct ctl_table_header *head, ctl_table *table,
+static int scan(struct ctl_table_header *head,
unsigned long *pos, struct file *file,
void *dirent, filldir_t filldir)
{
+ struct ctl_table_header *h;
+ int res = 0;

- for (; table->procname; table++, (*pos)++) {
- int res;
+ sysctl_read_lock_head(head);

- /* Can't do anything without a proc name */
- if (!table->procname)
+ list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+ if (*pos < file->f_pos) {
+ (*pos)++;
continue;
+ }

- if (*pos < file->f_pos)
+ if (IS_ERR(sysctl_use_header(h)))
continue;

- res = proc_sys_fill_cache(file, dirent, filldir, head, table);
+ res = proc_sys_fill_cache(file, dirent, filldir, h, NULL);
+ sysctl_unuse_header(h);
if (res)
- return res;
+ goto out;

file->f_pos = *pos + 1;
+ (*pos)++;
}
- return 0;
+
+ list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+ ctl_table *t;
+
+ if (IS_ERR(sysctl_use_header(h)))
+ continue;
+
+ for (t = h->ctl_table_arg; t->procname; t++, (*pos)++) {
+ if (*pos < file->f_pos)
+ continue;
+
+ res = proc_sys_fill_cache(file, dirent, filldir, h, t);
+ if (res) {
+ sysctl_unuse_header(h);
+ goto out;
+ }
+ file->f_pos = *pos + 1;
+ }
+ sysctl_unuse_header(h);
+ }
+
+out:
+ sysctl_read_unlock_head(head);
+ return res;
}

static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
@@ -230,21 +282,12 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
struct dentry *dentry = filp->f_path.dentry;
struct inode *inode = dentry->d_inode;
struct ctl_table_header *head = sysctl_use_header(PROC_I(inode)->sysctl);
- struct ctl_table *table = PROC_I(inode)->sysctl_entry;
- struct ctl_table_header *h = NULL;
unsigned long pos;
int ret = -EINVAL;

if (IS_ERR(head))
return PTR_ERR(head);

- if (table && !table->child) {
- WARN_ON(1);
- goto out;
- }
-
- table = table ? table->child : head->ctl_table;
-
ret = 0;
/* Avoid a switch here: arm builds fail with missing __cmpdi2 */
if (filp->f_pos == 0) {
@@ -260,18 +303,20 @@ static int proc_sys_readdir(struct file *filp, void *dirent, filldir_t filldir)
filp->f_pos++;
}
pos = 2;
-
- ret = scan(head, table, &pos, filp, dirent, filldir);
- if (ret)
- goto out;
-
- for (h = sysctl_use_next_header(NULL); h; h = sysctl_use_next_header(h)) {
- if (h->attached_to != table)
- continue;
- ret = scan(h, h->attached_by, &pos, filp, dirent, filldir);
- if (ret) {
- sysctl_unuse_header(h);
- break;
+ ret = scan(head, &pos, filp, dirent, filldir);
+ if (!ret) {
+ /* the netns-correspondent contains only those
+ * subdirectories that are netns-specific, and not
+ * shared with the @head directory: there is no
+ * possibility to list the same directory twice (once
+ * for @head and once for @netns_corresp). Sibling
+ * tables cannot contain the entries with the same
+ * name, no need to worry about them either. */
+ struct ctl_table_header *netns_corresp;
+ netns_corresp = sysctl_use_netns_corresp(head);
+ if (netns_corresp) {
+ ret = scan(netns_corresp, &pos, filp, dirent, filldir);
+ sysctl_unuse_header(netns_corresp);
}
}
ret = 1;
@@ -302,7 +347,7 @@ static int proc_sys_permission(struct inode *inode, int mask,unsigned int flags)
return PTR_ERR(head);

table = PROC_I(inode)->sysctl_entry;
- if (!table) /* global root - r-xr-xr-x */
+ if (!table) /* directory - r-xr-xr-x */
error = mask & MAY_WRITE ? -EACCES : 0;
else /* Use the permissions on the sysctl table entry */
error = sysctl_perm(head->ctl_group, table, mask);
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index a12ab12..b626271 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -937,18 +937,12 @@ struct ctl_table;
struct ctl_table_header;
struct ctl_table_group;
struct ctl_table_group_ops;
-struct nsproxy;
-struct ctl_table_root;
-
-struct ctl_table_set {
- struct list_head list;
- struct ctl_table_set *parent;
-};

extern __init int sysctl_init(void);

-extern void setup_sysctl_set(struct ctl_table_set *p,
- struct ctl_table_set *parent);
+extern void sysctl_init_group(struct ctl_table_group *group,
+ const struct ctl_table_group_ops *ops,
+ int has_netns_corresp);


/* get/put a reference to this header that
@@ -957,14 +951,23 @@ extern void sysctl_proc_inode_get(struct ctl_table_header *);
extern void sysctl_proc_inode_put(struct ctl_table_header *);

extern int sysctl_is_seen(struct ctl_table_header *);
-extern struct ctl_table_header *sysctl_use_header(struct ctl_table_header *);
-extern struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *prev);
-extern struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
- struct ctl_table_header *prev);
-extern void sysctl_unuse_header(struct ctl_table_header *prev);
extern int sysctl_perm(struct ctl_table_group *group,
struct ctl_table *table, int op);

+/* protect the ctl_subdirs/ctl_tables lists */
+extern void sysctl_write_lock_head(struct ctl_table_header *head);
+extern void sysctl_write_unlock_head(struct ctl_table_header *head);
+extern void sysctl_read_lock_head(struct ctl_table_header *head);
+extern void sysctl_read_unlock_head(struct ctl_table_header *head);
+
+/* get/put references to this header with the purpose of using its internals.
+ * As long as the use count is not zero, there may be items accessing it,
+ * so we can't even remove it from the lists (ctl_entry). */
+extern struct ctl_table_header *sysctl_use_header(struct ctl_table_header *);
+extern struct ctl_table_header *sysctl_use_netns_corresp(struct ctl_table_header *);
+extern void sysctl_unuse_header(struct ctl_table_header *prev);
+
+
typedef struct ctl_table ctl_table;

typedef int proc_handler (struct ctl_table *ctl, int write,
@@ -991,39 +994,29 @@ extern int proc_do_large_bitmap(struct ctl_table *, int,

/*
* Register a set of sysctl names by calling __register_sysctl_paths
- * with an initialised array of struct ctl_table's. An entry with
- * NULL procname terminates the table. table->de will be
- * set up by the registration and need not be initialised in advance.
- *
- * sysctl names can be mirrored automatically under /proc/sys. The
- * procname supplied controls /proc naming.
+ * with an initialised array of struct ctl_table's. An entry with a
+ * NULL procname terminates the table.
*
* The table's mode will be honoured both for sys_sysctl(2) and
- * proc-fs access.
+ * proc-fs access (sys_sysctl(2) uses procfs internally).
*
- * Leaf nodes in the sysctl tree will be represented by a single file
- * under /proc; non-leaf nodes will be represented by directories. A
- * null procname disables /proc mirroring at this node.
+ * Only files can be represented by ctl_table elements. Directories
+ * are implemented with ctl_table_header objects.
*
- * sysctl(2) can automatically manage read and write requests through
- * the sysctl table. The data and maxlen fields of the ctl_table
- * struct enable minimal validation of the values being written to be
- * performed, and the mode field allows minimal authentication.
- *
- * There must be a proc_handler routine for any terminal nodes
- * mirrored under /proc/sys (non-terminals are handled by a built-in
- * directory handler). Several default handlers are available to
- * cover common cases.
+ * The data and maxlen fields of the ctl_table struct enable minimal
+ * validation of the values being written to be performed, and the
+ * mode field allows minimal authentication.
+ *
+ * There must be a proc_handler routine for each ctl_table node.
+ * Several default handlers are available to cover common cases.
*/

/* A sysctl table is an array of struct ctl_table: */
-struct ctl_table
-{
+struct ctl_table {
const char *procname; /* Text ID for /proc/sys, or zero */
void *data;
int maxlen;
mode_t mode;
- struct ctl_table *child;
proc_handler *proc_handler; /* Callback for text formatting */
void *extra1;
void *extra2;
@@ -1035,8 +1028,8 @@ struct ctl_table_group_ops {
* netns in which that eth0 interface lives.
*
* If this hook is not set, then all the sysctl entries in
- * this set are always visible. */
- int (*is_seen)(struct ctl_table_set *set);
+ * this group are always visible. */
+ int (*is_seen)(struct ctl_table_group *group);

/* hook to alter permissions for some sysctl nodes at runtime */
int (*permissions)(struct ctl_table *table);
@@ -1044,22 +1037,24 @@ struct ctl_table_group_ops {

struct ctl_table_group {
const struct ctl_table_group_ops *ctl_ops;
-};
-
-struct ctl_table_root {
- struct list_head root_list;
- struct ctl_table_set default_set;
- struct ctl_table_set *(*lookup)(struct ctl_table_root *root,
- struct nsproxy *namespaces);
+ /* A list of ctl_table_header elements that represent the
+ * netns-specific correspondents of some sysctl directories */
+ struct list_head corresp_list;
+ /* binary: whether this group uses the @corresp_list */
+ char has_netns_corresp;
};

/* struct ctl_table_header is used to maintain dynamic lists of
struct ctl_table trees. */
-struct ctl_table_header
-{
+struct ctl_table_header {
union {
struct {
- struct ctl_table *ctl_table;
+ /* a header is used either as a wraper for a
+ * ctl_table array or as directory entry. */
+ union {
+ struct ctl_table *ctl_table_arg;
+ const char *ctl_dirname;
+ };
struct list_head ctl_entry;
/* references to this header from contexts that
* can access fields of this header */
@@ -1075,12 +1070,13 @@ struct ctl_table_header
struct rcu_head rcu;
};
struct completion *unregistering;
- struct ctl_table *ctl_table_arg;
- struct ctl_table_root *root;
struct ctl_table_group *ctl_group;
- struct ctl_table_set *set;
- struct ctl_table *attached_by;
- struct ctl_table *attached_to;
+
+ /* Lists of other ctl_table_headers that represent either
+ * subdirectories or ctl_tables of files. Add/remove and walk
+ * this list holding the header's read/write lock. */
+ struct list_head ctl_tables;
+ struct list_head ctl_subdirs;
struct ctl_table_header *parent;
};

@@ -1089,18 +1085,12 @@ struct ctl_path {
const char *procname;
};

-void register_sysctl_root(struct ctl_table_root *root);
-struct ctl_table_header *__register_sysctl_paths(
- struct ctl_table_root *root,
- struct ctl_table_group *group,
- struct nsproxy *namespaces,
- const struct ctl_path *path,
- struct ctl_table *table);
-struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
- struct ctl_table *table);
-
-void unregister_sysctl_table(struct ctl_table_header * table);
-int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table);
+extern struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *g,
+ const struct ctl_path *p,
+ struct ctl_table *table);
+extern struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
+ struct ctl_table *table);
+extern void unregister_sysctl_table(struct ctl_table_header *table);

#endif /* __KERNEL__ */

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 3ae4919..871dd2b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -52,7 +52,7 @@ struct net {
struct proc_dir_entry *proc_net_stat;

#ifdef CONFIG_SYSCTL
- struct ctl_table_set sysctls;
+ struct ctl_table_group netns_ctl_group;
#endif

struct sock *rtnl; /* rtnetlink socket */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index a863b56..cbf33b1 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -56,6 +56,7 @@
#include <linux/kprobes.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
+#include <linux/rwsem.h>

#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -201,23 +202,16 @@ static int sysrq_sysctl_handler(ctl_table *table, int write,
static const struct ctl_table_group_ops root_table_group_ops = { };

static struct ctl_table_group root_table_group = {
+ .has_netns_corresp = 0,
.ctl_ops = &root_table_group_ops,
};

-static struct ctl_table root_table[];
-static struct ctl_table_root sysctl_table_root;
static struct ctl_table_header root_table_header = {
{{.ctl_header_refs = 1,
- .ctl_table = root_table,
- .ctl_entry = LIST_HEAD_INIT(sysctl_table_root.default_set.list),}},
- .root = &sysctl_table_root,
- .ctl_group = &root_table_group,
- .set = &sysctl_table_root.default_set,
-};
-
-static struct ctl_table_root sysctl_table_root = {
- .root_list = LIST_HEAD_INIT(sysctl_table_root.root_list),
- .default_set.list = LIST_HEAD_INIT(root_table_header.ctl_entry),
+ .ctl_entry = LIST_HEAD_INIT(root_table_header.ctl_entry),}},
+ .ctl_tables = LIST_HEAD_INIT(root_table_header.ctl_tables),
+ .ctl_subdirs = LIST_HEAD_INIT(root_table_header.ctl_subdirs),
+ .ctl_group = &root_table_group,
};

#ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
@@ -226,10 +220,6 @@ int sysctl_legacy_va_layout;

/* The default sysctl tables: */

-static struct ctl_table root_table[] = {
- { }
-};
-
#ifdef CONFIG_SCHED_DEBUG
static int min_sched_granularity_ns = 100000; /* 100 usecs */
static int max_sched_granularity_ns = NSEC_PER_SEC; /* 1 second */
@@ -1575,78 +1565,76 @@ void sysctl_proc_inode_put(struct ctl_table_header *head)
spin_unlock(&sysctl_lock);
}

-static struct ctl_table_set *
-lookup_header_set(struct ctl_table_root *root, struct nsproxy *namespaces)
-{
- struct ctl_table_set *set = &root->default_set;
- if (root->lookup)
- set = root->lookup(root, namespaces);
- return set;
-}
-
-static struct list_head *
-lookup_header_list(struct ctl_table_root *root, struct nsproxy *namespaces)
-{
- struct ctl_table_set *set = lookup_header_set(root, namespaces);
- return &set->list;
-}
-
-struct ctl_table_header *__sysctl_use_next_header(struct nsproxy *namespaces,
- struct ctl_table_header *prev)
+/*
+ * Find the netns correspondent of @head. If it is not found and @dflt
+ * is != NULL, set dflt to be the netns correspondent of @head.
+ */
+static struct ctl_table_header *sysctl_use_netns_corresp_dflt(
+ struct ctl_table_group *group,
+ struct ctl_table_header *head,
+ struct ctl_table_header *dflt)
{
- struct ctl_table_root *root;
- struct list_head *header_list;
- struct ctl_table_header *head;
- struct list_head *tmp;
+ struct ctl_table_header *h, *ret = NULL;

spin_lock(&sysctl_lock);
- if (prev) {
- head = prev;
- tmp = &prev->ctl_entry;
- __sysctl_unuse_header(prev);
- goto next;
+ list_for_each_entry(h, &group->corresp_list, ctl_entry) {
+ if (h->parent != head)
+ continue;
+ if (IS_ERR(__sysctl_use_header(h)))
+ continue;
+ ret = h;
+ goto out;
}
- tmp = &root_table_header.ctl_entry;
- for (;;) {
- head = list_entry(tmp, struct ctl_table_header, ctl_entry);

- if (IS_ERR(__sysctl_use_header(head)))
- goto next;
- spin_unlock(&sysctl_lock);
- return head;
- next:
- root = head->root;
- tmp = tmp->next;
- header_list = lookup_header_list(root, namespaces);
- if (tmp != header_list)
- continue;
+ if (!dflt)
+ goto out;
+
+ /* will not fail because dflt is a brand-new header that no
+ * one has seen yet, so no one has started to unregister it */
+ dflt = __sysctl_use_header(dflt);
+ dflt->ctl_dirname = NULL; /* this marks the header as a netns-corresp */
+ dflt->parent = head;
+ list_add_tail(&dflt->ctl_entry, &group->corresp_list);
+ ret = dflt;

- do {
- root = list_entry(root->root_list.next,
- struct ctl_table_root, root_list);
- if (root == &sysctl_table_root)
- goto out;
- header_list = lookup_header_list(root, namespaces);
- } while (list_empty(header_list));
- tmp = header_list->next;
- }
out:
spin_unlock(&sysctl_lock);
- return NULL;
+ return ret;
}

-struct ctl_table_header *sysctl_use_next_header(struct ctl_table_header *prev)
+struct ctl_table_header *sysctl_use_netns_corresp(struct ctl_table_header *h)
{
- return __sysctl_use_next_header(current->nsproxy, prev);
+ struct ctl_table_group *g = &current->nsproxy->net_ns->netns_ctl_group;
+ /* dflt == NULL means: if there's a netns corresp return it,
+ * if there isn't, just return NULL */
+ return sysctl_use_netns_corresp_dflt(g, h, NULL);
}

-void register_sysctl_root(struct ctl_table_root *root)
+
+/* This semaphore protects the ctl_subdirs and ctl_tables lists. You
+ * must also have incremented the _use_refs of the header before
+ * accessing any field of the header including these lists. If it's
+ * deemed necessary, we can create a per-header rwsem. For now a
+ * global one will do. */
+static DECLARE_RWSEM(sysctl_rwsem);
+void sysctl_write_lock_head(struct ctl_table_header *head)
{
- spin_lock(&sysctl_lock);
- list_add_tail(&root->root_list, &sysctl_table_root.root_list);
- spin_unlock(&sysctl_lock);
+ down_write(&sysctl_rwsem);
+}
+void sysctl_write_unlock_head(struct ctl_table_header *head)
+{
+ up_write(&sysctl_rwsem);
+}
+void sysctl_read_lock_head(struct ctl_table_header *head)
+{
+ down_read(&sysctl_rwsem);
+}
+void sysctl_read_unlock_head(struct ctl_table_header *head)
+{
+ up_read(&sysctl_rwsem);
}

+
/*
* sysctl_perm does NOT grant the superuser all rights automatically, because
* some sysctl variables are readonly even to root.
@@ -1710,10 +1698,6 @@ __init int sysctl_init(void)
goto fail_register_binfmt_misc;
#endif

-
-#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
- sysctl_check_table(current->nsproxy, root_table);
-#endif
return 0;


@@ -1734,57 +1718,214 @@ fail_register_kern:
return -ENOMEM;
}

-static struct ctl_table *is_branch_in(struct ctl_table *branch,
- struct ctl_table *table)
+static void header_refs_inc(struct ctl_table_header*head)
{
- struct ctl_table *p;
- const char *s = branch->procname;
+ spin_lock(&sysctl_lock);
+ head->ctl_header_refs ++;
+ spin_unlock(&sysctl_lock);
+}

- /* branch should have named subdirectory as its first element */
- if (!s || !branch->child)
- return NULL;
+static int ctl_path_items(const struct ctl_path *path)
+{
+ int n = 0;
+ while (path->procname) {
+ path++;
+ n++;
+ }
+ return n;
+}

- /* ... and nothing else */
- if (branch[1].procname)
+
+static struct ctl_table_header *alloc_sysctl_header(struct ctl_table_group *group)
+{
+ struct ctl_table_header *h;
+
+ h = kzalloc(sizeof(*h), GFP_KERNEL);
+ if (!h)
return NULL;

- /* table should contain subdirectory with the same name */
- for (p = table; p->procname; p++) {
- if (!p->child)
- continue;
- if (p->procname && strcmp(p->procname, s) == 0)
- return p;
+ h->ctl_group = group;
+ INIT_LIST_HEAD(&h->ctl_entry);
+ INIT_LIST_HEAD(&h->ctl_subdirs);
+ INIT_LIST_HEAD(&h->ctl_tables);
+ return h;
+}
+
+/* Increment the references to an existing subdir of @parent with the name
+ * @name and return that subdir. If no such subdir exists, return NULL.
+ * Called under the write lock protecting parent's ctl_subdirs. */
+static struct ctl_table_header *mkdir_existing_dir(struct ctl_table_header *parent,
+ const char *name)
+{
+ struct ctl_table_header *h;
+ list_for_each_entry(h, &parent->ctl_subdirs, ctl_entry) {
+ spin_lock(&sysctl_lock);
+ if (likely(!h->unregistering)) {
+ if (strcmp(name, h->ctl_dirname) == 0) {
+ h->ctl_header_refs ++;
+ spin_unlock(&sysctl_lock);
+ return h;
+ }
+ }
+ spin_unlock(&sysctl_lock);
}
return NULL;
}

-/* see if attaching q to p would be an improvement */
-static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
+/* Some sysctl paths are netns-specific. The last directory that is
+ * not net-ns specific will have a correspondent dir in the netns
+ * specific ctl_table_group. That correspondent will hold the lists of
+ * netns specific tables and subdirectories.
+ *
+ * E.g.: registering netns/interface specific directories:
+ * common path: /proc/sys/net/ipv4/
+ * netns path: /proc/sys/net/ipv4/conf/lo/
+ * We'll create an (unnamed) netns correspondent for 'ipv4' which will
+ * have 'conf' as its subdir.
+ *
+ * E.g.: We're registering a netns specific file in /proc/sys/net/core/somaxconn
+ * common path: /proc/sys/net/core/
+ * netns path: /proc/sys/net/core/
+ * We'll create an (unnamed) netns correspondent for 'core'.
+ */
+static struct ctl_table_header *mkdir_netns_corresp(
+ struct ctl_table_header *parent,
+ struct ctl_table_group *group,
+ struct ctl_table_header **__netns_corresp)
+{
+ struct ctl_table_header *ret;
+
+ ret = sysctl_use_netns_corresp_dflt(group, parent, *__netns_corresp);
+
+ /* *__netns_corresp is a pre-allocated header. If we used it
+ here, we have to tell the caller so it won't free it. */
+ if (*__netns_corresp == ret)
+ *__netns_corresp = NULL;
+
+ header_refs_inc(ret);
+ sysctl_unuse_header(ret);
+ return ret;
+}
+
+/* Add @dir as a subdir of @parent.
+ * Called under the write lock protecting parent's ctl_subdirs. */
+static struct ctl_table_header *mkdir_new_dir(struct ctl_table_header *parent,
+ struct ctl_table_header *dir)
+{
+ dir->parent = parent;
+ header_refs_inc(dir);
+ list_add_tail(&dir->ctl_entry, &parent->ctl_subdirs);
+ return dir;
+}
+
+/*
+ * Attach the branch denoted by @dirs (a series of directories that
+ * are children of their predecessor in the array) to @parent.
+ *
+ * If at a level there exists in the parent tree a node with the same
+ * name as the one we're trying to add, increment that node's
+ * @count. If not, add that dir as a subdir of its parent.
+ *
+ * Nodes that remain non-NULL in @dirs must be freed by the caller as
+ * they were not added to the tree.
+ *
+ * Return the corresponding ctl_table_header for dirs[nr_dirs-1] from
+ * the tree (either one added by this function, or one already in the
+ * tree).
+ */
+static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
+ struct ctl_table_group *group,
+ const struct ctl_path *path,
+ int nr_dirs)
{
- struct ctl_table *to = p->ctl_table, *by = q->ctl_table;
- struct ctl_table *next;
- int is_better = 0;
- int not_in_parent = !p->attached_by;
+ struct ctl_table_header *dirs[CTL_MAXNAME];
+ struct ctl_table_header *__netns_corresp = NULL;
+ int create_first_netns_corresp = group->has_netns_corresp;
+ int i;
+
+ /* We create excess ctl_table_header for directory entries.
+ * We do so because we may need new headers while under a lock
+ * where we will not be able to allocate entries (sleeping).
+ * Also, this simplifies handling of ENOMEM: no need to remove
+ * already allocated/added directories and unlink them from
+ * their parent directories. Stuff that is not used will be
+ * freed at the end. */
+ for (i = 0; i < nr_dirs; i++) {
+ dirs[i] = alloc_sysctl_header(group);
+ if (!dirs[i])
+ goto err_alloc_dir;
+ dirs[i]->ctl_dirname = path[i].procname;
+ }

- while ((next = is_branch_in(by, to)) != NULL) {
- if (by == q->attached_by)
- is_better = 1;
- if (to == p->attached_by)
- not_in_parent = 1;
- by = by->child;
- to = next->child;
+ if (create_first_netns_corresp) {
+ /* The netns correspondent for the last common path
+ * component might exist. However we will only know
+ * this later while being under a lock. We
+ * pre-allocate it just in case it might be needed and
+ * free it at the end only if it wasn't used. */
+ __netns_corresp = alloc_sysctl_header(group);
+ if (!__netns_corresp)
+ goto err_alloc_coresp;
}

- if (is_better && not_in_parent) {
- q->attached_by = by;
- q->attached_to = to;
- q->parent = p;
+ header_refs_inc(parent);
+
+ for (i = 0; i < nr_dirs; i++) {
+ struct ctl_table_header *h;
+
+ retry:
+ sysctl_write_lock_head(parent);
+
+ h = mkdir_existing_dir(parent, dirs[i]->ctl_dirname);
+ if (h != NULL) {
+ sysctl_write_unlock_head(parent);
+ parent = h;
+ continue;
+ }
+
+ if (likely(!create_first_netns_corresp)) {
+ h = mkdir_new_dir(parent, dirs[i]);
+ sysctl_write_unlock_head(parent);
+ parent = h;
+ dirs[i] = NULL; /* I'm used, don't free me */
+ continue;
+ }
+
+ sysctl_write_unlock_head(parent);
+
+ create_first_netns_corresp = 0;
+ parent = mkdir_netns_corresp(parent, group, &__netns_corresp);
+ /* We still have to add the new subdirectory, but
+ * instead of adding it into the common parent, add it
+ * to its netns correspondent. */
+ goto retry;
}
+
+ if (create_first_netns_corresp)
+ parent = mkdir_netns_corresp(parent, group, &__netns_corresp);
+
+ if (__netns_corresp)
+ kfree(__netns_corresp);
+
+ /* free unused pre-allocated entries */
+ for (i = 0; i < nr_dirs; i++)
+ if (dirs[i])
+ kfree(dirs[i]);
+
+ return parent;
+
+err_alloc_coresp:
+ i = nr_dirs;
+err_alloc_dir:
+ for (i--; i >= 0; i--)
+ kfree(dirs[i]);
+ return NULL;
+
}

/**
* __register_sysctl_paths - register a sysctl hierarchy
- * @root: List of sysctl headers to register on
+ * @group: Group of sysctl headers to register on
* @namespaces: Data to compute which lists of sysctl entries are visible
* @path: The path to the directory the sysctl table is in.
* @table: the top-level table structure
@@ -1803,9 +1944,6 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
*
* mode - the file permissions for the /proc/sys file, and for sysctl(2)
*
- * child - a pointer to the child sysctl table if this entry is a directory, or
- * %NULL.
- *
* proc_handler - the text handler routine (described below)
*
* de - for internal use by the sysctl routines
@@ -1835,78 +1973,28 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
* This routine returns %NULL on a failure to register, and a pointer
* to the table header on success.
*/
-struct ctl_table_header *__register_sysctl_paths(
- struct ctl_table_root *root,
- struct ctl_table_group *group,
- struct nsproxy *namespaces,
+struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
const struct ctl_path *path, struct ctl_table *table)
{
struct ctl_table_header *header;
- struct ctl_table *new, **prevp;
- unsigned int n, npath;
- struct ctl_table_set *set;
-
- /* Count the path components */
- for (npath = 0; path[npath].procname; ++npath)
- ;
+ int nr_dirs = ctl_path_items(path);

- /*
- * For each path component, allocate a 2-element ctl_table array.
- * The first array element will be filled with the sysctl entry
- * for this, the second will be the sentinel (procname == 0).
- *
- * We allocate everything in one go so that we don't have to
- * worry about freeing additional memory in unregister_sysctl_table.
- */
- header = kzalloc(sizeof(struct ctl_table_header) +
- (2 * npath * sizeof(struct ctl_table)), GFP_KERNEL);
+ header = alloc_sysctl_header(group);
if (!header)
return NULL;

- new = (struct ctl_table *) (header + 1);
-
- /* Now connect the dots */
- prevp = &header->ctl_table;
- for (n = 0; n < npath; ++n, ++path) {
- /* Copy the procname */
- new->procname = path->procname;
- new->mode = 0555;
-
- *prevp = new;
- prevp = &new->child;
-
- new += 2;
- }
- *prevp = table;
- header->ctl_table_arg = table;
-
- INIT_LIST_HEAD(&header->ctl_entry);
- header->unregistering = NULL;
- header->root = root;
- header->ctl_group = group;
- header->ctl_header_refs = 1;
-#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
- if (sysctl_check_table(namespaces, header->ctl_table)) {
+ header->parent = sysctl_mkdirs(&root_table_header, group, path, nr_dirs);
+ if (!header->parent) {
kfree(header);
return NULL;
}
-#endif
- spin_lock(&sysctl_lock);
- header->set = lookup_header_set(root, namespaces);
- header->attached_by = header->ctl_table;
- header->attached_to = root_table;
- header->parent = &root_table_header;
- for (set = header->set; set; set = set->parent) {
- struct ctl_table_header *p;
- list_for_each_entry(p, &set->list, ctl_entry) {
- if (p->unregistering)
- continue;
- try_attach(p, header);
- }
- }
- header->parent->ctl_header_refs++;
- list_add_tail(&header->ctl_entry, &header->set->list);
- spin_unlock(&sysctl_lock);
+
+ header->ctl_table_arg = table;
+ header->ctl_header_refs = 1;
+
+ sysctl_write_lock_head(header->parent);
+ list_add_tail(&header->ctl_entry, &header->parent->ctl_tables);
+ sysctl_write_unlock_head(header->parent);

return header;
}
@@ -1924,8 +2012,7 @@ struct ctl_table_header *__register_sysctl_paths(
struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table)
{
- return __register_sysctl_paths(&sysctl_table_root, &root_table_group,
- current->nsproxy, path, table);
+ return __register_sysctl_paths(&root_table_group, path, table);
}

/**
@@ -1935,31 +2022,67 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
* Unregisters the sysctl table and all children. proc entries may not
* actually be removed until they are no longer used by anyone.
*/
-void unregister_sysctl_table(struct ctl_table_header * header)
+void unregister_sysctl_table(struct ctl_table_header *header)
{
might_sleep();

- if (header == NULL)
- return;
+ while (header->parent) {
+ struct ctl_table_header *parent = header->parent;

- spin_lock(&sysctl_lock);
- start_unregistering(header);
-
- /* after start_unregistering has finished no one holds a
- * ctl_use_refs or is able to acquire one => no one is going
- * to access internal fields of this object, so we can remove
- * it from the list and schedule it for deletion. */
- list_del_init(&p->ctl_entry);
-
- if (!--header->parent->ctl_header_refs) {
- WARN_ON(1);
- if (!header->parent->ctl_procfs_refs)
- call_rcu(&header->parent->rcu, free_head);
- }
- if (!--header->ctl_header_refs)
+ /* the three counters (ctl_header_refs, ctl_procfs_refs
+ * and ctl_use_refs) are protected by the spin lock. */
+ spin_lock(&sysctl_lock);
+ if (header->ctl_header_refs > 1) {
+ /* other headers need a reference to this one. Just
+ * mark that we don't need it and leave it as it is. */
+ header->ctl_header_refs--;
+ spin_unlock(&sysctl_lock);
+
+ goto unregister_parent;
+ }
+
+ /* header->ctl_header_refs is 1. We hold the only
+ * ctl_header_refs reference, but others may still
+ * hold _use_refs and _procfs_refs. We first need to
+ * wait until no one is actively using this object
+ * (that means until ctl_use_refs==0). While waiting
+ * no one will increase this header's refs because we
+ * set ->unregistering. */
+ start_unregistering(header);
+ spin_unlock(&sysctl_lock);
+
+ if (!header->ctl_dirname) {
+ /* the header is a netns correspondent of its
+ * parent. It is a member of its netns
+ * specific ctl_table_group list. For now that
+ * list is protected by sysctl_lock. */
+ spin_lock(&sysctl_lock);
+ list_del_init(&header->ctl_entry);
+ spin_unlock(&sysctl_lock);
+ } else {
+ /* ctl_entry is a member of the parent's
+ * ctl_tables/subdirs lists which are
+ * protected by the parent's write lock. */
+ sysctl_write_lock_head(parent);
+ list_del_init(&header->ctl_entry);
+ sysctl_write_unlock_head(parent);
+ }
+
+ spin_lock(&sysctl_lock);
+ /* something is wrong in the register/unregister code
+ * if this BUG triggers. No one should have changed the
+ * _header_refs of this header after start_unregistering */
+ BUG_ON(header->ctl_header_refs != 1);
+
+ header->ctl_header_refs--;
if (!header->ctl_procfs_refs)
call_rcu(&header->rcu, free_head);
- spin_unlock(&sysctl_lock);
+
+ spin_unlock(&sysctl_lock);
+
+unregister_parent:
+ header = parent;
+ }
}

int sysctl_is_seen(struct ctl_table_header *p)
@@ -1972,16 +2095,19 @@ int sysctl_is_seen(struct ctl_table_header *p)
else if (!ops->is_seen)
res = 1;
else
- res = ops->is_seen(p->set);
+ res = ops->is_seen(p->ctl_group);
spin_unlock(&sysctl_lock);
return res;
}

-void setup_sysctl_set(struct ctl_table_set *p,
- struct ctl_table_set *parent)
+void sysctl_init_group(struct ctl_table_group *group,
+ const struct ctl_table_group_ops *ops,
+ int has_netns_corresp)
{
- INIT_LIST_HEAD(&p->list);
- p->parent = parent ? parent : &sysctl_table_root.default_set;
+ group->ctl_ops = ops;
+ group->has_netns_corresp = has_netns_corresp;
+ if (has_netns_corresp)
+ INIT_LIST_HEAD(&group->corresp_list);
}

#else /* !CONFIG_SYSCTL */
@@ -1995,8 +2121,9 @@ void unregister_sysctl_table(struct ctl_table_header * table)
{
}

-void setup_sysctl_set(struct ctl_table_set *p,
- struct ctl_table_set *parent)
+void sysctl_init_group(struct ctl_table_group *group,
+ const struct ctl_table_group_ops *ops,
+ int has_netns_corresp)
{
}

diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 44c31f0..e9a7a58 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1,167 +1 @@
-#include <linux/stat.h>
-#include <linux/sysctl.h>
-#include "../fs/xfs/linux-2.6/xfs_sysctl.h"
-#include <linux/sunrpc/debug.h>
-#include <linux/string.h>
-#include <net/ip_vs.h>
-
-
-static void sysctl_print_path(struct ctl_table *table,
- struct ctl_table **parents, int depth)
-{
- struct ctl_table *p;
- int i;
- if (table->procname) {
- for (i = 0; i < depth; i++) {
- p = parents[i];
- printk("/%s", p->procname ? p->procname : "");
- }
- printk("/%s", table->procname);
- }
- printk(" ");
-}
-
-static struct ctl_table *sysctl_check_lookup(struct nsproxy *namespaces,
- struct ctl_table *table, struct ctl_table **parents, int depth)
-{
- struct ctl_table_header *head;
- struct ctl_table *ref, *test;
- int cur_depth;
-
- for (head = __sysctl_use_next_header(namespaces, NULL); head;
- head = __sysctl_use_next_header(namespaces, head)) {
- cur_depth = depth;
- ref = head->ctl_table;
-repeat:
- test = parents[depth - cur_depth];
- for (; ref->procname; ref++) {
- int match = 0;
- if (cur_depth && !ref->child)
- continue;
-
- if (test->procname && ref->procname &&
- (strcmp(test->procname, ref->procname) == 0))
- match++;
-
- if (match) {
- if (cur_depth != 0) {
- cur_depth--;
- ref = ref->child;
- goto repeat;
- }
- goto out;
- }
- }
- }
- ref = NULL;
-out:
- sysctl_unuse_header(head);
- return ref;
-}
-
-static void set_fail(const char **fail, struct ctl_table *table,
- const char *str, struct ctl_table **parents, int depth)
-{
- if (*fail) {
- printk(KERN_ERR "sysctl table check failed: ");
- sysctl_print_path(table, parents, depth);
- printk(" %s\n", *fail);
- dump_stack();
- }
- *fail = str;
-}
-
-static void sysctl_check_leaf(struct nsproxy *namespaces,
- struct ctl_table *table, const char **fail,
- struct ctl_table **parents, int depth)
-{
- struct ctl_table *ref;
-
- ref = sysctl_check_lookup(namespaces, table, parents, depth);
- if (ref && (ref != table))
- set_fail(fail, table, "Sysctl already exists", parents, depth);
-}
-
-
-
-#define SET_FAIL(str) set_fail(&fail, table, str, parents, depth)
-
-static int __sysctl_check_table(struct nsproxy *namespaces,
- struct ctl_table *table, struct ctl_table **parents, int depth)
-{
- const char *fail = NULL;
- int error = 0;
-
- if (depth >= CTL_MAXNAME) {
- SET_FAIL("Sysctl tree too deep");
- return -EINVAL;
- }
-
- for (; table->procname; table++) {
- fail = NULL;
-
-
- if (depth != 0) { /* has parent */
- if (!parents[depth - 1]->procname)
- SET_FAIL("Parent without procname");
- }
- if (table->child) {
- if (table->data)
- SET_FAIL("Directory with data?");
- if (table->maxlen)
- SET_FAIL("Directory with maxlen?");
- if ((table->mode & (S_IRUGO|S_IXUGO)) != table->mode)
- SET_FAIL("Writable sysctl directory");
- if (table->proc_handler)
- SET_FAIL("Directory with proc_handler");
- if (table->extra1)
- SET_FAIL("Directory with extra1");
- if (table->extra2)
- SET_FAIL("Directory with extra2");
- } else {
- if ((table->proc_handler == proc_dostring) ||
- (table->proc_handler == proc_dointvec) ||
- (table->proc_handler == proc_dointvec_minmax) ||
- (table->proc_handler == proc_dointvec_jiffies) ||
- (table->proc_handler == proc_dointvec_userhz_jiffies) ||
- (table->proc_handler == proc_dointvec_ms_jiffies) ||
- (table->proc_handler == proc_doulongvec_minmax) ||
- (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
- if (!table->data)
- SET_FAIL("No data");
- if (!table->maxlen)
- SET_FAIL("No maxlen");
- }
-#ifdef CONFIG_PROC_SYSCTL
- if (!table->proc_handler)
- SET_FAIL("No proc_handler");
-#endif
- parents[depth] = table;
- sysctl_check_leaf(namespaces, table, &fail,
- parents, depth);
- }
- if (table->mode > 0777)
- SET_FAIL("bogus .mode");
- if (fail) {
- SET_FAIL(NULL);
- error = -EINVAL;
- }
- if (table->child) {
- parents[depth] = table;
- error |= __sysctl_check_table(namespaces, table->child,
- parents, depth + 1);
- }
- }
- return error;
-}
-
-
-int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table)
-{
- struct ctl_table *parents[CTL_MAXNAME];
- /* Keep track of parents as we go down into the tree:
- * - the node at depth 'd' will have the parent at parents[d-1].
- * - the root node (depth=0) has no parent in this array.
- */
- return __sysctl_check_table(namespaces, table, parents, 0);
-}
+/* will be rewritten */
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 5009d4e..f610879 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -29,15 +29,9 @@
#include <linux/if_tr.h>
#endif

-static struct ctl_table_set *
-net_ctl_header_lookup(struct ctl_table_root *root, struct nsproxy *namespaces)
+static int is_seen(struct ctl_table_group *group)
{
- return &namespaces->net_ns->sysctls;
-}
-
-static int is_seen(struct ctl_table_set *set)
-{
- return &current->nsproxy->net_ns->sysctls == set;
+ return &current->nsproxy->net_ns->netns_ctl_group == group;
}

/* Return standard mode bits for table entry. */
@@ -56,14 +50,6 @@ static const struct ctl_table_group_ops net_sysctl_group_ops = {
.permissions = net_ctl_permissions,
};

-static struct ctl_table_group net_sysctl_group = {
- .ctl_ops = &net_sysctl_group_ops,
-};
-
-static struct ctl_table_root net_sysctl_root = {
- .lookup = net_ctl_header_lookup,
-};
-
static int net_ctl_ro_header_permissions(ctl_table *table)
{
if (net_eq(current->nsproxy->net_ns, &init_net))
@@ -77,21 +63,22 @@ static const struct ctl_table_group_ops net_sysctl_ro_group_ops = {
};

static struct ctl_table_group net_sysctl_ro_group = {
+ .has_netns_corresp = 0,
.ctl_ops = &net_sysctl_ro_group_ops,
};

-static struct ctl_table_root net_sysctl_ro_root = { };
-
static int __net_init sysctl_net_init(struct net *net)
{
- setup_sysctl_set(&net->sysctls,
- &net_sysctl_ro_root.default_set);
+ int has_netns_corresp = 1;
+
+ sysctl_init_group(&net->netns_ctl_group, &net_sysctl_group_ops,
+ has_netns_corresp);
return 0;
}

static void __net_exit sysctl_net_exit(struct net *net)
{
- WARN_ON(!list_empty(&net->sysctls.list));
+ WARN_ON(!list_empty(&net->netns_ctl_group.corresp_list));
}

static struct pernet_operations sysctl_pernet_ops = {
@@ -105,9 +92,6 @@ static __init int net_sysctl_init(void)
ret = register_pernet_subsys(&sysctl_pernet_ops);
if (ret)
goto out;
- register_sysctl_root(&net_sysctl_root);
- setup_sysctl_set(&net_sysctl_ro_root.default_set, NULL);
- register_sysctl_root(&net_sysctl_ro_root);
out:
return ret;
}
@@ -116,19 +100,14 @@ subsys_initcall(net_sysctl_init);
struct ctl_table_header *register_net_sysctl_table(struct net *net,
const struct ctl_path *path, struct ctl_table *table)
{
- struct nsproxy namespaces;
- namespaces = *current->nsproxy;
- namespaces.net_ns = net;
- return __register_sysctl_paths(&net_sysctl_root, &net_sysctl_group,
- &namespaces, path, table);
+ return __register_sysctl_paths(&net->netns_ctl_group, path, table);
}
EXPORT_SYMBOL_GPL(register_net_sysctl_table);

struct ctl_table_header *register_net_sysctl_rotable(const
struct ctl_path *path, struct ctl_table *table)
{
- return __register_sysctl_paths(&net_sysctl_ro_root, &net_sysctl_ro_group,
- &init_nsproxy, path, table);
+ return __register_sysctl_paths(&net_sysctl_ro_group, path, table);
}
EXPORT_SYMBOL_GPL(register_net_sysctl_rotable);

--
1.7.5.134.g1c08b

2011-05-08 22:48:05

by Lucian Adrian Grijincu

Subject: [v2 077/115] sysctl: add duplicate entry and sanity ctl_table checks

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 7 ++
kernel/sysctl.c | 19 ++++++-
kernel/sysctl_check.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 177 insertions(+), 2 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index b626271..22b6eb8 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1092,6 +1092,13 @@ extern struct ctl_table_header *register_sysctl_paths(const struct ctl_path *pat
struct ctl_table *table);
extern void unregister_sysctl_table(struct ctl_table_header *table);

+#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
+extern int sysctl_check_table(const struct ctl_path *path,
+ int nr_dirs,
+ struct ctl_table *table);
+extern int sysctl_check_duplicates(struct ctl_table_header *header);
+#endif /* CONFIG_SYSCTL_SYSCALL_CHECK */
+
#endif /* __KERNEL__ */

#endif /* _LINUX_SYSCTL_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index cbf33b1..d777e89 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1977,8 +1977,14 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
const struct ctl_path *path, struct ctl_table *table)
{
struct ctl_table_header *header;
+ int failed_duplicate_check = 0;
int nr_dirs = ctl_path_items(path);

+#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
+ if (sysctl_check_table(path, nr_dirs, table))
+ return NULL;
+#endif
+
header = alloc_sysctl_header(group);
if (!header)
return NULL;
@@ -1993,9 +1999,20 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
header->ctl_header_refs = 1;

sysctl_write_lock_head(header->parent);
- list_add_tail(&header->ctl_entry, &header->parent->ctl_tables);
+
+#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
+ failed_duplicate_check = sysctl_check_duplicates(header);
+#endif
+ if (!failed_duplicate_check)
+ list_add_tail(&header->ctl_entry, &header->parent->ctl_tables);
+
sysctl_write_unlock_head(header->parent);

+ if (failed_duplicate_check) {
+ unregister_sysctl_table(header);
+ return NULL;
+ }
+
return header;
}

diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index e9a7a58..4e0bce5 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1 +1,152 @@
-/* will be rewritten */
+#include <linux/sysctl.h>
+#include <linux/string.h>
+
+/*
+ * @path: the path to the offender
+ * @offender: the name of a file or directory that violated some sysctl rules
+ * @str: a message accompanying the error
+ */
+static void fail(const struct ctl_path *path,
+ const char *offender,
+ const char *str)
+{
+ printk(KERN_ERR "sysctl sanity check failed: ");
+
+ for (; path->procname; path++)
+ printk("/%s", path->procname);
+
+ if (offender)
+ printk("/%s", offender);
+
+ printk(": %s\n", str);
+}
+
+#define FAIL(str) do { fail(path, t->procname, str); error = -EINVAL;} while (0)
+
+int sysctl_check_table(const struct ctl_path *path,
+ int nr_dirs,
+ struct ctl_table *table)
+{
+ struct ctl_table *t;
+ int error = 0;
+
+ if (nr_dirs > CTL_MAXNAME - 1) {
+ fail(path, NULL, "tree too deep");
+ error = -EINVAL;
+ }
+
+ for (t = table; t->procname; t++) {
+ if ((t->proc_handler == proc_dostring) ||
+ (t->proc_handler == proc_dointvec) ||
+ (t->proc_handler == proc_dointvec_minmax) ||
+ (t->proc_handler == proc_dointvec_jiffies) ||
+ (t->proc_handler == proc_dointvec_userhz_jiffies) ||
+ (t->proc_handler == proc_dointvec_ms_jiffies) ||
+ (t->proc_handler == proc_doulongvec_minmax) ||
+ (t->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
+ if (!t->data)
+ FAIL("No data");
+ if (!t->maxlen)
+ FAIL("No maxlen");
+ }
+#ifdef CONFIG_PROC_SYSCTL
+ if (!t->proc_handler)
+ FAIL("No proc_handler");
+#endif
+ if (t->mode > 0777)
+ FAIL("bogus .mode");
+ }
+
+ if (error)
+ dump_stack();
+
+ return error;
+}
+
+
+/*
+ * @dir: the directory immediately above the offender
+ * @offender: the name of a file or directory that violated some sysctl rules
+ */
+static void duplicate_error(struct ctl_table_header *dir,
+ const char *offender)
+{
+ const char *names[CTL_MAXNAME];
+ int i = 0;
+
+ printk(KERN_ERR "sysctl duplicate check failed: ");
+
+ for (; dir->parent; dir = dir->parent)
+ /* ctl_dirname can be NULL: netns-correspondent
+ * directories do not have a ctl_dirname. Their only
+ * purpose is to hold the list of
+ * subdirs/subtables. They hold netns-specific
+ * information for the parent directory. */
+ if (dir->ctl_dirname) {
+ names[i] = dir->ctl_dirname;
+ i++;
+ }
+
+ /* Print the names in the normal path order, not reversed */
+ for (i--; i >= 0; i--)
+ printk("/%s", names[i]);
+
+ printk("/%s \n", offender);
+}
+
+/* is there an entry in the table with the same procname? */
+static int match(struct ctl_table *table, const char *name)
+{
+ for ( ; table->procname; table++) {
+
+ if (strcmp(table->procname, name) == 0)
+ return 1;
+ }
+ return 0;
+}
+
+
+/* Called under header->parent write lock.
+ *
+ * checks whether this header's table introduces items that have the
+ * same names as other items at the same level (other files or
+ * subdirectories of the current dir). */
+int sysctl_check_duplicates(struct ctl_table_header *header)
+{
+ int has_duplicates = 0;
+ struct ctl_table *table = header->ctl_table_arg;
+ struct ctl_table_header *dir = header->parent;
+ struct ctl_table_header *h;
+
+ list_for_each_entry(h, &dir->ctl_subdirs, ctl_entry) {
+ if (IS_ERR(sysctl_use_header(h)))
+ continue;
+
+ if (match(table, h->ctl_dirname)) {
+ has_duplicates = 1;
+ duplicate_error(dir, h->ctl_dirname);
+ }
+
+ sysctl_unuse_header(h);
+ }
+
+ list_for_each_entry(h, &dir->ctl_tables, ctl_entry) {
+ ctl_table *t;
+
+ if (IS_ERR(sysctl_use_header(h)))
+ continue;
+
+ for (t = h->ctl_table_arg; t->procname; t++) {
+ if (match(table, t->procname)) {
+ has_duplicates = 1;
+ duplicate_error(dir, t->procname);
+ }
+ }
+ sysctl_unuse_header(h);
+ }
+
+ if (has_duplicates)
+ dump_stack();
+
+ return has_duplicates;
+}
--
1.7.5.134.g1c08b

2011-05-08 22:47:48

by Lucian Adrian Grijincu

Subject: [v2 078/115] sysctl: alloc ctl_table_header with kmem_cache

Now that ctl_table_header objects are allocated with a fixed size
(sizeof(struct ctl_table_header)), we can allocate them from a
kmem_cache.

Also, by making sure that objects are in a sane state when they are
returned to the cache, we don't waste time reinitializing every field
after kmem_cache_alloc. We only initialize the fields that were not
left with a sane value before the object was returned to the cache.
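
The constructor idea above can be illustrated with a small userspace
sketch. This is not the kernel code: struct header, struct cache,
header_ctor(), cache_alloc() and cache_free() are hypothetical
stand-ins for ctl_table_header and the kmem_cache API, and the free
list only mimics object reuse. It assumes the caller drops all
references before freeing, which is the invariant the patch relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ctl_table_header. */
struct header {
	int use_refs;
	int header_refs;
	const void *table_arg;    /* per-use field, reset on every alloc */
	struct header *next_free; /* free-list link, only valid while cached */
};

/* Toy object cache: a free list plus a count of constructor calls. */
struct cache {
	struct header *free_list;
	int ctor_calls;
};

/* Runs once per object, when the cache first creates it. */
static void header_ctor(struct cache *c, struct header *h)
{
	h->use_refs = 0;
	h->header_refs = 0;
	c->ctor_calls++;
}

static struct header *cache_alloc(struct cache *c)
{
	struct header *h = c->free_list;

	if (h) {
		/* Reused object: it is still in its constructed state. */
		c->free_list = h->next_free;
	} else {
		h = malloc(sizeof(*h));
		if (!h)
			return NULL;
		header_ctor(c, h);
	}

	/* Only fields that are not sane at free time are set here. */
	h->table_arg = NULL;
	return h;
}

static void cache_free(struct cache *c, struct header *h)
{
	/* Contract: objects go back to the cache with all refs dropped. */
	assert(h->use_refs == 0 && h->header_refs == 0);
	h->next_free = c->free_list;
	c->free_list = h;
}
```

In this sketch a second cache_alloc() after a cache_free() hands back
the same object without running the constructor again, so the cost of
initialising the reference counters is paid once per object rather
than once per allocation.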

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sysctl.c | 48 +++++++++++++++++++++++++++++++++++++++---------
1 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d777e89..c207c19 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -198,6 +198,9 @@ static int sysrq_sysctl_handler(ctl_table *table, int write,

#endif

+/* cache for ctl_table_header objects */
+static struct kmem_cache *sysctl_header_cachep;
+
/* uses default ops */
static const struct ctl_table_group_ops root_table_group_ops = { };

@@ -1553,7 +1556,9 @@ void sysctl_proc_inode_get(struct ctl_table_header *head)

static void free_head(struct rcu_head *rcu)
{
- kfree(container_of(rcu, struct ctl_table_header, rcu));
+ struct ctl_table_header *header;
+ header = container_of(rcu, struct ctl_table_header, rcu);
+ kmem_cache_free(sysctl_header_cachep, header);
}

void sysctl_proc_inode_put(struct ctl_table_header *head)
@@ -1664,6 +1669,8 @@ int sysctl_perm(struct ctl_table_group *group, struct ctl_table *table, int op)
return test_perm(mode, op);
}

+static void sysctl_header_ctor(void *data);
+
__init int sysctl_init(void)
{
struct ctl_table_header *kern_header, *vm_header, *fs_header,
@@ -1672,6 +1679,12 @@ __init int sysctl_init(void)
struct ctl_table_header *binfmt_misc_header;
#endif

+ sysctl_header_cachep = kmem_cache_create("sysctl_header_cachep",
+ sizeof(struct ctl_table_header),
+ 0, 0, &sysctl_header_ctor);
+ if (!sysctl_header_cachep)
+ goto fail_alloc_cachep;
+
kern_header = register_sysctl_paths(kern_path, kern_table);
if (kern_header == NULL)
goto fail_register_kern;
@@ -1715,6 +1728,8 @@ fail_register_fs:
fail_register_vm:
unregister_sysctl_table(kern_header);
fail_register_kern:
+ kmem_cache_destroy(sysctl_header_cachep);
+fail_alloc_cachep:
return -ENOMEM;
}

@@ -1735,19 +1750,34 @@ static int ctl_path_items(const struct ctl_path *path)
return n;
}

+static void sysctl_header_ctor(void *data)
+{
+ struct ctl_table_header *h = data;
+
+ h->ctl_use_refs = 0;
+ h->ctl_procfs_refs = 0;
+ h->ctl_header_refs = 0;
+
+ INIT_LIST_HEAD(&h->ctl_entry);
+ INIT_LIST_HEAD(&h->ctl_subdirs);
+ INIT_LIST_HEAD(&h->ctl_tables);
+}

static struct ctl_table_header *alloc_sysctl_header(struct ctl_table_group *group)
{
struct ctl_table_header *h;

- h = kzalloc(sizeof(*h), GFP_KERNEL);
+ h = kmem_cache_alloc(sysctl_header_cachep, GFP_KERNEL);
if (!h)
return NULL;

+ /* - all _refs members are zero before freeing
+ * - all list_head members point to themselves (empty lists) */
+
+ h->ctl_table_arg = NULL;
+ h->unregistering = NULL;
h->ctl_group = group;
- INIT_LIST_HEAD(&h->ctl_entry);
- INIT_LIST_HEAD(&h->ctl_subdirs);
- INIT_LIST_HEAD(&h->ctl_tables);
+
return h;
}

@@ -1905,12 +1935,12 @@ static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
parent = mkdir_netns_corresp(parent, group, &__netns_corresp);

if (__netns_corresp)
- kfree(__netns_corresp);
+ kmem_cache_free(sysctl_header_cachep, __netns_corresp);

/* free unused pre-allocated entries */
for (i = 0; i < nr_dirs; i++)
if (dirs[i])
- kfree(dirs[i]);
+ kmem_cache_free(sysctl_header_cachep, dirs[i]);

return parent;

@@ -1918,7 +1948,7 @@ err_alloc_coresp:
i = nr_dirs;
err_alloc_dir:
for (i--; i >= 0; i--)
- kfree(dirs[i]);
+ kmem_cache_free(sysctl_header_cachep, dirs[i]);
return NULL;

}
@@ -1991,7 +2021,7 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,

header->parent = sysctl_mkdirs(&root_table_header, group, path, nr_dirs);
if (!header->parent) {
- kfree(header);
+ kmem_cache_free(sysctl_header_cachep, header);
return NULL;
}

--
1.7.5.134.g1c08b

2011-05-08 22:47:01

by Lucian Adrian Grijincu

Subject: [v2 079/115] sysctl: single subheader path: optimisation for paths used only once

This is an optimisation for registering paths that you know will be
used to register a single table. Because such a directory will be used
only once, sysctl will always create an entry for it when it sees it.

When sysctl registers a table, for each directory that may be used
while registering other tables we do a linear search to see if it's
already added, and, if not, add it ourselves.

For example: each netdevice will register a single table under
/proc/sys/net/ipv4/conf/DEVNAME/.

The 'DEVNAME' component of the path is not used to register other
headers, and we can optimise adding that directory: we don't have to
check if it's already registered.

This will have a positive performance impact when registering many
such directories because we're doing an O(nr of sibling directories)
search. With @has_just_one_subheader=1 set we skip that search and add
the directory directly because we know no other sibling directory with
the same name was registered.

NOTE: in this example setting @has_just_one_subheader=1 for the 'conf'
ctl_path would be wrong because it's used when registering other
subheaders too (e.g. subheaders for other netdevices).
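
A minimal userspace sketch of the idea, under simplifying assumptions:
struct dir, find_subdir() and add_subdir() are hypothetical names, the
subdirectory list is a fixed-size array instead of a list_head, and no
locking is shown. The lookups counter exists only to make the skipped
search visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBDIRS 16

/* Hypothetical, simplified directory node. */
struct dir {
	const char *name;
	struct dir *subdirs[MAX_SUBDIRS];
	int nr_subdirs;
	int lookups; /* number of name comparisons done in this dir */
};

/* Linear search over the siblings: O(nr of sibling directories). */
static struct dir *find_subdir(struct dir *parent, const char *name)
{
	int i;

	for (i = 0; i < parent->nr_subdirs; i++) {
		parent->lookups++;
		if (strcmp(parent->subdirs[i]->name, name) == 0)
			return parent->subdirs[i];
	}
	return NULL;
}

/*
 * Returns the node that ends up in the tree for fresh->name: an
 * already registered one, or @fresh itself. Returns NULL when the
 * parent is full. With has_just_one_subheader set, the caller
 * promises that no sibling has the same name, so the search is
 * skipped entirely.
 */
static struct dir *add_subdir(struct dir *parent, struct dir *fresh,
			      int has_just_one_subheader)
{
	assert(fresh->name != NULL);

	if (!has_just_one_subheader) {
		struct dir *existing = find_subdir(parent, fresh->name);

		if (existing)
			return existing;
	}

	if (parent->nr_subdirs == MAX_SUBDIRS)
		return NULL;

	parent->subdirs[parent->nr_subdirs++] = fresh;
	return fresh;
}
```

Note that the flag is a promise made by the caller: if it is set for a
name that is in fact already registered, this sketch would silently
add a second entry with the same name.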

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 31 +++++++++++++++++++++++++++++++
kernel/sysctl.c | 12 +++++++-----
2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 22b6eb8..bdc8c97 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1083,6 +1083,37 @@ struct ctl_table_header {
/* struct ctl_path describes where in the hierarchy a table is added */
struct ctl_path {
const char *procname;
+
+
+ /* This is an optimisation for registering paths that you know
+ * will be used to register a single table. Because such
+ * directories will be used only once, sysctl will always
+ * create an entry for them when it sees them.
+ *
+ * When sysctl registers a table, for each directory that may
+ * be used while registering other tables we do a linear
+ * search to see if it's already added, and, if not, add it
+ * ourselves.
+ *
+ * For example: each netdevice will register a single table
+ * under /proc/sys/net/ipv4/conf/DEVNAME/.
+ *
+ * The 'DEVNAME' component of the path is not used to register
+ * other headers, and we can optimise adding that directory:
+ * we don't have to check if it's already registered.
+ *
+ * This will have a positive performance impact when
+ * registering many such directories because we're doing an
+ * O(nr of sibling directories) search. With
+ * @has_just_one_subheader=1 set we skip that search and add
+ * the directory directly because we know no other sibling
+ * directory with the same name was registered.
+ *
+ * NOTE: in this example setting @has_just_one_subheader=1 for
+ * the 'conf' ctl_path would be wrong because it's used when
+ * registering other subheaders too (e.g. subheaders for other
+ * netdevices). */
+ int has_just_one_subheader;
};

extern struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *g,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c207c19..9b2c05a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1906,11 +1906,13 @@ static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
retry:
sysctl_write_lock_head(parent);

- h = mkdir_existing_dir(parent, dirs[i]->ctl_dirname);
- if (h != NULL) {
- sysctl_write_unlock_head(parent);
- parent = h;
- continue;
+ if (!path[i].has_just_one_subheader) {
+ h = mkdir_existing_dir(parent, dirs[i]->ctl_dirname);
+ if (h != NULL) {
+ sysctl_write_unlock_head(parent);
+ parent = h;
+ continue;
+ }
}

if (likely(!create_first_netns_corresp)) {
--
1.7.5.134.g1c08b

2011-05-08 22:42:12

by Lucian Adrian Grijincu

Subject: [v2 080/115] sysctl: single subheader path: net/ipv4/conf/DEVICE-NAME/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv4/devinet.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index cd9ca08..e672107 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1631,7 +1631,13 @@ static int __devinet_sysctl_register(struct net *net, char *dev_name,
{ .procname = "net", },
{ .procname = "ipv4", },
{ .procname = "conf", },
- { /* to be set */ },
+ {
+ /* to be set below (DEVINET_CTL_PATH_DEV) */
+ .procname = NULL,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};

--
1.7.5.134.g1c08b

2011-05-08 22:42:08

by Lucian Adrian Grijincu

Subject: [v2 081/115] sysctl: single subheader path: net/{ipv4|ipv6}/neigh/DEV/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/core/neighbour.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 799f06e..63677be 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2818,7 +2818,13 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
{ .procname = "net", },
{ .procname = "proto", },
{ .procname = "neigh", },
- { .procname = "default", },
+ {
+ /* will be set to device name (NEIGH_CTL_PATH_DEV) */
+ .procname = "default",
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};

--
1.7.5.134.g1c08b

2011-05-08 22:46:57

by Lucian Adrian Grijincu

Subject: [v2 082/115] sysctl: single subheader path: net/ipv6/conf/DEVICE-NAME/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv6/addrconf.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a7bda07..3a9f958 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4486,7 +4486,13 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
{ .procname = "net", },
{ .procname = "ipv6", },
{ .procname = "conf", },
- { /* to be set */ },
+ {
+ /* to be set below (ADDRCONF_CTL_PATH_DEV) */
+ .procname = NULL,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};

--
1.7.5.134.g1c08b

2011-05-08 22:46:38

by Lucian Adrian Grijincu

Subject: [v2 083/115] sysctl: single subheader path: dev/parport/PORT/devices/DEVICE/

This patch was not tested!

Parport registers tables under these paths:
dev/parport/default/
dev/parport/PORT/
dev/parport/PORT/devices/
dev/parport/PORT/devices/DEVICE/

Nothing else is registered below dev/parport/PORT/devices/DEVICE/ and
I assume device names are unique (if they are not, this patch is
invalid), so we can skip name checks for the 'DEVICE' directory.

This will have a positive performance impact when there are many
devices registered on the same port.
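
As an illustration only, the effect of the flag can be sketched in a
small userspace model. The toy_* names and types below are hypothetical
and are not the kernel's data structures; the sketch only assumes that
sibling directories sit on a linear list, as described for this series.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model of a sysctl directory; not the kernel's types. */
struct toy_dir {
	const char *name;
	struct toy_dir *first_child;
	struct toy_dir *next_sibling;
	unsigned long nr_children;
};

/* Number of strcmp() calls performed so far, to make the cost visible. */
static unsigned long toy_name_checks;

/* Linear search through the children of @parent: O(nr of siblings). */
static struct toy_dir *toy_find_child(struct toy_dir *parent, const char *name)
{
	struct toy_dir *d;

	for (d = parent->first_child; d; d = d->next_sibling) {
		toy_name_checks++;
		if (strcmp(d->name, name) == 0)
			return d;
	}
	return NULL;
}

/* Return the existing child named @name, or create it. When
 * @just_one_subheader is set the caller promises the name is unique,
 * so the linear search is skipped. Returns NULL if allocation fails. */
static struct toy_dir *toy_mkdir(struct toy_dir *parent, const char *name,
				 int just_one_subheader)
{
	struct toy_dir *d;

	assert(parent && name);
	if (!just_one_subheader) {
		d = toy_find_child(parent, name);
		if (d)
			return d;
	}
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->name = name;
	/* push at the head; sibling order does not matter for this sketch */
	d->next_sibling = parent->first_child;
	parent->first_child = d;
	parent->nr_children++;
	return d;
}
```

With the flag set, adding N uniquely named directories does no name
comparisons at all; without it, each addition walks the sibling list,
which is where the quadratic registration time comes from.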

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
drivers/parport/procfs.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 3bb5bed..9c48946 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -442,7 +442,12 @@ int parport_device_proc_register(struct pardevice *device)
{ .procname = "parport" },
{ .procname = port->name },
{ .procname = "devices" },
- { .procname = device->name },
+ {
+ .procname = device->name,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};

--
1.7.5.134.g1c08b

2011-05-08 22:45:46

by Lucian Adrian Grijincu

Subject: [v2 084/115] sysctl: single subheader path: net/ax25/DEVICE

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ax25/sysctl_net_ax25.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/ax25/sysctl_net_ax25.c b/net/ax25/sysctl_net_ax25.c
index b1181bc..9bd49c0 100644
--- a/net/ax25/sysctl_net_ax25.c
+++ b/net/ax25/sysctl_net_ax25.c
@@ -160,7 +160,13 @@ void ax25_register_sysctl(struct ax25_dev *ax25_dev)
struct ctl_path ax25_path[] = {
{ .procname = "net" },
{ .procname = "ax25" },
- { .procname = ax25_dev->dev->name },
+ {
+ .procname = ax25_dev->dev->name,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+
+ },
{ }
};

--
1.7.5.134.g1c08b

2011-05-08 22:45:43

by Lucian Adrian Grijincu

Subject: [v2 085/115] sysctl: single subheader path: kernel/sched_domain/CPU/DOMAIN/

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sched.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 6e39b7c..8320365 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6250,7 +6250,13 @@ static void register_sched_domain_sysctl(void)
{ .procname = "kernel" },
{ .procname = "sched_domain" },
{ /* 'cpu0' */ },
- { /* 'domain0' */ },
+ {
+ /* 'domain0' */
+ .procname = NULL,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};

--
1.7.5.134.g1c08b

2011-05-08 22:45:22

by Lucian Adrian Grijincu

Subject: [v2 086/115] sysctl: single subheader path: net/decnet/conf/DEVNAME

This patch was not tested!

I assume the DN_CTL_PATH_DEV .procname names are unique. If they are
not, this patch is invalid.

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/decnet/dn_dev.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 0dcaa90..d83d561 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -216,7 +216,13 @@ static void dn_dev_sysctl_register(struct net_device *dev, struct dn_dev_parms *
{ .procname = "net", },
{ .procname = "decnet", },
{ .procname = "conf", },
- { /* to be set */ },
+ {
+ /* to be set below (DN_CTL_PATH_DEV) */
+ .procname = NULL,
+ /* skip duplicate name check; we're registering
+ * just one subheader for this directory */
+ .has_just_one_subheader = 1,
+ },
{ },
};

--
1.7.5.134.g1c08b

2011-05-08 22:42:21

by Lucian Adrian Grijincu

Subject: [v2 087/115] sysctl: check netns-specific registration order respected
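
The rule checked by this patch can be sketched in userspace roughly as
follows. This is a simplified model under stated assumptions: the
toy_* names are hypothetical, and a plain array of names stands in for
the subdirectory list of the netns correspondent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: the names already registered as netns-specific
 * subdirectories of one parent directory. */
struct toy_netns_corresp {
	const char *const *subdirs;
	size_t nr_subdirs;
};

/* Return 1 if registering a non-netns-specific directory @name would
 * violate the ordering rule, 0 otherwise. Only non-netns registrations
 * can break the rule, so netns-specific ones return early. @corresp may
 * be NULL when the parent has no netns correspondent. */
static int toy_check_netns_order(const struct toy_netns_corresp *corresp,
				 const char *name, int is_netns_specific)
{
	size_t i;

	assert(name);
	if (is_netns_specific || !corresp)
		return 0;
	for (i = 0; i < corresp->nr_subdirs; i++)
		if (strcmp(corresp->subdirs[i], name) == 0)
			return 1;
	return 0;
}
```

For example, with "conf" already present as a netns-specific
subdirectory of 'ipv4', a later non-netns-specific registration of
'conf' is flagged, while 'route' is not.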

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 2 +
kernel/sysctl.c | 15 ++++++++-
kernel/sysctl_check.c | 78 ++++++++++++++++++++++++++++++++++++++++++-----
3 files changed, 85 insertions(+), 10 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index bdc8c97..036d1aa 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1128,6 +1128,8 @@ extern int sysctl_check_table(const struct ctl_path *path,
int nr_dirs,
struct ctl_table *table);
extern int sysctl_check_duplicates(struct ctl_table_header *header);
+extern int sysctl_check_netns_correspondents(struct ctl_table_header *header,
+ struct ctl_table_group *group);
#endif /* CONFIG_SYSCTL_SYSCALL_CHECK */

#endif /* __KERNEL__ */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9b2c05a..9e50334 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1920,6 +1920,12 @@ static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
sysctl_write_unlock_head(parent);
parent = h;
dirs[i] = NULL; /* I'm used, don't free me */
+#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
+ if (sysctl_check_netns_correspondents(parent, group)) {
+ unregister_sysctl_table(h);
+ goto err_check_netns_correspondents;
+ }
+#endif
continue;
}

@@ -1946,11 +1952,18 @@ static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,

return parent;

+#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
+err_check_netns_correspondents:
+ if (__netns_corresp)
+ kmem_cache_free(sysctl_header_cachep, __netns_corresp);
+#endif
+
err_alloc_coresp:
i = nr_dirs;
err_alloc_dir:
for (i--; i >= 0; i--)
- kmem_cache_free(sysctl_header_cachep, dirs[i]);
+ if (dirs[i])
+ kmem_cache_free(sysctl_header_cachep, dirs[i]);
return NULL;

}
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 4e0bce5..55e797a 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -63,19 +63,14 @@ int sysctl_check_table(const struct ctl_path *path,
return error;
}

-
-/*
- * @dir: the directory imediately above the offender
- * @offender is the name of a file or directory that violated some sysctl rules.
- */
-static void duplicate_error(struct ctl_table_header *dir,
- const char *offender)
+/* Print the path to a sysctl directory. The header must *not*
+ * point to a ctl_table_header that wraps a ctl_table array, it must
+ * be a directory. */
+static void printk_sysctl_dir(struct ctl_table_header *dir)
{
const char *names[CTL_MAXNAME];
int i = 0;

- printk(KERN_ERR "sysctl duplicate check failed: ");
-
for (; dir->parent; dir = dir->parent)
/* ctl_dirname can be NULL: netns-correspondent
* directories do not have a ctl_dirname. Their only
@@ -90,7 +85,18 @@ static void duplicate_error(struct ctl_table_header *dir,
/* Print the names in the normal path order, not reversed */
for(i--; i >= 0; i--)
printk("/%s", names[i]);
+}
+
+/*
+ * @dir: the directory immediately above the offender
+ * @offender is the name of a file or directory that violated some sysctl rules.
+ */
+static void duplicate_error(struct ctl_table_header *dir,
+ const char *offender)
+{

+ printk(KERN_ERR "sysctl duplicate check failed: ");
+ printk_sysctl_dir(dir);
printk("/%s \n", offender);
}

@@ -150,3 +156,57 @@ int sysctl_check_duplicates(struct ctl_table_header *header)

return has_duplicates;
}
+
+/* Check whether adding this header respects the rule that no
+ * non-netns-specific directory may be registered after a netns-specific
+ * directory with the same name was registered (and is still registered).
+ *
+ * E.g. This sequence of registrations is not valid:
+ * - non-netns-specific: /net/ipv4/
+ * - netns-specific: /net/ipv4/conf/lo
+ * - non-netns-specific: /net/ipv4/conf/
+ *
+ * because after first adding 'conf' as a netns-specific directory,
+ * we're adding a non-netns-specific one with the same name.
+ *
+ * NOTE: in this example, the directory that has a netns-correspondent is 'ipv4'
+ */
+int sysctl_check_netns_correspondents(struct ctl_table_header *header,
+ struct ctl_table_group *group)
+{
+ struct ctl_table_header *netns_corresp, *h;
+ int found = 0;
+ /* we only check the registration of non-netns paths,
+ * because only those paths can violate the above rule. */
+ if (group->has_netns_corresp)
+ return 0;
+
+ netns_corresp = sysctl_use_netns_corresp(header->parent);
+ if (!netns_corresp)
+ return 0;
+
+ /* see if the netns_correspondent has a subdir
+ * with the same as this non-netns specific header */
+ sysctl_read_lock_head(netns_corresp);
+ list_for_each_entry(h, &netns_corresp->ctl_subdirs, ctl_entry) {
+ if (IS_ERR(sysctl_use_header(h)))
+ continue;
+ if (strcmp(header->ctl_dirname, h->ctl_dirname) == 0) {
+ sysctl_unuse_header(h);
+ found = 1;
+ break;
+ }
+ sysctl_unuse_header(h);
+ }
+ sysctl_read_unlock_head(netns_corresp);
+
+ if (!found)
+ return 0;
+
+ printk(KERN_ERR "illegal sysctl registration of non-netns-specific "
+ "directory after a netns-specific with the same name\n");
+ printk_sysctl_dir(header);
+ dump_stack();
+
+ return 1;
+}
--
1.7.5.134.g1c08b

2011-05-08 22:42:19

by Lucian Adrian Grijincu

Subject: [v2 088/115] RFC: sysctl: convert read-write lock to RCU

Apologies to reviewers who will feel insulted reading this. This patch
is just for kicks - and by kicks I mean ass-kicks for such an awful
misuse of the RCU API. I haven't done anything with RCUs until now and
I'm very unsure about the sanity of this patch.

This patch replaces the reader-writer lock protected lists ctl_subdirs
and ctl_tables with RCU protected lists.

Unlike in the RCU snippets I found, where the reader part only reads
data from the object and updates are done on a separate copy (the
"copy, update" in RCU), here readers do change some data in the list
elements (data access protected by a separate spin lock), but do not
touch the list_head.

read-side:
- uses the list_for_each_entry_rcu traversal, which takes care of
  memory ordering on DEC Alpha
- rcu_read_(un)lock make sure the grace period is as long as needed

write-side:
- writers are synchronized with a spin-lock
- list adding/removing is done with list_add_tail_rcu/list_del_rcu
- freeing of elements is done after the grace period has ended (call_rcu)

Also note that there may be unwanted interactions with the RCU
protected VFS routines: ctl_table_header elements are scheduled to be
freed when all references to them have disappeared. This means either
after removing the element from the list or at a later time (also with
call_rcu). I don't think that delaying freeing some more would be a
problem, but I may be very wrong.

Free-ing of ctl_table_header is done with free_head. This is
scheduled to be called with call_rcu in two places:

- sysctl_proc_inode_put() called from the VFS by proc_evict_inode which uses
rcu_assign_pointer(PROC_I(inode)->sysctl, NULL)
to delete the VFS's last reference to the object

- unregister_sysctl_table (no connection to the VFS).

Each of them determines if all references to that object have
disappeared, and if so, schedules the object to be freed with call_rcu.
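
The "unlink now, free after the grace period" part of this scheme can
be illustrated with a single-threaded userspace toy. It is not RCU:
there is no concurrency and no memory-ordering handling, and all toy_*
names are hypothetical. It only shows why a reader that already holds a
pointer to a removed element can keep using it until the deferred free
runs.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_node {
	int value;
	struct toy_node *next;		/* list linkage, left intact on unlink */
	struct toy_node *next_pending;	/* linkage on the deferred-free queue */
};

static struct toy_node *toy_head;	/* the "published" list */
static struct toy_node *toy_pending;	/* unlinked, not yet freed */
static int toy_freed;			/* number of nodes actually freed */

/* Add a node at the head of the list. Returns NULL if allocation fails. */
static struct toy_node *toy_add(int value)
{
	struct toy_node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->value = value;
	n->next = toy_head;
	toy_head = n;
	return n;
}

/* Unlink @n from the list but keep n->next, so a reader that already
 * holds @n can continue its walk. The node is queued for a later free;
 * this plays the role call_rcu() has in the patch. */
static void toy_del_deferred(struct toy_node *n)
{
	struct toy_node **pp;

	for (pp = &toy_head; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->next_pending = toy_pending;
			toy_pending = n;
			return;
		}
	}
}

/* Stand-in for the end of a grace period: no reader can hold a pointer
 * to the queued nodes any more, so they are freed for real. */
static void toy_grace_period_end(void)
{
	while (toy_pending) {
		struct toy_node *n = toy_pending;

		toy_pending = n->next_pending;
		free(n);
		toy_freed++;
	}
}
```

A reader that obtained a node before toy_del_deferred() still sees a
valid value and a valid next pointer; only toy_grace_period_end()
releases the memory.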

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 8 ++++----
kernel/sysctl.c | 29 ++++++++++++-----------------
kernel/sysctl_check.c | 7 ++++---
3 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 9337149..b3e2453 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -81,7 +81,7 @@ retry:
sysctl_read_lock_head(head);

/* first check whether a subdirectory has the searched-for name */
- list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+ list_for_each_entry_rcu(h, &head->ctl_subdirs, ctl_entry) {
if (IS_ERR(sysctl_use_header(h)))
continue;

@@ -93,7 +93,7 @@ retry:
}

/* no subdir with that name, look for the file in the ctl_tables */
- list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+ list_for_each_entry_rcu(h, &head->ctl_tables, ctl_entry) {
if (IS_ERR(sysctl_use_header(h)))
continue;

@@ -234,7 +234,7 @@ static int scan(struct ctl_table_header *head,

sysctl_read_lock_head(head);

- list_for_each_entry(h, &head->ctl_subdirs, ctl_entry) {
+ list_for_each_entry_rcu(h, &head->ctl_subdirs, ctl_entry) {
if (*pos < file->f_pos) {
(*pos)++;
continue;
@@ -252,7 +252,7 @@ static int scan(struct ctl_table_header *head,
(*pos)++;
}

- list_for_each_entry(h, &head->ctl_tables, ctl_entry) {
+ list_for_each_entry_rcu(h, &head->ctl_tables, ctl_entry) {
ctl_table *t;

if (IS_ERR(sysctl_use_header(h)))
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9e50334..26c2bc6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -56,7 +56,6 @@
#include <linux/kprobes.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
-#include <linux/rwsem.h>

#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -1616,30 +1615,25 @@ struct ctl_table_header *sysctl_use_netns_corresp(struct ctl_table_header *h)
}


-/* This semaphore protects the ctl_subdirs and ctl_tables lists. You
- * must also have incremented the _use_refs of the header before
- * accessing any field of the header including these lists. If it's
- * deemed necessary, we can create a per-header rwsem. For now a
- * global one will do. */
-static DECLARE_RWSEM(sysctl_rwsem);
+/* protection for the headers' ctl_subdirs/ctl_tables lists */
+static DEFINE_SPINLOCK(sysctl_list_lock);
void sysctl_write_lock_head(struct ctl_table_header *head)
{
- down_write(&sysctl_rwsem);
+ spin_lock(&sysctl_list_lock);
}
void sysctl_write_unlock_head(struct ctl_table_header *head)
{
- up_write(&sysctl_rwsem);
+ spin_unlock(&sysctl_list_lock);
}
void sysctl_read_lock_head(struct ctl_table_header *head)
{
- down_read(&sysctl_rwsem);
+ rcu_read_lock();
}
void sysctl_read_unlock_head(struct ctl_table_header *head)
{
- up_read(&sysctl_rwsem);
+ rcu_read_unlock();
}

-
/*
* sysctl_perm does NOT grant the superuser all rights automatically, because
* some sysctl variables are readonly even to root.
@@ -1777,6 +1771,7 @@ static struct ctl_table_header *alloc_sysctl_header(struct ctl_table_group *grou
h->ctl_table_arg = NULL;
h->unregistering = NULL;
h->ctl_group = group;
+ INIT_LIST_HEAD(&h->ctl_entry);

return h;
}
@@ -1788,7 +1783,7 @@ static struct ctl_table_header *mkdir_existing_dir(struct ctl_table_header *pare
const char *name)
{
struct ctl_table_header *h;
- list_for_each_entry(h, &parent->ctl_subdirs, ctl_entry) {
+ list_for_each_entry_rcu(h, &parent->ctl_subdirs, ctl_entry) {
spin_lock(&sysctl_lock);
if (likely(!h->unregistering)) {
if (strcmp(name, h->ctl_dirname) == 0) {
@@ -1844,7 +1839,7 @@ static struct ctl_table_header *mkdir_new_dir(struct ctl_table_header *parent,
{
dir->parent = parent;
header_refs_inc(dir);
- list_add_tail(&dir->ctl_entry, &parent->ctl_subdirs);
+ list_add_tail_rcu(&dir->ctl_entry, &parent->ctl_subdirs);
return dir;
}

@@ -2049,7 +2044,7 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
failed_duplicate_check = sysctl_check_duplicates(header);
#endif
if (!failed_duplicate_check)
- list_add_tail(&header->ctl_entry, &header->parent->ctl_tables);
+ list_add_tail_rcu(&header->ctl_entry, &header->parent->ctl_tables);

sysctl_write_unlock_head(header->parent);

@@ -2119,14 +2114,14 @@ void unregister_sysctl_table(struct ctl_table_header *header)
* specific ctl_table_group list. For not that
* list is protected by sysctl_lock. */
spin_lock(&sysctl_lock);
- list_del_init(&header->ctl_entry);
+ list_del_rcu(&header->ctl_entry);
spin_unlock(&sysctl_lock);
} else {
/* ctl_entry is a member of the parent's
* ctl_tables/subdirs lists which are
* protected by the parent's write lock. */
sysctl_write_lock_head(parent);
- list_del_init(&header->ctl_entry);
+ list_del_rcu(&header->ctl_entry);
sysctl_write_unlock_head(parent);
}

diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 55e797a..b9573e0 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1,5 +1,6 @@
#include <linux/sysctl.h>
#include <linux/string.h>
+#include <linux/rculist.h>

/*
* @path: the path to the offender
@@ -124,7 +125,7 @@ int sysctl_check_duplicates(struct ctl_table_header *header)
struct ctl_table_header *dir = header->parent;
struct ctl_table_header *h;

- list_for_each_entry(h, &dir->ctl_subdirs, ctl_entry) {
+ list_for_each_entry_rcu(h, &dir->ctl_subdirs, ctl_entry) {
if (IS_ERR(sysctl_use_header(h)))
continue;

@@ -136,7 +137,7 @@ int sysctl_check_duplicates(struct ctl_table_header *header)
sysctl_unuse_header(h);
}

- list_for_each_entry(h, &dir->ctl_tables, ctl_entry) {
+ list_for_each_entry_rcu(h, &dir->ctl_tables, ctl_entry) {
ctl_table *t;

if (IS_ERR(sysctl_use_header(h)))
@@ -188,7 +189,7 @@ int sysctl_check_netns_correspondents(struct ctl_table_header *header,
/* see if the netns_correspondent has a subdir
* with the same as this non-netns specific header */
sysctl_read_lock_head(netns_corresp);
- list_for_each_entry(h, &netns_corresp->ctl_subdirs, ctl_entry) {
+ list_for_each_entry_rcu(h, &netns_corresp->ctl_subdirs, ctl_entry) {
if (IS_ERR(sysctl_use_header(h)))
continue;
if (strcmp(header->ctl_dirname, h->ctl_dirname) == 0) {
--
1.7.5.134.g1c08b

2011-05-08 22:42:26

by Lucian Adrian Grijincu

Subject: [v2 089/115] RFC: sysctl: change type of ctl_procfs_refs to u8

255 files per registered header should be enough for everyone.
If not, either:
- register another header (and another, and another) each with at most 255 files
- change the type of ctl_procfs_refs to something bigger (e.g. u16)

This patch makes two assumptions:

- there will be at most a single inode created for each sysctl
file. That means that the ctl_table_header will be incremented (at
most) once for each of its files. For directories the counter
will be incremented only once (when creating an inode for the
directory itself).

- there are no sysctl tables in the kernel with more than 255 entries.
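
A saturating "get" on an 8-bit counter can be sketched as below. This
is a userspace illustration with hypothetical toy_* names; it checks
before incrementing, which is an equivalent alternative to incrementing
and undoing the increment on wrap-around.

```c
#include <assert.h>
#include <stdint.h>

/* Take one reference on a hypothetical 8-bit counter.
 * Returns 0 on success, 1 if the counter is already at its maximum. */
static int toy_procfs_ref_get(uint8_t *refs)
{
	assert(refs);
	if (*refs == UINT8_MAX)
		return 1;	/* would wrap to 0; leave the counter untouched */
	(*refs)++;
	return 0;
}

/* Drop one reference; the caller must hold one. */
static void toy_procfs_ref_put(uint8_t *refs)
{
	assert(refs && *refs > 0);
	(*refs)--;
}
```

A caller that gets a non-zero return must not create the inode, which
mirrors the new error path in proc_sys_make_inode().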

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
fs/proc/proc_sysctl.c | 12 +++++++++---
include/linux/sysctl.h | 11 +++++++----
kernel/sysctl.c | 10 +++++++++-
kernel/sysctl_check.c | 12 ++++++++++++
4 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index b3e2453..9580794 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -20,13 +20,15 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
struct inode *inode;
struct proc_inode *ei;

+ if (sysctl_proc_inode_get(head))
+ goto err_get;
+
inode = new_inode(sb);
if (!inode)
- goto out;
+ goto err_new_inode;

inode->i_ino = get_next_ino();

- sysctl_proc_inode_get(head);
ei = PROC_I(inode);
ei->sysctl = head;
ei->sysctl_entry = table;
@@ -44,8 +46,12 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
inode->i_op = &proc_sys_dir_operations;
inode->i_fop = &proc_sys_dir_file_operations;
}
-out:
return inode;
+
+err_new_inode:
+ sysctl_proc_inode_put(head);
+err_get:
+ return NULL;
}

static struct ctl_table *find_in_table(struct ctl_table *p, struct qstr *name)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 036d1aa..d5d9b66f 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -947,7 +947,7 @@ extern void sysctl_init_group(struct ctl_table_group *group,

/* get/put a reference to this header that
* will be/was embedded in a procfs proc_inode */
-extern void sysctl_proc_inode_get(struct ctl_table_header *);
+extern int sysctl_proc_inode_get(struct ctl_table_header *);
extern void sysctl_proc_inode_put(struct ctl_table_header *);

extern int sysctl_is_seen(struct ctl_table_header *);
@@ -1059,13 +1059,16 @@ struct ctl_table_header {
/* references to this header from contexts that
* can access fields of this header */
int ctl_use_refs;
- /* references to this header from procfs inodes.
- * procfs embeds a pointer to the header in proc_inode */
- int ctl_procfs_refs;
/* counts references to this header from other
* headers (through ->parent) plus the reference
* returned by __register_sysctl_paths */
int ctl_header_refs;
+ /* references to this header from procfs inodes.
+ * procfs embeds a pointer to the header in proc_inode.
+ * If there's at max one inode created per file then
+ * the max value of this is the number of files in the
+ * ctl_table array, or 1 for directories. */
+ u8 ctl_procfs_refs;
};
struct rcu_head rcu;
};
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 26c2bc6..3e30e78 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1546,11 +1546,19 @@ static void start_unregistering(struct ctl_table_header *p)
}
}

-void sysctl_proc_inode_get(struct ctl_table_header *head)
+int sysctl_proc_inode_get(struct ctl_table_header *head)
{
+ int err = 0;
spin_lock(&sysctl_lock);
head->ctl_procfs_refs++;
+ if (unlikely(head->ctl_procfs_refs == 0)) {
+ /* the counter wrapped around: restore the old value */
+ head->ctl_procfs_refs--;
+ err = 1;
+ WARN(1, "sysctl: ctl_procfs_refs overflow\n");
+ }
spin_unlock(&sysctl_lock);
+ return err;
}

static void free_head(struct rcu_head *rcu)
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index b9573e0..205f721 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -29,6 +29,8 @@ int sysctl_check_table(const struct ctl_path *path,
struct ctl_table *table)
{
struct ctl_table *t;
+ unsigned int max_bits, max_files;
+ unsigned int nr_files = 0;
int error = 0;

if (nr_dirs > CTL_MAXNAME - 1) {
@@ -37,6 +39,7 @@ int sysctl_check_table(const struct ctl_path *path,
}

for(t = table; t->procname; t++) {
+ nr_files++;
if ((t->proc_handler == proc_dostring) ||
(t->proc_handler == proc_dointvec) ||
(t->proc_handler == proc_dointvec_minmax) ||
@@ -58,6 +61,15 @@ int sysctl_check_table(const struct ctl_path *path,
FAIL("bogus .mode");
}

+ /* make sure we can increment the header's ctl_procfs_refs
+ * counter for each file in the table. If this fails we either
+ * need to change the type of the ctl_procfs_refs variable, or
+ * register more tables in the same directory. */
+ max_bits = 8 * sizeof(((struct ctl_table_header *) 0)->ctl_procfs_refs);
+ max_files = 1 << max_bits;
+ if (nr_files >= max_files)
+ FAIL("too many files in registered table");
+
if (error)
dump_stack();

--
1.7.5.134.g1c08b

2011-05-08 22:44:59

by Lucian Adrian Grijincu

Subject: [v2 090/115] sysctl: warn if registration/unregistration order is not respected

This patch sends a warning for each sysctl unregistration that cannot
delete all the directories that it created.

For example:
- register /existingdir/newdir/file-a
- register /existingdir/newdir/dir3/file-b
- unregister /existingdir/newdir/file-a
- unregister /existingdir/newdir/dir3/file-b

Here the order is violated because the first unregister operation
cannot delete all the directories it has created (namely 'newdir')
because they are used by another registered path.

This rule violation can be fixed in (at least) two ways:

- enforce order of unregistration:
- register /existingdir/newdir/file-a
- register /existingdir/newdir/dir3/file-b
- unregister /existingdir/newdir/dir3/file-b
- unregister /existingdir/newdir/file-a

- have a third party register the common part:
- register /existingdir/newdir/
- register /existingdir/newdir/file-a
- register /existingdir/newdir/dir3/file-b
- unregister /existingdir/newdir/file-a
- unregister /existingdir/newdir/dir3/file-b
- unregister /existingdir/newdir/

The current implementation works well regardless of this order being
respected. In the future, other sysctl implementations may only work
if this rule is respected.
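
The ownership check can be modelled in userspace as follows. This is a
simplified sketch: the toy_* name is hypothetical, each directory is
reduced to a bare reference count, and the path is given leaf-first.

```c
#include <assert.h>
#include <stddef.h>

/* Walk the directories of a header's path from the leaf towards the
 * root. @refs[i] points to the reference count of the i-th directory
 * and @owned_dirs is how many of the leaf-most directories this header
 * created. Returns 1 if a directory the header created is still used
 * by someone else (rule violated), 0 otherwise. */
static int toy_unregister(int **refs, size_t nr, int owned_dirs)
{
	int violated = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (*refs[i] > 1) {
			/* still used by another header: drop only our
			 * reference and stop walking up the path */
			if (owned_dirs != 0)
				violated = 1;
			(*refs[i])--;
			break;
		}
		/* we held the last reference: the directory goes away */
		*refs[i] = 0;
		if (owned_dirs)
			owned_dirs--;
	}
	return violated;
}
```

Applied to the example above: unregistering file-a first finds
'newdir' still referenced by 'dir3' and reports a violation, while
unregistering file-b first and file-a second does not.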

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 15 ++++++++++++---
kernel/sysctl.c | 39 +++++++++++++++++++++++++++++++++++----
2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index d5d9b66f..322246d 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1036,12 +1036,14 @@ struct ctl_table_group_ops {
};

struct ctl_table_group {
+ /* has initialization for this group finished? */
+ unsigned int is_initialized:1;
+ /* does this group use the @corresp_list? */
+ unsigned int has_netns_corresp:1;
+ struct list_head corresp_list;
const struct ctl_table_group_ops *ctl_ops;
/* A list of ctl_table_header elements that represent the
* netns-specific correspondents of some sysctl directories */
- struct list_head corresp_list;
- /* binary: whether this group uses the @corresp_list */
- char has_netns_corresp;
};

/* struct ctl_table_header is used to maintain dynamic lists of
@@ -1069,6 +1071,13 @@ struct ctl_table_header {
* the max value of this is the number of files in the
* ctl_table array, or 1 for directories. */
u8 ctl_procfs_refs;
+ /* how many dirs were created when this header was
+ * registered. Rule: the header which created a directory
+ * should be the one that deletes it. This counter is
+ * used to signal violations of this rule. The counter's
+ * max value is CTL_MAXNAME (currently=10) so we use
+ * only 4 bits of the 8 available. */
+ u8 ctl_owned_dirs_refs;
};
struct rcu_head rcu;
};
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 3e30e78..94fff4e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -204,8 +204,9 @@ static struct kmem_cache *sysctl_header_cachep;
static const struct ctl_table_group_ops root_table_group_ops = { };

static struct ctl_table_group root_table_group = {
- .has_netns_corresp = 0,
- .ctl_ops = &root_table_group_ops,
+ .is_initialized = 1,
+ .has_netns_corresp = 0,
+ .ctl_ops = &root_table_group_ops,
};

static struct ctl_table_header root_table_header = {
@@ -1617,6 +1618,14 @@ out:
struct ctl_table_header *sysctl_use_netns_corresp(struct ctl_table_header *h)
{
struct ctl_table_group *g = &current->nsproxy->net_ns->netns_ctl_group;
+
+ /* this function may be called to check whether the
+ * netns-specific vs. non-netns-specific registration order is
+ * respected. Those checks may be done early during init, when
+ * neither init_net nor its netns-specific group is initialized. */
+ if (!g->is_initialized)
+ return NULL;
+
/* dflt == NULL means: if there's a netns corresp return it,
* if there isn't, just return NULL */
return sysctl_use_netns_corresp_dflt(g, h, NULL);
@@ -1869,13 +1878,14 @@ static struct ctl_table_header *mkdir_new_dir(struct ctl_table_header *parent,
static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
struct ctl_table_group *group,
const struct ctl_path *path,
- int nr_dirs)
+ int nr_dirs, int *p_dirs_created)
{
struct ctl_table_header *dirs[CTL_MAXNAME];
struct ctl_table_header *__netns_corresp = NULL;
int create_first_netns_corresp = group->has_netns_corresp;
int i;

+ *p_dirs_created = 0;
/* We create excess ctl_table_header for directory entries.
* We do so because we may need new headers while under a lock
* where we will not be able to allocate entries (sleeping).
@@ -1929,6 +1939,7 @@ static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
goto err_check_netns_correspondents;
}
#endif
+ (*p_dirs_created)++;
continue;
}

@@ -1945,8 +1956,12 @@ static struct ctl_table_header *sysctl_mkdirs(struct ctl_table_header *parent,
if (create_first_netns_corresp)
parent = mkdir_netns_corresp(parent, group, &__netns_corresp);

+ /* if mkdir_netns_corresp used it, it's NULL */
if (__netns_corresp)
kmem_cache_free(sysctl_header_cachep, __netns_corresp);
+ else
+ (*p_dirs_created)++;
+

/* free unused pre-allocated entries */
for (i = 0; i < nr_dirs; i++)
@@ -2027,6 +2042,7 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
struct ctl_table_header *header;
int failed_duplicate_check = 0;
int nr_dirs = ctl_path_items(path);
+ int dirs_created = 0;

#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
if (sysctl_check_table(path, nr_dirs, table))
@@ -2037,7 +2053,8 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
if (!header)
return NULL;

- header->parent = sysctl_mkdirs(&root_table_header, group, path, nr_dirs);
+ header->parent = sysctl_mkdirs(&root_table_header, group, path,
+ nr_dirs, &dirs_created);
if (!header->parent) {
kmem_cache_free(sysctl_header_cachep, header);
return NULL;
@@ -2045,6 +2062,7 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,

header->ctl_table_arg = table;
header->ctl_header_refs = 1;
+ header->ctl_owned_dirs_refs = dirs_created;

sysctl_write_lock_head(header->parent);

@@ -2089,6 +2107,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
*/
void unregister_sysctl_table(struct ctl_table_header *header)
{
+ int dirs_to_delete = header->ctl_owned_dirs_refs;
might_sleep();

while(header->parent) {
@@ -2098,6 +2117,13 @@ void unregister_sysctl_table(struct ctl_table_header *header)
* and ctl_use_refs) are protected by the spin lock. */
spin_lock(&sysctl_lock);
if (header->ctl_header_refs > 1) {
+ if (WARN(dirs_to_delete != 0, "directory that we "
+ "created is still used by another header.\n")) {
+ /* if one element of the path is still used, its
+ * parents will be too. Stop sending warnings */
+ dirs_to_delete = 0;
+ }
+
/* other headers need a reference to this one. Just
* mark that we don't need it and leave it as it is. */
header->ctl_header_refs --;
@@ -2116,6 +2142,10 @@ void unregister_sysctl_table(struct ctl_table_header *header)
start_unregistering(header);
spin_unlock(&sysctl_lock);

+ /* don't go negative */
+ if (dirs_to_delete)
+ dirs_to_delete --;
+
if (!header->ctl_dirname) {
/* the header is a netns correspondent of it's
* parent. It is a member of it's netns
@@ -2173,6 +2203,7 @@ void sysctl_init_group(struct ctl_table_group *group,
group->has_netns_corresp = has_netns_corresp;
if (has_netns_corresp)
INIT_LIST_HEAD(&group->corresp_list);
+ group->is_initialized = 1;
}

#else /* !CONFIG_SYSCTL */
--
1.7.5.134.g1c08b

2011-05-08 22:42:29

by Lucian Adrian Grijincu

Subject: [v2 091/115] sysctl: add register_sysctl_dir: register an empty sysctl directory

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
include/linux/sysctl.h | 5 +++--
kernel/sysctl.c | 37 +++++++++++++++++++++++++++++++++++++
kernel/sysctl_check.c | 15 +++++++++++----
3 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 322246d..03842cc 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1133,11 +1133,12 @@ extern struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *
struct ctl_table *table);
extern struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
struct ctl_table *table);
+struct ctl_table_header *register_sysctl_dir(const struct ctl_path *path);
extern void unregister_sysctl_table(struct ctl_table_header *table);

#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
-extern int sysctl_check_table(const struct ctl_path *path,
- int nr_dirs,
+extern int sysctl_check_path(const struct ctl_path *path, int nr_dirs);
+extern int sysctl_check_table(const struct ctl_path *path, int nr_dirs,
struct ctl_table *table);
extern int sysctl_check_duplicates(struct ctl_table_header *header);
extern int sysctl_check_netns_correspondents(struct ctl_table_header *header,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 94fff4e..7cf0242 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2045,6 +2045,9 @@ struct ctl_table_header *__register_sysctl_paths(struct ctl_table_group *group,
int dirs_created = 0;

#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
+ if (sysctl_check_path(path, nr_dirs))
+ return NULL;
+
if (sysctl_check_table(path, nr_dirs, table))
return NULL;
#endif
@@ -2098,6 +2101,39 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
return __register_sysctl_paths(&root_table_group, path, table);
}

+/* Register an empty sysctl directory. */
+static struct ctl_table_header *__register_sysctl_dir(
+ struct ctl_table_group *group, const struct ctl_path *path)
+{
+ struct ctl_table_header *dir;
+ int nr_dirs = ctl_path_items(path);
+ int dirs_created = 0;
+
+#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
+ if (sysctl_check_path(path, nr_dirs))
+ return NULL;
+#endif
+
+ dir = sysctl_mkdirs(&root_table_header, group, path,
+ nr_dirs, &dirs_created);
+ if (!dir)
+ return NULL;
+
+ /* -1 because we don't want to count ourselves in the list of
+ * directory headers owned by @dir. NOTE: if all of the dirs
+ * in the path are already registered dirs_created will be 0. */
+ if (dirs_created > 0)
+ dir->ctl_owned_dirs_refs = dirs_created - 1;
+ else
+ dir->ctl_owned_dirs_refs = 0;
+ return dir;
+}
+
+struct ctl_table_header *register_sysctl_dir(const struct ctl_path *path)
+{
+ return __register_sysctl_dir(&root_table_group, path);
+}
+
/**
* unregister_sysctl_table - unregister a sysctl table hierarchy
* @header: the header returned from __register_sysctl_paths
@@ -3193,4 +3229,5 @@ EXPORT_SYMBOL(proc_dostring);
EXPORT_SYMBOL(proc_doulongvec_minmax);
EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
EXPORT_SYMBOL(register_sysctl_paths);
+EXPORT_SYMBOL(register_sysctl_dir);
EXPORT_SYMBOL(unregister_sysctl_table);
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 205f721..20c1948 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -24,6 +24,17 @@ static void fail(const struct ctl_path *path,

#define FAIL(str) do { fail(path, t->procname, str); error = -EINVAL;} while (0)

+
+int sysctl_check_path(const struct ctl_path *path,
+ int nr_dirs)
+{
+ if (nr_dirs <= CTL_MAXNAME - 1)
+ return 0;
+ fail(path, NULL, "tree too deep");
+ return -EINVAL;
+}
+
+
int sysctl_check_table(const struct ctl_path *path,
int nr_dirs,
struct ctl_table *table)
@@ -33,10 +44,6 @@ int sysctl_check_table(const struct ctl_path *path,
unsigned int nr_files = 0;
int error = 0;

- if (nr_dirs > CTL_MAXNAME - 1) {
- fail(path, NULL, "tree too deep");
- error = -EINVAL;
- }

for(t = table; t->procname; t++) {
nr_files ++;
--
1.7.5.134.g1c08b
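
The ownership accounting in __register_sysctl_dir() above can be modelled in
userspace. This is only a sketch under assumptions: toy_mkdirs() is a
hypothetical stand-in that guesses at sysctl_mkdirs() (whose body is not shown
in this patch) by counting how many path components did not exist yet, and
TOY_CTL_MAXNAME is assumed to mirror the kernel's CTL_MAXNAME. None of the
toy_* names exist in the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_CTL_MAXNAME 10  /* assumption: mirrors the kernel's CTL_MAXNAME */
#define TOY_MAX_DIRS    32  /* hypothetical capacity of the toy directory set */
#define TOY_PATH_LEN    128

/* Full paths of every directory registered so far (toy global state). */
static char toy_known[TOY_MAX_DIRS][TOY_PATH_LEN];
static int toy_nr_known;

static int toy_dir_exists(const char *full)
{
	int i;

	for (i = 0; i < toy_nr_known; i++)
		if (strcmp(toy_known[i], full) == 0)
			return 1;
	return 0;
}

/* Same rule as sysctl_check_path(): 0 if the path is shallow enough. */
static int toy_check_path(int nr_dirs)
{
	return nr_dirs <= TOY_CTL_MAXNAME - 1 ? 0 : -1;
}

/* Create every missing component of the path.
 * Returns how many directories had to be created, or -1 when full. */
static int toy_mkdirs(const char *const *path, int nr_dirs)
{
	char full[TOY_PATH_LEN] = "";
	int i, created = 0;

	assert(path != NULL);
	for (i = 0; i < nr_dirs; i++) {
		size_t len = strlen(full);

		snprintf(full + len, sizeof(full) - len, "/%s", path[i]);
		if (toy_dir_exists(full))
			continue;
		if (toy_nr_known == TOY_MAX_DIRS)
			return -1;
		strcpy(toy_known[toy_nr_known++], full);
		created++;
	}
	return created;
}

/* Returns the number of directories the new header would own besides
 * itself (the value stored in ctl_owned_dirs_refs), or -1 on failure. */
static int toy_register_dir(const char *const *path, int nr_dirs)
{
	int created;

	if (toy_check_path(nr_dirs))
		return -1;
	created = toy_mkdirs(path, nr_dirs);
	if (created < 0)
		return -1;
	/* -1 so the last directory does not count itself. */
	return created > 0 ? created - 1 : 0;
}
```

In this model, registering { "net", "ipv4", "neigh" } on an empty tree creates
three directories, so the header owns two besides itself; registering a path
whose components all exist already owns none.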

2011-05-08 22:44:41

by Lucian Adrian Grijincu

Subject: [v2 092/115] sysctl: sched: create empty dir with register_sysctl_dir

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
kernel/sched.c | 19 ++++---------------
1 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 8320365..5cda526 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6302,16 +6302,11 @@ static void register_sched_domain_sysctl(void)

i = 0;
for_each_possible_cpu(cpu) {
- struct ctl_table *empty = kzalloc(sizeof(*empty), GFP_KERNEL);
- if (empty == NULL)
- goto unregister_sd_cpudir_headers;
sd_path[SD_PATH_CPU].procname = sd_cpu_names[cpu];
sd_path[SD_PATH_DOM].procname = NULL; /* end of array sentinel */
- sd_cpudir_headers[i] = register_sysctl_paths(sd_path, empty);
- if (sd_cpudir_headers[i] == NULL) {
- kfree(empty);
+ sd_cpudir_headers[i] = register_sysctl_dir(sd_path);
+ if (sd_cpudir_headers[i] == NULL)
goto unregister_sd_cpudir_headers;
- }
i++;
}

@@ -6347,11 +6342,8 @@ unregister_sd_domain_headers:
i = sd_cpudir_headers_num;
unregister_sd_cpudir_headers:
i--;
- for(; i >= 0; i--) {
- struct ctl_table *table = sd_cpudir_headers[i]->ctl_table_arg;
+ for(; i >= 0; i--)
unregister_sysctl_table(sd_cpudir_headers[i]);
- kfree(table);
- }

kfree(sd_domain_headers);
fail_alloc_sd_domain_headers:
@@ -6391,11 +6383,8 @@ static void unregister_sched_domain_sysctl(void)
kfree(table);
}

- for(i = sd_cpudir_headers_num - 1; i >= 0; i--) {
- struct ctl_table *table = sd_cpudir_headers[i]->ctl_table_arg;
+ for(i = sd_cpudir_headers_num - 1; i >= 0; i--)
unregister_sysctl_table(sd_cpudir_headers[i]);
- kfree(table);
- }

kfree(sd_domain_headers);
kfree(sd_cpudir_headers);
--
1.7.5.134.g1c08b

2011-05-08 22:44:22

by Lucian Adrian Grijincu

Subject: [v2 093/115] sysctl: ax25: create empty dir with register_sysctl_dir

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ax25/af_ax25.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 965662d..d8a4ea4 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1996,7 +1996,6 @@ static const struct __initdata ctl_path ax25_path[] = {
{ .procname = "ax25" },
{ }
};
-static struct ctl_table empty;
static struct ctl_table_header *ax25_root_header;
#endif /* CONFIG_SYSCTL */

@@ -2014,7 +2013,7 @@ static int __init ax25_init(void)

/* XXX: no error checking done in initializer */
#ifdef CONFIG_SYSCTL
- ax25_root_header = register_sysctl_paths(ax25_path, &empty);
+ ax25_root_header = register_sysctl_dir(ax25_path);
#endif

proc_net_fops_create(&init_net, "ax25_route", S_IRUGO, &ax25_route_fops);
--
1.7.5.134.g1c08b

2011-05-08 22:42:33

by Lucian Adrian Grijincu

Subject: [v2 094/115] sysctl: net/core: create empty dir with register_sysctl_dir

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/core/sysctl_net_core.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 385b609..6d2fe6e 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -239,9 +239,7 @@ static __net_initdata struct pernet_operations sysctl_core_ops = {

static __init int sysctl_core_init(void)
{
- static struct ctl_table empty[1];
-
- register_sysctl_paths(net_core_path, empty);
+ register_sysctl_dir(net_core_path);
register_net_sysctl_rotable(net_core_path, net_core_table);
return register_pernet_subsys(&sysctl_core_ops);
}
--
1.7.5.134.g1c08b

2011-05-08 22:43:56

by Lucian Adrian Grijincu

Subject: [v2 095/115] sysctl: net/ipv4/neigh: create empty dir with register_sysctl_dir

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv4/route.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 46c7b3d..092f3d1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3151,8 +3151,6 @@ static ctl_table ipv4_route_table[] = {
{ }
};

-static struct ctl_table empty[1];
-
static __net_initdata struct ctl_path ipv4_neigh_path[] = {
{ .procname = "net", },
{ .procname = "ipv4", },
@@ -3310,6 +3308,6 @@ int __init ip_rt_init(void)
void __init ip_static_sysctl_init(void)
{
register_sysctl_paths(ipv4_route_path, ipv4_route_table);
- register_sysctl_paths(ipv4_neigh_path, empty);
+ register_sysctl_dir(ipv4_neigh_path);
}
#endif
--
1.7.5.134.g1c08b

2011-05-08 22:43:28

by Lucian Adrian Grijincu

Subject: [v2 096/115] sysctl: net/ipv6/neigh: create empty dir with register_sysctl_dir

Signed-off-by: Lucian Adrian Grijincu <[email protected]>
---
net/ipv6/sysctl_net_ipv6.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 1d2d8c7..bb57ab4 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -15,8 +15,6 @@
#include <net/addrconf.h>
#include <net/inet_frag.h>

-static struct ctl_table empty[1];
-
static ctl_table ipv6_bindv6only_template[] = {
{
.procname = "bindv6only",
@@ -173,7 +171,7 @@ static struct ctl_table_header *ip6_base;

int ipv6_static_sysctl_register(void)
{
- ip6_base = register_sysctl_paths(net_ipv6_neigh_path, empty);
+ ip6_base = register_sysctl_dir(net_ipv6_neigh_path);
if (ip6_base == NULL)
return -ENOMEM;
return 0;
--
1.7.5.134.g1c08b

2011-05-08 22:41:37

by David Miller

Subject: Re: [v2 000/115] faster tree-based sysctl implementation

From: Lucian Adrian Grijincu <[email protected]>
Date: Mon, 9 May 2011 00:38:12 +0200

> This patch series introduces a faster/leaner sysctl internal implementation:

NO WAY.

Do not send so many patches all at once, never do anything like
that, it's too much.

2011-05-09 03:11:10

by Eric W. Biederman

Subject: Re: [v2 000/115] faster tree-based sysctl implementation

Lucian Adrian Grijincu <[email protected]> writes:

> Eric also asked for:
> - replacing the per-header list of subdirs with a rbtree.
> -- Again, lack of time, and this can always be added at a later time
> to optimize lookup and duplicate checks. At the moment this patch
> series does not add a complexity regression over the previous
> implementation, au contraire.

Instead you add a dubious mechanism to avoid duplicate checks
because the duplicate checks take time.

A good internal data structure should have O(log N) duplicate checks,
and that should not be cost prohibitive. For maintainability of
the users, a good internal data structure removes the need for
special cases, and it actually makes using the sysctls fast.
Your implementation continues to have an O(N^2) cost to use the
sysctl files.
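
As an illustration of the kind of logarithmic duplicate check meant here, the
sketch below keeps a directory's entry names in a sorted array and uses binary
search. It is a userspace model, not the kernel implementation: struct
dir_index, dir_index_find() and dir_index_insert() are hypothetical names, and
a sorted array stands in for the rbtree discussed in this thread (the lookup
is O(log N), but the insert still shifts elements, which an rbtree would
avoid).

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 64 /* hypothetical fixed capacity for the sketch */

/* Hypothetical stand-in for a directory's children, kept sorted by name. */
struct dir_index {
	const char *names[MAX_ENTRIES];
	int count;
};

/* Binary search over the sorted names.
 * Returns the index of @name if present; otherwise returns
 * -(insertion point) - 1, which is always negative. */
static int dir_index_find(const struct dir_index *d, const char *name)
{
	int lo = 0, hi = d->count - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, d->names[mid]);

		if (cmp == 0)
			return mid;
		if (cmp < 0)
			hi = mid - 1;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/* Insert @name unless it is already present.
 * Returns 0 on success, -1 on a duplicate or when the index is full. */
static int dir_index_insert(struct dir_index *d, const char *name)
{
	int pos;

	assert(d != NULL && name != NULL);
	pos = dir_index_find(d, name);
	if (pos >= 0)
		return -1; /* duplicate found with O(log N) comparisons */
	if (d->count == MAX_ENTRIES)
		return -1;

	pos = -pos - 1; /* recover the insertion point */
	memmove(&d->names[pos + 1], &d->names[pos],
		(size_t)(d->count - pos) * sizeof(d->names[0]));
	d->names[pos] = name;
	d->count++;
	return 0;
}
```

With N entries, each registration costs about log2(N) string comparisons for
the duplicate check instead of a walk over every sibling.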

Getting a good internal data structure is the most important part
of making sysctls fast, and keeping them fast. Your patchset
fails in that.

So at that level, a great big NAK on the patchset as it currently
exists.


You have made a key observation that we only need better data
structures for the directories and can leave off that work
for the callers.

Thank you for that.

Additionally thank you for splitting up your changes to the
core of sysctl so there is a chance of reviewing those.

With respect to review and merging. The priorities should be:
1) Figure out what the users really need to do, before we change
everything so we only need one sweeping pass through the users
that hopefully simplifies them.

2) Ensure you have a version of the new interfaces ready at the
start of the patchset, so the sweeping changes can be made
incrementally.

3) Clean up the users.

4) Make simplifications because you don't have any old users left.

I very much agree that the data structures that sysctl uses today
are far from ideal, and that we can make a lot of headway.

That said I think there is some good work in your patchset and I may
try and cherry pick out the good bits that we can merge now.

Eric

2011-05-10 01:07:10

by Lucian Adrian Grijincu

Subject: Re: [v2 000/115] faster tree-based sysctl implementation

On Mon, May 9, 2011 at 5:11 AM, Eric W. Biederman <[email protected]> wrote:
> With respect to review and merging.  The priorities should be:
> 1) Figure out what the users really need to do, before we change
>    everything so we only need one sweeping pass through the users
>    that hopefully simplifies them.


The users of the API should not rely on the .child field of 'struct
ctl_table' because we don't use that to encode the tree structure any
more.
That's absolutely the only thing that's changed in the API.

Most of the sysctl users in the tree don't need any change and will
work with the new sysctl just fine.

For the few that do, the first part of the series (the patches with
"no-child" in the name) gets rid of the .child field by:
a) replacing register_sysctl_table with register_sysctl_paths (a very
straightforward change, and the new code is cleaner)
b) registering files in multiple directories separately
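
To illustrate point b), here is a userspace sketch of how a table that nests
directories through .child can be turned into one registration per directory.
Everything in it is a simplified model: struct toy_table keeps only two fields
of struct ctl_table, and toy_flatten() and toy_regs are hypothetical names
that merely record what would be passed to register_sysctl_paths().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_REGS 16  /* hypothetical capacity of the registration log */
#define TOY_PATH_LEN 128

/* Simplified stand-in for struct ctl_table. */
struct toy_table {
	const char *procname;    /* NULL terminates the array */
	struct toy_table *child; /* non-NULL marks a directory entry */
};

/* One record per directory: its path and how many files it holds. */
struct toy_registration {
	char path[TOY_PATH_LEN];
	int nr_files;
};

static struct toy_registration toy_regs[TOY_MAX_REGS];
static int toy_nr_regs;

/* Record one registration for the files directly in @table, then
 * recurse into each .child table with the path extended by its name. */
static void toy_flatten(const char *path, const struct toy_table *table)
{
	const struct toy_table *t;
	struct toy_registration *r;
	char sub[TOY_PATH_LEN];
	int nr_files = 0;

	for (t = table; t->procname; t++)
		if (!t->child)
			nr_files++;

	if (nr_files > 0) {
		assert(toy_nr_regs < TOY_MAX_REGS);
		r = &toy_regs[toy_nr_regs++];
		snprintf(r->path, sizeof(r->path), "%s", path);
		r->nr_files = nr_files;
	}

	for (t = table; t->procname; t++) {
		if (!t->child)
			continue;
		snprintf(sub, sizeof(sub), "%s/%s", path, t->procname);
		toy_flatten(sub, t->child);
	}
}
```

After such a pass no table needs a .child pointer: each directory's files are
registered on their own, with the directory expressed as a path.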



That part of the patch series does not disturb the current
functionality and does not depend on the later patches; it uses the
current sysctl API to achieve the same goal.



> 2) Ensure you have a version of the new interfaces ready at the
>    start of the patchset, so the sweeping changes can be made
>    incrementally.
>
> 3) Clean up the users.
>
> 4) Make simplifications because you don't have any old users left.


Hmm. I'm not sure if I understand you correctly.

You want this so that the cleanup patches get in through the affected
subsystems, and not in a big series that would possibly create merge
problems with those trees?

This wouldn't be too hard to do and wouldn't uglify the new interface
too much, but I have limited time available to do this in the near
future (exams and school projects). I'll see how I can squeeze this
in.

--
 .
..: Lucian