2006-08-01 14:45:06

by Amit Gud

[permalink] [raw]
Subject: [RFC] [PATCH] sysctl for the latecomers

* Warning: This patch is a prototype and is NOT tested. *

Signed-off-by: Amit Gud <[email protected]>
---

--- kernel/sysctl.c.orig 2006-07-20 16:56:09.000000000 -0400
+++ kernel/sysctl.c 2006-07-20 19:33:18.000000000 -0400
@@ -133,6 +133,9 @@ extern int no_unaligned_warning;

static int parse_table(int __user *, int, void __user *, size_t __user *, void __user *, size_t,
ctl_table *, void **);
+static void check_pending_values(ctl_table *);
+static void store_pending_val(int, void *, size_t);
+static void init_pending_value_table();
static int proc_doutsstring(ctl_table *table, int write, struct file *filp,
void __user *buffer, size_t *lenp, loff_t *ppos);

@@ -158,6 +161,23 @@ extern ctl_table inotify_table[];
int sysctl_legacy_va_layout;
#endif

+/* struct ctl_pending_values is used for temporarily storing data values
+ for the entries which are not found at the _sysctl(2) time. This is
+ useful for the modules which are required to use the values stored
+ in /etc/sysctl.conf. During regiter_sysctl_table() these ctl_names
+ are checked against the ctl_names that are being registered and
+ 'strategy' and 'proc_handler' is called if previous entry is found. */
+#define NR_TEMP_VALUES 32 /* temporary storage for the data for unmatched ctl_names */
+
+struct ctl_pending_val_s {
+ int name;
+ void *data;
+ size_t len;
+} ctl_pending_values[NR_TEMP_VALUES];
+
+static int ctl_index = 0;
+spinlock_t ctl_pending_lock = SPIN_LOCK_UNLOCKED;
+
/* /proc declarations: */

#ifdef CONFIG_PROC_FS
@@ -1099,12 +1119,20 @@ static void start_unregistering(struct c
list_del_init(&p->ctl_entry);
}

+static void init_pending_value_table()
+{
+ int i;
+ for(i = 0; i < NR_TEMP_VALUES; i++)
+ ctl_pending_values[i].name = 0;
+}
+
void __init sysctl_init(void)
{
#ifdef CONFIG_PROC_FS
register_proc_table(root_table, proc_sys_root, &root_table_header);
init_irq_proc();
#endif
+ init_pending_value_table();
}

int do_sysctl(int __user *name, int nlen, void __user *oldval, size_t __user *oldlenp,
@@ -1186,6 +1214,20 @@ static inline int ctl_perm(ctl_table *ta
return test_perm(table->mode, op);
}

+static void store_pending_val(int n, void *newval, size_t newlen)
+{
+ spin_lock(&ctl_pending_lock);
+ kfree(ctl_pending_values[ctl_index].data);
+
+ ctl_pending_values[ctl_index].name = n;
+ ctl_pending_values[ctl_index].data = kmalloc(newlen, GFP_ATOMIC);
+ memcpy(ctl_pending_values[ctl_index].data, newval, newlen);
+ ctl_pending_values[ctl_index].len = newlen;
+
+ ctl_index = (++ctl_index % NR_TEMP_VALUES);
+ spin_unlock(&ctl_pending_lock);
+}
+
static int parse_table(int __user *name, int nlen,
void __user *oldval, size_t __user *oldlenp,
void __user *newval, size_t newlen,
@@ -1222,9 +1264,26 @@ repeat:
return error;
}
}
+ store_pending_val(n, newval, newlen);
return -ENOTDIR;
}

+static void check_pending_values(ctl_table *table)
+{
+ int i;
+ spin_lock(&ctl_pending_lock);
+ for(i = 0; i < NR_TEMP_VALUES; i++) {
+ int error;
+ void *context = NULL;
+ error = parse_table(&ctl_pending_values[i].name, 1, 0, 0, ctl_pending_values[i].data,
+ ctl_pending_values[i].len, table, &context);
+ kfree(context);
+ if(!error)
+ ctl_pending_values[i].name = 0;
+ }
+ spin_unlock(&ctl_pending_lock);
+}
+
/* Perform the actual read/write of a sysctl table entry. */
int do_sysctl_strategy (ctl_table *table,
int __user *name, int nlen,
@@ -1365,6 +1424,7 @@ struct ctl_table_header *register_sysctl
#ifdef CONFIG_PROC_FS
register_proc_table(table, proc_sys_root, tmp);
#endif
+ check_pending_values(table);
return tmp;
}


Attachments:
sysctl-pending-values-v0.1.patch (3.57 kB)

2006-08-01 15:25:10

by Frederik Deweerdt

[permalink] [raw]
Subject: Re: [RFC] [PATCH] sysctl for the latecomers

On Tue, Aug 01, 2006 at 10:49:20AM -0400, Amit Gud wrote:
> /etc/sysctl.conf values are of no use to kernel modules that are inserted
> after init scripts call sysctl for the values in /etc/sysctl.conf
>
> For modules to use the values stored in the file /etc/sysctl.conf, sysctl
> kernel code can keep record of 'limited' values, for sysctl entries which
> haven't been registered yet. During registration, sysctl code can check
> against the stored values and call the appropriate strategy and proc_handler
> routines if a match is found.
>
> Attached patch does just that. This patch is NOT tested and is just to get
> opinions, if something like this is a right way of addressing this problem.
>
>
Hi,

One strange behaviour that comes to mind is the following:
1. I boot my machine so that it doesn't load module X
2. I modify /etc/sysctl.conf and I remove a line affecting module X
3. I modprobe X

Wouldn't the fact that the sysctl directive is applied anyway be a bit
misleading?

Regards,
Frederik

2006-08-01 16:56:20

by Chase Venters

[permalink] [raw]
Subject: Re: [RFC] [PATCH] sysctl for the latecomers

On Tue, 1 Aug 2006, Amit Gud wrote:

> /etc/sysctl.conf values are of no use to kernel modules that are inserted
> after init scripts call sysctl for the values in /etc/sysctl.conf
>
> For modules to use the values stored in the file /etc/sysctl.conf, sysctl
> kernel code can keep record of 'limited' values, for sysctl entries which
> haven't been registered yet. During registration, sysctl code can check
> against the stored values and call the appropriate strategy and proc_handler
> routines if a match is found.
>
> Attached patch does just that. This patch is NOT tested and is just to get
> opinions, if something like this is a right way of addressing this problem.

Do you anticipate any users that you could list? It seems like a more
appropriate approach would be to allow some kind of user-space hook or
event notification to run upon module insertion, which could then apply
the appropriate sysctl.

>
> Thanks,
> AG
>

Thanks,
Chase

2006-08-01 17:22:26

by Chase Venters

[permalink] [raw]
Subject: Re: [RFC] [PATCH] sysctl for the latecomers

On Tue, 1 Aug 2006, Chase Venters wrote:

> On Tue, 1 Aug 2006, Amit Gud wrote:
>
>> /etc/sysctl.conf values are of no use to kernel modules that are inserted
>> after init scripts call sysctl for the values in /etc/sysctl.conf
>>
>> For modules to use the values stored in the file /etc/sysctl.conf, sysctl
>> kernel code can keep record of 'limited' values, for sysctl entries which
>> haven't been registered yet. During registration, sysctl code can check
>> against the stored values and call the appropriate strategy and
>> proc_handler routines if a match is found.
>>
>> Attached patch does just that. This patch is NOT tested and is just to get
>> opinions, if something like this is a right way of addressing this
>> problem.
>
> Do you anticipate any users that you could list? It seems like a more
> appropriate approach would be to allow some kind of user-space hook or event
> notification to run upon module insertion, which could then apply the
> appropriate sysctl.

Btw, wanted to add some comments on the specific approach:

1. A ring hard-coded to 32 elements is IMO unuseable. While it may not be
a real limit for what use case you have in mind, if it's in the kernel
sooner or later someone else is going to use it and get bitten. Imagine if
they wrote in 33 entries, and the first one was some critical security
setting that ended up getting silently ignored...

2. On the other hand, allowing it to grow unbounded is equally
unacceptable without a mechanism to list and clear the current "pending"
sysctl values. Unfortunately, at this point, you're starting to violate
"KISS".

Are the modules you refer to inserted during init at all? Because it seems
like it would be a lot more appropriate to just move sysctl until after
loading the modules, or perhaps running it again once they are loaded.

Thanks,
Chase

2006-08-01 17:41:55

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC] [PATCH] sysctl for the latecomers

Amit Gud wrote:
> /etc/sysctl.conf values are of no use to kernel modules that are
> inserted after init scripts call sysctl for the values in /etc/sysctl.conf
>
> For modules to use the values stored in the file /etc/sysctl.conf,
> sysctl kernel code can keep record of 'limited' values, for sysctl
> entries which haven't been registered yet. During registration, sysctl
> code can check against the stored values and call the appropriate
> strategy and proc_handler routines if a match is found.
>
> Attached patch does just that. This patch is NOT tested and is just to
> get opinions, if something like this is a right way of addressing this
> problem.
>

Sounds like it would make more sense to add this kind of functionality
to modprobe.

-hpa

2006-08-01 18:54:39

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] [PATCH] sysctl for the latecomers

Amit Gud <[email protected]> writes:

> /etc/sysctl.conf values are of no use to kernel modules that are
> inserted after init scripts call sysctl for the values in
> /etc/sysctl.conf

_sysctl(2) is obsolete and on its way out. Doesn't make sense
to add new feature to it.

BTW I doubt the sysctl user program actually uses it - most likely
it uses /proc/sys

I think I agree with hpa that this feature belongs into modprobe
if anywhere.

-Andi

2006-08-01 18:59:53

by Amit Gud

[permalink] [raw]
Subject: Re: [RFC] [PATCH] sysctl for the latecomers

Chase Venters wrote:
> On Tue, 1 Aug 2006, Chase Venters wrote:
> Btw, wanted to add some comments on the specific approach:
>
> 1. A ring hard-coded to 32 elements is IMO unuseable. While it may not
> be a real limit for what use case you have in mind, if it's in the
> kernel sooner or later someone else is going to use it and get bitten.
> Imagine if they wrote in 33 entries, and the first one was some critical
> security setting that ended up getting silently ignored...
>
> 2. On the other hand, allowing it to grow unbounded is equally
> unacceptable without a mechanism to list and clear the current "pending"
> sysctl values. Unfortunately, at this point, you're starting to violate
> "KISS".
>

You figured it right, theres no "correct" number of elements that I
could adhere to.

> Are the modules you refer to inserted during init at all? Because it
> seems like it would be a lot more appropriate to just move sysctl until
> after loading the modules, or perhaps running it again once they are
> loaded.
>

I have a case where sunrpc module gets inserted and
sunrpc.tcp_slot_table_entries parameter is to be set _before_ nfs module
is inserted. I agree that for this particular case user-space works
(either udev rule, or modprobe.conf or sysctl after modprobe in
initscripts), but am on a lookout for a more generic way for handling
such cases - be it user-space or otherwise.


AG
--
May the source be with you.
http://www.cis.ksu.edu/~gud

2006-08-01 20:36:11

by Chase Venters

[permalink] [raw]
Subject: Re: [RFC] [PATCH] sysctl for the latecomers

On Tue, 1 Aug 2006, Amit Gud wrote:

> Chase Venters wrote:
>> On Tue, 1 Aug 2006, Chase Venters wrote:
>> Btw, wanted to add some comments on the specific approach:
>>
>> 1. A ring hard-coded to 32 elements is IMO unuseable. While it may not be
>> a real limit for what use case you have in mind, if it's in the kernel
>> sooner or later someone else is going to use it and get bitten. Imagine if
>> they wrote in 33 entries, and the first one was some critical security
>> setting that ended up getting silently ignored...
>>
>> 2. On the other hand, allowing it to grow unbounded is equally
>> unacceptable without a mechanism to list and clear the current "pending"
>> sysctl values. Unfortunately, at this point, you're starting to violate
>> "KISS".
>>
>
> You figured it right, theres no "correct" number of elements that I could
> adhere to.
>
>> Are the modules you refer to inserted during init at all? Because it seems
>> like it would be a lot more appropriate to just move sysctl until after
>> loading the modules, or perhaps running it again once they are loaded.
>>
>
> I have a case where sunrpc module gets inserted and
> sunrpc.tcp_slot_table_entries parameter is to be set _before_ nfs module is
> inserted. I agree that for this particular case user-space works (either udev
> rule, or modprobe.conf or sysctl after modprobe in initscripts), but am on a
> lookout for a more generic way for handling such cases - be it user-space or
> otherwise.

It just seems like something that belongs in user-space -- whether that be
better init scripts, changes to modprobe, both, or something else
entirely.

To make a kernel implementation of this concept that isn't
fundamentally flawed from a usability and "least-surprise"
perspective would mean enough code and behavior that the resulting
implementation wouldn't be considered correct for the kernel anyway.

The kernel has, fundamentally, three kinds of configuration - CONFIG_*,
the bootloader command-line, and 'live' configuration that gets set by
user-space whenever appropriate.

>
> AG
>

Thanks,
Chase

2006-08-02 14:57:05

by Horst H. von Brand

[permalink] [raw]
Subject: Re: [RFC] [PATCH] sysctl for the latecomers

Chase Venters <[email protected]> wrote:
> On Tue, 1 Aug 2006, Chase Venters wrote:
> > On Tue, 1 Aug 2006, Amit Gud wrote:
> >> /etc/sysctl.conf values are of no use to kernel modules that are inserted
> >> after init scripts call sysctl for the values in /etc/sysctl.conf
> >>
> >> For modules to use the values stored in the file /etc/sysctl.conf, sysctl
> >> kernel code can keep record of 'limited' values, for sysctl entries which
> >> haven't been registered yet. During registration, sysctl code can check
> >> against the stored values and call the appropriate strategy and
> >> proc_handler routines if a match is found.
> >>
> >> Attached patch does just that. This patch is NOT tested and is just to
> >> get opinions, if something like this is a right way of addressing this
> >> problem.

> > Do you anticipate any users that you could list? It seems like a
> > more appropriate approach would be to allow some kind of user-space
> > hook or event notification to run upon module insertion, which could
> > then apply the appropriate sysctl.

> Btw, wanted to add some comments on the specific approach:
>
> 1. A ring hard-coded to 32 elements is IMO unuseable. While it may not
> be a real limit for what use case you have in mind, if it's in the
> kernel sooner or later someone else is going to use it and get
> bitten. Imagine if they wrote in 33 entries, and the first one was
> some critical security setting that ended up getting silently
> ignored...

Right.

> 2. On the other hand, allowing it to grow unbounded is equally
> unacceptable without a mechanism to list and clear the current
> "pending" sysctl values. Unfortunately, at this point, you're starting
> to violate "KISS".

Yep.

> Are the modules you refer to inserted during init at all? Because it
> seems like it would be a lot more appropriate to just move sysctl
> until after loading the modules, or perhaps running it again once they
> are loaded.

OK, lets step back a bit... sysctl(8) can be used to load other values, or
to change them, etc. Whatever is in sysctl.conf is just /default/ values,
/typically/ set on boot. You could let the initscripts set those, then
fiddle them by hand. If a module is now inserted, you can't just reset the
values to the ones in the file.

Perhaps module-specific values should be set in the configuration for the
module itself (i.e., arguments) or add to modprobe.conf something along
the lines:

sysctl mymod kernel.mystuff=42

and have modprobe(8) run sysctl(8) as needed after loading the module?
Sounds more KISSy to me...
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513