2007-08-28 22:40:35

by Eric W. Biederman

Subject: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.


After adding checking to register_sysctl_table and finding a whole new
set of bugs that were missed by countless code reviews and testers, I
have finally lost patience with the binary sysctl interface.

The binary sysctl interface has been sort of deprecated for years, and
finding a user space program that uses the syscall is more difficult
than finding a needle in a haystack. Problems continue to crop up
with the in-kernel implementation. Since supporting something that
no one uses is silly, deprecate sys_sysctl with a sufficient grace
period and notice that the handful of user space applications that
care can be fixed or replaced.

The /proc/sys sysctl interface that people use will continue to be
supported indefinitely.

This patch moves the tested warning about sysctls from the stub used
when sys_sysctl is compiled out to a separate function called from both
implementations of sys_sysctl, and it adds a proper entry to
Documentation/feature-removal-schedule.txt.

This allows us to revisit this in a couple of years' time and actually
kill sys_sysctl.

Signed-off-by: Eric W. Biederman <[email protected]>
---
Documentation/feature-removal-schedule.txt | 35 ++++++++++++++++
kernel/sysctl.c | 62 +++++++++++++++++----------
2 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index a43d287..4d3097e 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -290,3 +290,38 @@ Why: All mthca hardware also supports MSI-X, which provides
Who: Roland Dreier <[email protected]>

---------------------------
+
+What: sys_sysctl
+When: September 2010
+Option: CONFIG_SYSCTL_SYSCALL
+Why: The same information is available in a more convenient form in
+ /proc/sys, and none of the sysctl variables appear to be
+ important performance wise.
+
+ Binary sysctls are a long standing source of subtle kernel
+ bugs and security issues.
+
+ When I looked several months ago all I could find after
+ searching several distributions were 5 user space programs and
+ glibc (which falls back to /proc/sys) using this syscall.
+
+ The man page for sysctl(2) documents it as unusable for user
+ space programs.
+
+ sysctl(2) is not generally ABI compatible for a 32bit user
+ space application between a 64bit and a 32bit kernel.
+
+ For the last several months the policy has been no new binary
+ sysctls and no one has put forward an argument to use them.
+
+ Binary sysctl issues seem to keep appearing, so properly
+ deprecating them (with a warning to user space) and a 2 year
+ grace period will mean that eventually we can kill
+ them and end the pain.
+
+ In the mean time individual binary sysctls can be dealt with
+ in a piecewise fashion.
+
+Who: Eric Biederman <[email protected]>
+
+---------------------------
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6d01497..792e6fe 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1275,6 +1275,33 @@ struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev)
 	return NULL;
 }

+static int deprecated_sysctl_warning(struct __sysctl_args *args)
+{
+	static int msg_count;
+	int name[CTL_MAXNAME];
+	int i;
+
+	/* Read in the sysctl name for better debug message logging */
+	for (i = 0; i < args->nlen; i++)
+		if (get_user(name[i], args->name + i))
+			return -EFAULT;
+
+	/* Ignore accesses to kernel.version */
+	if ((args->nlen == 2) && (name[0] == CTL_KERN) && (name[1] == KERN_VERSION))
+		return 0;
+
+	if (msg_count < 5) {
+		msg_count++;
+		printk(KERN_INFO
+			"warning: process `%s' used the deprecated sysctl "
+			"system call with ", current->comm);
+		for (i = 0; i < args->nlen; i++)
+			printk("%d.", name[i]);
+		printk("\n");
+	}
+	return 0;
+}
+
 #ifdef CONFIG_SYSCTL_SYSCALL
 int do_sysctl(int __user *name, int nlen, void __user *oldval, size_t __user *oldlenp,
 	       void __user *newval, size_t newlen)
@@ -1310,10 +1337,15 @@ asmlinkage long sys_sysctl(struct __sysctl_args __user *args)
 	if (copy_from_user(&tmp, args, sizeof(tmp)))
 		return -EFAULT;

+	error = deprecated_sysctl_warning(&tmp);
+	if (error)
+		goto out;
+
 	lock_kernel();
 	error = do_sysctl(tmp.name, tmp.nlen, tmp.oldval, tmp.oldlenp,
 			  tmp.newval, tmp.newlen);
 	unlock_kernel();
+out:
 	return error;
 }
 #endif /* CONFIG_SYSCTL_SYSCALL */
@@ -2503,35 +2535,19 @@ int sysctl_ms_jiffies(struct ctl_table *table, int __user *name, int nlen,

 asmlinkage long sys_sysctl(struct __sysctl_args __user *args)
 {
-	static int msg_count;
 	struct __sysctl_args tmp;
-	int name[CTL_MAXNAME];
-	int i;
+	int error;

-	/* Read in the sysctl name for better debug message logging */
 	if (copy_from_user(&tmp, args, sizeof(tmp)))
 		return -EFAULT;
-	if (tmp.nlen <= 0 || tmp.nlen >= CTL_MAXNAME)
-		return -ENOTDIR;
-	for (i = 0; i < tmp.nlen; i++)
-		if (get_user(name[i], tmp.name + i))
-			return -EFAULT;

-	/* Ignore accesses to kernel.version */
-	if ((tmp.nlen == 2) && (name[0] == CTL_KERN) && (name[1] == KERN_VERSION))
-		goto out;
+	error = deprecated_sysctl_warning(&tmp);

-	if (msg_count < 5) {
-		msg_count++;
-		printk(KERN_INFO
-			"warning: process `%s' used the removed sysctl "
-			"system call with ", current->comm);
-		for (i = 0; i < tmp.nlen; i++)
-			printk("%d.", name[i]);
-		printk("\n");
-	}
-out:
-	return -ENOSYS;
+	/* If no error reading the parameters then just -ENOSYS ... */
+	if (!error)
+		error = -ENOSYS;
+
+	return error;
 }

 int sysctl_data(struct ctl_table *table, int __user *name, int nlen,
--
1.5.3.rc6.17.g1911


2007-08-28 23:05:20

by Christoph Hellwig

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Tue, Aug 28, 2007 at 04:40:15PM -0600, Eric W. Biederman wrote:
> +When: September 2010
> +Option: CONFIG_SYSCTL_SYSCALL
> +Why: The same information is available in a more convenient from
> + /proc/sys, and none of the sysctl variables appear to be
> + important performance wise.
> +
> + Binary sysctls are a long standing source of subtle kernel
> + bugs and security issues.
> +
> + When I looked several months ago all I could find after
> + searching several distributions were 5 user space programs and
> + glibc (which falls back to /proc/sys) using this syscall.

Umm, no way we're ever going to remove a syscall like this. Please
stop this deprecation crap. Just make sure no one adds more binary
sysctls.

2007-08-28 23:54:26

by Eric W. Biederman

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Christoph Hellwig <[email protected]> writes:

> Umm, no way we're ever going to remove a syscall like this.

If someone besides me cares about more than rhetoric, I will be happy
to reconsider, and several years is plenty of time to find that out.

I aborted the removal last time precisely because we had not done an
adequate job of warning our users. A printk when we run a program
that uses the binary interface, and a long enough interval that the
warning makes it to the Enterprise kernels before we remove the
interface, should be sufficient.

> stop this deprecration crap. Just make sure no ones adds more binary
> sysctls.

The sysctl_check_table function should keep out most of the problem
cases and especially it should ensure we don't add any new binary
sysctls by accident.

However given our atrocious record at catching these kinds of
problems via code review and testing and the fact that no one
uses these things anyway, I don't see an argument for keeping
dead code in the kernel.

Over the long term the goal is to not break user space binaries.

I see a better chance of achieving the goal of not breaking user space
binaries if we remove interfaces that no known user space applications
use, in a way a well written application can handle, than if we let
the user space interface code succumb to bit rot and start returning
the wrong values to user space.

That is where we are at with sys_sysctl.
Almost all of the binary paths have no known users and the
implementations are succumbing to bit rot. The binary interface and
the proc interface go through two completely separate paths so there
is little to ensure those paths don't diverge over time.

It is also true that the non-generic helper functions are diverging
over time. Currently these things are not an issue because no one
actually uses the binary interfaces. The empirical evidence seems
overwhelming on this point.

So just freezing us at our current set of non-broken binary sysctls
does not seem sufficient to ensure we don't break user space binaries.
Although it does seem to be a good start.

Eric

2007-08-29 01:36:08

by H. Peter Anvin

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Eric W. Biederman wrote:
> Christoph Hellwig <[email protected]> writes:
>
>> Umm, no way we're ever going to remove a syscall like this.
>
> If someone besides me cares about more then rhetoric I will be happy
> to reconsider and several years is plenty of time to find that out.
>
> I aborted the removal last time precisely because we had not done an
> adequate job of warning our users. A printk when we run a program
> that uses the binary interface and an long enough interval the warning
> makes it to the Enterprise kernels before we remove the interface
> should be sufficient.
>

glibc uses it, and it uses it in contexts where access to the filesystem
isn't functional (e.g. in chroot.)

-hpa

2007-08-29 01:56:55

by Eric W. Biederman

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

"H. Peter Anvin" <[email protected]> writes:

> Eric W. Biederman wrote:
>> Christoph Hellwig <[email protected]> writes:
>>
>>> Umm, no way we're ever going to remove a syscall like this.
>>
>> If someone besides me cares about more then rhetoric I will be happy
>> to reconsider and several years is plenty of time to find that out.
>>
>> I aborted the removal last time precisely because we had not done an
>> adequate job of warning our users. A printk when we run a program
>> that uses the binary interface and an long enough interval the warning
>> makes it to the Enterprise kernels before we remove the interface
>> should be sufficient.
>>
>
> glibc uses it, and it uses it in contexts where access to the filesystem isn't
> functional (e.g. in chroot.)

Yes. But (a) what answer it gets back doesn't affect correctness, and
(b) it should be using uname.

Or are you thinking about something besides the pthreads usage?

Eric
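
For illustration, the uname(2) route mentioned above could look like the
following userspace sketch. It assumes only POSIX uname(); the helper
name kernel_version_string is made up for this example and is not taken
from glibc.

```c
#include <string.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: copy the kernel version string (the kind of
 * information kernel.version exposes) into buf using uname(2), which
 * needs neither sys_sysctl nor a mounted /proc.
 * Returns 0 on success, -1 on failure or if buf is too small.
 */
int kernel_version_string(char *buf, size_t len)
{
	struct utsname u;

	if (uname(&u) != 0)
		return -1;
	if (strlen(u.version) >= len)
		return -1;	/* no room for the string plus its NUL */
	strcpy(buf, u.version);
	return 0;
}
```

Because uname() is a plain system call, a helper like this keeps working
inside a chroot where /proc is not mounted.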

2007-08-29 04:49:19

by Andrew Morton

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Wed, 29 Aug 2007 00:04:59 +0100 Christoph Hellwig <[email protected]> wrote:

> On Tue, Aug 28, 2007 at 04:40:15PM -0600, Eric W. Biederman wrote:
> > +When: September 2010
> > +Option: CONFIG_SYSCTL_SYSCALL
> > +Why: The same information is available in a more convenient from
> > + /proc/sys, and none of the sysctl variables appear to be
> > + important performance wise.
> > +
> > + Binary sysctls are a long standing source of subtle kernel
> > + bugs and security issues.
> > +
> > + When I looked several months ago all I could find after
> > + searching several distributions were 5 user space programs and
> > + glibc (which falls back to /proc/sys) using this syscall.
>
> Umm, no way we're ever going to remove a syscall like this. Please
> stop this deprecration crap. Just make sure no ones adds more binary
> sysctls.

I think it's worth a try. It might take two, three or five years, who
knows? If it turns out to be impractical then we can just change our
minds later, no big loss. It's just too early to say right now.


2007-08-29 04:50:11

by Andrew Morton

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Tue, 28 Aug 2007 16:40:15 -0600 [email protected] (Eric W. Biederman) wrote:

> +static int deprecated_sysctl_warning(struct __sysctl_args *args)
> +{
> + static int msg_count;
> + int name[CTL_MAXNAME];
> + int i;
> +
> + /* Read in the sysctl name for better debug message logging */
> + for (i = 0; i < args->nlen; i++)
> + if (get_user(name[i], args->name + i))
> + return -EFAULT;
> +
> + /* Ignore accesses to kernel.version */
> + if ((args->nlen == 2) && (name[0] == CTL_KERN) && (name[1] == KERN_VERSION))
> + return 0;

Do we want to do all the above if msg_count>=5?

> + if (msg_count < 5) {
> + msg_count++;
> + printk(KERN_INFO
> + "warning: process `%s' used the deprecated sysctl "
> + "system call with ", current->comm);
> + for (i = 0; i < args->nlen; i++)
> + printk("%d.", name[i]);
> + printk("\n");
> + }
> + return 0;
> +}

2007-08-29 05:25:44

by Eric W. Biederman

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Andrew Morton <[email protected]> writes:

> On Tue, 28 Aug 2007 16:40:15 -0600 [email protected] (Eric W. Biederman)
> wrote:
>
>> +static int deprecated_sysctl_warning(struct __sysctl_args *args)
>> +{
>> + static int msg_count;
>> + int name[CTL_MAXNAME];
>> + int i;
>> +
>> + /* Read in the sysctl name for better debug message logging */
>> + for (i = 0; i < args->nlen; i++)
>> + if (get_user(name[i], args->name + i))
>> + return -EFAULT;
>> +
>> + /* Ignore accesses to kernel.version */
>> + if ((args->nlen == 2) && (name[0] == CTL_KERN) && (name[1] == KERN_VERSION))
>> + return 0;
>
> Do we want to do all the above if msg_count>=5?

Well, it won't really change the order of the algorithm because we have
to read the data in anyway. So an earlier short circuit exit
would speed things up by a little bit, but it really shouldn't
matter either way.

Eric
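
To make the short circuit concrete, here is a userspace model of the
warning logic with the early exit moved to the front. It is a sketch, not
kernel code: warn_model and MAX_WARNINGS are names invented for this
example, memcpy() stands in for the get_user() loop, and the constants are
redefined locally with the values I believe <linux/sysctl.h> uses. It
also bounds nlen before filling name[], in the spirit of the check in the
old -ENOSYS stub.

```c
#include <string.h>

/* Assumed to match <linux/sysctl.h>; redefined so the sketch builds anywhere. */
#define CTL_MAXNAME   10
#define CTL_KERN      1
#define KERN_VERSION  4
#define MAX_WARNINGS  5		/* hypothetical name for the patch's literal 5 */

static int msg_count;

/*
 * Userspace model of the warning path.
 * Returns -1 for an out-of-range nlen, 1 if a warning would be
 * printed, and 0 if the access is ignored or the quota is used up.
 */
int warn_model(const int *user_name, int nlen)
{
	int name[CTL_MAXNAME];

	/* Bound nlen before touching name[]. */
	if (nlen <= 0 || nlen > CTL_MAXNAME)
		return -1;

	/* Early exit: once the quota is spent there is nothing to log. */
	if (msg_count >= MAX_WARNINGS)
		return 0;

	/* Stands in for the get_user() loop. */
	memcpy(name, user_name, (size_t)nlen * sizeof(name[0]));

	/* Ignore accesses to kernel.version */
	if (nlen == 2 && name[0] == CTL_KERN && name[1] == KERN_VERSION)
		return 0;

	msg_count++;
	return 1;
}
```

One trade-off of exiting early: after the fifth message the name is no
longer read, so a bad name pointer would not be reported by this function
and would only be caught by whatever runs next.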

2007-08-29 10:38:01

by Alan

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

> >> adequate job of warning our users. A printk when we run a program
> >> that uses the binary interface and an long enough interval the warning
> >> makes it to the Enterprise kernels before we remove the interface
> >> should be sufficient.

The enterprise products will probably just remove the printk. Even if
they didn't you are looking at ten years before things finish changing
based on current experiences, probably longer as things stabilize.

The whole "whine a bit" process simply doesn't work when you are trying
to persuade people to move in a non-hobbyist context. They don't want to
move, the message is simply an annoyance, their upstream huge package
vendor won't change just to deal with it and they'll class it as a
regression from previous releases, an incompatibility and file bugs until
it goes away.

It's user ABI and as Linus said - we don't break it. Trimming down all the
crap that never worked via sysctl is one thing, not putting sysctl in new
platforms likewise. Trying to undo it isn't going to work.

Alan

2007-08-29 17:17:16

by Eric W. Biederman

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Alan Cox <[email protected]> writes:

>> >> adequate job of warning our users. A printk when we run a program
>> >> that uses the binary interface and an long enough interval the warning
>> >> makes it to the Enterprise kernels before we remove the interface
>> >> should be sufficient.
>
> The enterprise products will probably just remove the printk. Even if
> they didn't you are looking at ten years before things finish changing
> based on current experiences, probably longer as things stabilize.
>
> The whole "whine a bit" process simply doesn't work when you are trying
> to persuade people to move in a non-hobbyist context. They don't want to
> move, the message is simply an annoyance, their upstream huge package
> vendor won't change just to deal with it and they'll class it as a
> regression from previous releases, an incompatibility and file bugs until
> it goes away.

My hypothesis. No one cares now.

My observation. Given the way we have been maintaining the binary sysctl
side of things, using it is asking for your application to be broken in
subtle and nasty ways.

If that is true none of your enterprise concerns apply because it isn't
used, or we will be breaking the enterprise application by accident
anyway in a much more difficult way to diagnose.

Right now there is only one way to find out. Plan on removing it
and see what the fallout really is.

If the grace period needs to be longer than 2 years I have no problem
upping it. If my hypothesis is wrong and there are real users who
care that aren't easy to change I don't mind keeping it.

> Its user ABI and as Linus said - we don't break it. Trimming down all the
> crap that never worked via sysctl is one thing, not putting sysctl in new
> platforms likewise. Trying to undo it isn't going to work

Then this will fail and we will have a good case for maintaining
sys_sysctl. Not breaking user space is the point of the grace period.

At the rate things are going now I suspect that we will wind up
removing sys_sysctl one entry at a time as we discover new and
interesting ways that the code has been broken through bad
maintenance.

I am very much in favor of not breaking user space, and preserving
a binary ABI. I think we can best avoid breaking user space
applications by getting them to avoid sys_sysctl, and getting
them to stop if they already use it.

This isn't "Oh some apps are using it let's get them to stop, because
it is inconvenient". This is "No apps seem to be using this, we
keep goofing up the maintenance and no one notices, and so it is
likely a source of security problems and other nasties".

I have evidence that there are 1 or 2 open source applications using
this (not counting the glibc weirdness). Although the ones I have
found were either old installers or were already patched to not
do this in some of the distros, or in the case of glibc really don't
care. So currently I believe that after we spread the word far and
wide, in a way that people will listen to, no one will have any real
complaints and the proof that it is safe to remove will be complete.

Eric

2007-08-29 17:31:45

by H. Peter Anvin

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Eric W. Biederman wrote:
>
> My hypothesis. No one cares now.
>
> My observation. The way we have been maintaining the binary sysctl
> side of things using it is asking for your application to be broken in
> subtle and nasty ways.
>

I suspect the right thing to do is simply to make a list of the
supported binary sysctls, and automatically verify those numbers. Doing
that would alleviate these concerns, wouldn't break anything, and isn't
really that hard to do.

-hpa
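
As a rough picture of what "a list of the supported binary sysctls" with
automatic verification might look like, here is a small table lookup
with wildcard support. Everything in it is illustrative: the struct, the
ANY marker, the function name and the table entries are assumptions made
for this sketch and are not taken from any kernel source.

```c
#include <stddef.h>

#define ANY (-1)	/* hypothetical wildcard: matches any number at that level */

struct allowed_name {
	int nlen;	/* depth of the binary name */
	int name[3];	/* numbers at each level, or ANY */
};

/* Illustrative entries only; a real list would be far longer. */
static const struct allowed_name allowed[] = {
	{ 2, { 1, 1 } },	/* e.g. kernel.ostype */
	{ 2, { 1, 4 } },	/* e.g. kernel.version */
	{ 3, { 3, 5, ANY } },	/* made-up wildcard entry */
};

/* Return 1 if the binary name matches an entry in the table, else 0. */
int binary_name_allowed(const int *name, int nlen)
{
	size_t i;
	int j;

	for (i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
		if (allowed[i].nlen != nlen)
			continue;
		for (j = 0; j < nlen; j++)
			if (allowed[i].name[j] != ANY &&
			    allowed[i].name[j] != name[j])
				break;
		if (j == nlen)
			return 1;
	}
	return 0;
}
```

A check of this shape only verifies that a number is on the list; it says
nothing about whether the handler behind that number still returns the
right data.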

2007-08-29 19:00:43

by Eric W. Biederman

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

"H. Peter Anvin" <[email protected]> writes:

> Eric W. Biederman wrote:
>>
>> My hypothesis. No one cares now.
>>
>> My observation. The way we have been maintaining the binary sysctl
>> side of things using it is asking for your application to be broken in
>> subtle and nasty ways.
>>
>
> I suspect the right thing to do is simply to make a list of the supported binary
> sysctls, and automatically verify those numbers. Doing that would alleviate
> these concerns, wouldn't break anything, and isn't really that hard to do.

Well the list is currently 1200 lines long, with wild cards in it.
See sysctl_check.c in the -mm tree. I think I have finally found
all of the binary sysctl numbers that are currently in use but I may
have missed something. That can probably be trimmed a bit, though,
now that a number of those sysctls have been identified as impossibly
and always broken.

The real problem is that sysctl uses different functions for the
binary path and the proc path. Those functions return the same
data in two different forms. When those functions diverge we
have problems, as I recently found with about 42 of the netfilter
sysctls.

The concern is that no one uses these things so no one tests these
things, and no one complains about these things so the code bit rots.
When the code bit rots we can return the wrong value or set the
wrong value in the kernel or skip locking or skip permission checks
or various other nasty things.

Hmm. Thinking about it I guess so far I have found about 10% of
the binary sysctls to have actual implementation problems. Not my
idea of well maintained code.

Eric

2007-08-29 22:51:57

by Andrew Morton

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Wed, 29 Aug 2007 11:46:04 +0100 Alan Cox <[email protected]> wrote:

> > >> adequate job of warning our users. A printk when we run a program
> > >> that uses the binary interface and an long enough interval the warning
> > >> makes it to the Enterprise kernels before we remove the interface
> > >> should be sufficient.
>
> The enterprise products will probably just remove the printk. Even if
> they didn't you are looking at ten years before things finish changing
> based on current experiences, probably longer as things stabilize.

If that happens then the enterprise vendors will tell us (won't they). We
can then discuss it and we may well elect to do something different.

But Eric is predicting that this probably _won't_ happen. There's only one
way to find out.

> The whole "whine a bit" process simply doesn't work when you are trying
> to persuade people to move in a non-hobbyist context.

Eric thinks there will be little if any moving to be done.

> They don't want to
> move, the message is simply an annoyance, their upstream huge package
> vendor won't change just to deal with it and they'll class it as a
> regression from previous releases, an incompatibility and file bugs until
> it goes away.
>
> Its user ABI and as Linus said - we don't break it. Trimming down all the
> crap that never worked via sysctl is one thing, not putting sysctl in new
> platforms likewise. Trying to undo it isn't going to work
>

We don't know that yet. You may be right, or maybe Eric is right. Let's
find out.

2007-08-30 12:13:39

by Theodore Ts'o

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Wed, Aug 29, 2007 at 01:00:07PM -0600, Eric W. Biederman wrote:
> "H. Peter Anvin" <[email protected]> writes:
>
> > Eric W. Biederman wrote:
> >>
> >> My hypothesis. No one cares now.
> >>
> >> My observation. The way we have been maintaining the binary sysctl
> >> side of things using it is asking for your application to be broken in
> >> subtle and nasty ways.
> >>
> >
> > I suspect the right thing to do is simply to make a list of the supported binary
> > sysctls, and automatically verify those numbers. Doing that would alleviate
> > these concerns, wouldn't break anything, and isn't really that hard to do.
>
> Well the list is currently 1200 lines long, with wild cards in it.
> See sysctl_check.c in the -mm tree. I think I have finally found
> all of the binary sysctl numbers that are currently in use but I may
> have missed something. Although that can probably be trimmed a bit
> now that a number of those sysctls have been identified as impossibly
> and always broken

It's not hard to do read-side, right? Take the list of sysctls, and
create a program which reads it via the binary interface and the /proc
interface, and verify they are the same.

Testing write-side, where we have to worry about permission tests,
making sure the correct value is set, locking issues, etc., is
admittedly more difficult. My guess, though, is that more
programs/libraries are reading from the sysctl interface than writing
to it.

- Ted
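
The comparison step of such a read-side test could be sketched as below.
Only the comparison is shown; fetching the two values (one through the
binary interface, one from /proc/sys) is left out. The function names are
invented for this example, and the normalization rules (a trailing NUL on
the binary side, a trailing newline on the proc side) are assumptions
about how the two forms typically differ.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Compare a string value as the binary interface might return it
 * (raw bytes, possibly counting a trailing NUL) with the text read
 * from /proc/sys (usually newline-terminated).
 * Returns 1 if they match, 0 otherwise.
 */
int string_values_match(const char *bin, size_t binlen, const char *proc_text)
{
	size_t plen = strlen(proc_text);

	/* Drop trailing NULs the binary path may include in its length. */
	while (binlen > 0 && bin[binlen - 1] == '\0')
		binlen--;
	/* Drop the trailing newline the proc path appends. */
	while (plen > 0 && proc_text[plen - 1] == '\n')
		plen--;

	return binlen == plen && memcmp(bin, proc_text, plen) == 0;
}

/* Same idea for an integer sysctl: binary int versus decimal text. */
int int_values_match(int bin, const char *proc_text)
{
	char *end;
	long v = strtol(proc_text, &end, 10);

	if (end == proc_text)
		return 0;	/* no digits at all */
	while (*end == '\n' || *end == ' ' || *end == '\t')
		end++;
	return *end == '\0' && v == (long)bin;
}
```

A test driver would loop over the list of sysctls, fetch each value both
ways, and report every name for which the matching helper returns 0.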

2007-08-30 13:20:26

by David Newall

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Eric W. Biederman wrote:
> This isn't "Oh some apps are using it let's get them to stop, because
> it is inconvenient". This is "No apps seems to be using this, we
> keep goofing up the maintenance and no one notices, and so it is
> likely a source of security problems and other nasties"
>

This first claim is not consistent with the next claim:

> I have evidence that there are 1 or 2 open source applications using
> this (not counting the glibc weirdness).

It may be unmaintainable but, apparently, it's not really unused.

2007-08-30 17:41:18

by Eric W. Biederman

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

David Newall <[email protected]> writes:

> Eric W. Biederman wrote:
>> This isn't "Oh some apps are using it let's get them to stop, because
>> it is inconvenient". This is "No apps seems to be using this, we
>> keep goofing up the maintenance and no one notices, and so it is
>> likely a source of security problems and other nasties"
>>
>
> This first claim is not consistent with the next claim:
>
>> I have evidence that there are 1 or 2 open source applications using
>> this (not counting the glibc weirdness).
>
> If may be unmaintainable but, apparently, it's not really unused.

Only in a strict mathematical sense. I seem to recall patches to
those programs or seeing that it doesn't matter. Things are sufficient
that I expect that in 2 years' time, when we revisit this, we won't
be able to find anything that cares.

Eric

2007-08-30 18:32:22

by Rob Landley

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Tuesday 28 August 2007 8:31:45 pm H. Peter Anvin wrote:
> Eric W. Biederman wrote:
> > Christoph Hellwig <[email protected]> writes:
> >> Umm, no way we're ever going to remove a syscall like this.
> >
> > If someone besides me cares about more then rhetoric I will be happy
> > to reconsider and several years is plenty of time to find that out.
> >
> > I aborted the removal last time precisely because we had not done an
> > adequate job of warning our users. A printk when we run a program
> > that uses the binary interface and an long enough interval the warning
> > makes it to the Enterprise kernels before we remove the interface
> > should be sufficient.
>
> glibc uses it, and it uses it in contexts where access to the filesystem
> isn't functional (e.g. in chroot.)

A lot of embedded people like to configure /proc out of the kernel for space
reasons. This would make that noticeably more painful.

(If sysctlfs wasn't part of proc, that would be less of an issue, but we need
union mounts for that...)

Rob
--
"One of my most productive days was throwing away 1000 lines of code."
- Ken Thompson.

2007-08-30 18:34:24

by Christoph Hellwig

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Thu, Aug 30, 2007 at 02:32:11PM -0500, Rob Landley wrote:
> (If sysctlfs wasn't part of proc, that would be less of an issue, but we need
> union mounts for that...)

Not at all. All sysctls are under /proc/sys/

2007-08-30 18:56:40

by Jan Engelhardt

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.


On Aug 28 2007 21:49, Andrew Morton wrote:
>>
>> Umm, no way we're ever going to remove a syscall like this. Please
>> stop this deprecration crap. Just make sure no ones adds more binary
>> sysctls.
>
>I think it's worth a try. It might take two, three or five years, who
>knows? If it turns out to be impractical then we we can just change our
>minds later, no big loss. It's just too early to say right now.

Great, why don't we remove all the things marked obsolete and
start Linux 2.*newversionbeep*? Just rm, no add, and then tag.

If we never start moving to get rid of the old stuff, we will be
ending up where Microsoft is; though we won't get there as fast since
Linux was designed carefully in the first place and did/does not
collect as much compat dust.


Jan
--

2007-08-30 18:58:22

by Eric W. Biederman

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Christoph Hellwig <[email protected]> writes:

> On Thu, Aug 30, 2007 at 02:32:11PM -0500, Rob Landley wrote:
>> (If sysctlfs wasn't part of proc, that would be less of an issue, but we need
>> union mounts for that...)
>
> Not at all. all sysctls are under /proc/sys/

Yes. So all we really need to do is split them apart and do the nfs multiple
mount thing to maintain backwards compatibility when mounting proc.

The code is close enough to split apart now that it probably would not be
too difficult to finish the job.

Eric

2007-08-30 22:23:24

by Rob Landley

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Thursday 30 August 2007 1:34:02 pm Christoph Hellwig wrote:
> On Thu, Aug 30, 2007 at 02:32:11PM -0500, Rob Landley wrote:
> > (If sysctlfs wasn't part of proc, that would be less of an issue, but we
> > need union mounts for that...)
>
> Not at all. all sysctls are under /proc/sys/

Ah, right. Good point. (I was thinking of the problem of
splitting /proc/$PID directories from the rest of the stuff in /proc.)

Rob
--
"One of my most productive days was throwing away 1000 lines of code."
- Ken Thompson.

2007-09-01 22:16:18

by Andi Kleen

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Rob Landley <[email protected]> writes:
>
> A lot of embedded people like to configure /proc out of the kernel for space
> reasons. This would make that noticeably more painful.

I had a patch for a sysctl_name(2) for this a long time ago.
If it was a serious issue that could be reintroduced.

BTW sysctl(2) only needs to be quiet for a single sysctl used
by glibc.

-Andi

2007-09-02 08:45:07

by Rob Landley

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Saturday 01 September 2007 5:16:03 pm Andi Kleen wrote:
> Rob Landley <[email protected]> writes:
> > A lot of embedded people like to configure /proc out of the kernel for
> > space reasons. This would make that noticeably more painful.
>
> I had a patch for a sysctl_name(2) for this a long time ago.
> If it was a serious issue that could be reintroduced.
>
> BTW sysctl(2) only needs to be quiet for a single sysctl used
> by glibc.
>
> -Andi

Yeah, I found it:
http://lkml.org/lkml/2003/7/10/345

I think that if /proc/sys could be broken out as a separate filesystem, and it
was small and simple, the embedded people would probably be happy. Is your
patch significantly smaller than such a filesystem would be? (Keeping in
mind that the smallest thing you can do is run from initramfs, and I think
that's pulling in libfs already...)

Rob
--
"One of my most productive days was throwing away 1000 lines of code."
- Ken Thompson.

2007-09-02 08:59:43

by H. Peter Anvin

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Rob Landley wrote:
> On Saturday 01 September 2007 5:16:03 pm Andi Kleen wrote:
>> Rob Landley <[email protected]> writes:
>>> A lot of embedded people like to configure /proc out of the kernel for
>>> space reasons. This would make that noticeably more painful.
>> I had a patch for a sysctl_name(2) for this a long time ago.
>> If it was a serious issue that could be reintroduced.
>>
>> BTW sysctl(2) only needs to be quiet for a single sysctl used
>> by glibc.
>>
>> -Andi
>
> Yeah, I found it:
> http://lkml.org/lkml/2003/7/10/345
>
> I think that if /proc/sys could be broken out as a separate filesystem, and it
> was small and simple, the embedded people would probably be happy. Is your
> patch significantly smaller than such a filesystem would be? (Keeping in
> mind that the smallest thing you can do is run from initramfs, and I think
> that's pulling in libfs already...)
>

IMO, the big problem with /proc/sys (and, for that matter, /sys) is
mainly that they have to live in the process namespace, which is highly
awkward when one uses chroot().

One way to solve *that* might be a system call to get a file descriptor
to the root of sysfs or procsysfs which can be used with openat(). That
has its own perils, of course...
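
As a rough sketch of the openat() half of that idea (not a proposal for the
new system call itself): once a process holds a directory descriptor, lookups
relative to it no longer depend on where, or whether, the filesystem is
visible in the caller's path namespace. The helper name read_at() is made up
for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/*
 * Read a file relative to an already-open directory descriptor.
 * Returns the number of bytes read (buf is NUL-terminated), or -1 on error.
 * read_at() is a hypothetical helper, not an existing API.
 */
ssize_t read_at(int dirfd, const char *relpath, char *buf, size_t len)
{
	int fd;
	ssize_t n;

	if (len == 0)
		return -1;
	/* The lookup starts at dirfd, not at the process root or cwd. */
	fd = openat(dirfd, relpath, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, len - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

Today the descriptor would have to come from something like
open("/proc/sys", O_RDONLY | O_DIRECTORY) before the chroot(), followed by
read_at(dirfd, "kernel/ostype", buf, sizeof buf). The system call suggested
above would only replace that initial open().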

-hpa

2007-09-02 11:05:21

by Rob Landley

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Sunday 02 September 2007 3:54:36 am H. Peter Anvin wrote:
> Rob Landley wrote:
> > On Saturday 01 September 2007 5:16:03 pm Andi Kleen wrote:
> >> Rob Landley <[email protected]> writes:
> >>> A lot of embedded people like to configure /proc out of the kernel for
> >>> space reasons. This would make that noticeably more painful.
> >>
> >> I had a patch for a sysctl_name(2) for this a long time ago.
> >> If it was a serious issue that could be reintroduced.
> >>
> >> BTW sysctl(2) only needs to be quiet for a single sysctl used
> >> by glibc.
> >>
> >> -Andi
> >
> > Yeah, I found it:
> > http://lkml.org/lkml/2003/7/10/345
> >
> > I think that if /proc/sys could be broken out as a separate filesystem,
> > and it was small and simple, the embedded people would probably be happy.
> > Is your patch significantly smaller than such a filesystem would be?
> > (Keeping in mind that the smallest thing you can do is run from
> > initramfs, and I think that's pulling in libfs already...)
>
> IMO, the big problem with /proc/sys (and, for that matter, /sys) is
> mainly that they have to live in the process namespace, which is highly
> awkward when one uses chroot().
>
> One way to solve *that* might be a system call to get a file descriptor
> to the root of sysfs or procsysfs which can be used with openat(). That
> has its own perils, of course...

If you're going to add a new api, you might as well go with the sysctl-by-name
patch above, which looks reasonably small and simple to me from a very quick
glance at a 2.6.0-era patch.
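
For comparison, a by-name lookup can be approximated in user space when
/proc is available, by turning the dotted name into a path under /proc/sys.
The sketch below is not taken from the patch above; the function name
sysctl_name_to_path() is made up, and the "every dot is a separator" rule is
a simplifying assumption.

```c
#include <stdio.h>
#include <string.h>

/*
 * Convert a dotted sysctl name into a /proc/sys path.
 * Returns 0 on success, -1 if the name is empty or the result does not fit.
 * Simplification: every '.' becomes '/', so a name with a literal dot
 * inside one component is not handled.
 */
int sysctl_name_to_path(const char *name, char *path, size_t len)
{
	int n;
	char *p;

	if (name[0] == '\0')
		return -1;
	n = snprintf(path, len, "/proc/sys/%s", name);
	if (n < 0 || (size_t)n >= len)
		return -1;
	/* Skip the "/proc/sys/" prefix, then rewrite the separators. */
	for (p = path + strlen("/proc/sys/"); *p != '\0'; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}
```

With this, "kernel.ostype" becomes "/proc/sys/kernel/ostype". A by-name
system call would do the equivalent lookup inside the kernel, which is what
makes it interesting for systems built without /proc.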

The advantage of breaking /proc/sys into a separate filesystem is that it
doesn't introduce a new API (although possibly a new line in the init
scripts), so existing software doesn't have to change to use it, which is
good. It increases orthogonality and granularity, which embedded guys like me
are generally in favor of. :)

On the other hand, if you're adding a system call to get a file descriptor to
an arbitrary superblock you can then openat... How do you refer to said
superblock? (Perhaps invent a "volume" syntax for all the superblocks, a la
the Amiga?) Do the /proc and /sys superblocks exist if nobody's mounted them
yet? Yes, the open could instantiate them, but I'm wondering about the "list
available filesystems that aren't in your namespace" problem and the security
fun from that. Presumably this is doable as non-root, because if you're root
you can just mount /proc and /sys and go from there...

You could also special case "mount" so that if you try to mount sysfs on /sys
or proc on /proc (and they're not already mount points) you don't need to be
root. Seems a bit evil, though...

> -hpa

Rob
--
"One of my most productive days was throwing away 1000 lines of code."
- Ken Thompson.

2007-09-02 19:57:30

by Eric W. Biederman

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Rob Landley <[email protected]> writes:
>
> If you're going to add a new api, you might as well go with the sysctl-by-name
> patch above, which looks reasonably small and simple to me from a very quick
> glance at a 2.6.0-era patch.
>
> The advantage of breaking /proc/sys into a separate filesystem doesn't
> introduce a new API (although possibly a new line in the init scripts), so
> existing software doesn't have to change to use it, which is good. It
> increases orthogonality and granularity, which embedded guys like me are
> generally in favor of. :)

- I think sysctlfs makes sense.
- I think all that is left is superblock handling and some backward
  compatibility magic. (Using the follow_link trick to automatically
  mount /proc/sys)

All of the rest of the code pretty much lives in fs/proc/proc_sysctl.c
already.

I have some other priorities to deal with first but if no one does the
work before I get there I will probably implement that eventually.

Eric

2007-09-02 20:00:45

by Al Viro

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Sun, Sep 02, 2007 at 01:56:33PM -0600, Eric W. Biederman wrote:
> - I think all that is left is superblock handling and some backward
> compatibility magic. (Using the follow_link trick to automatically
> mount /proc/sys)

NAK. Let's explicitly mount this stuff in init scripts; it won't break
on older kernels and there's no excuse for that kind of kludge in the
kernel.
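
A minimal sketch of what "explicitly mount it and don't break on older
kernels" could look like from an init program written in C. It relies on
mount(2) failing with ENODEV when the filesystem type is not configured in
the running kernel. The helper name mount_optional() is made up, and
"sysctlfs" is only the name floated in this thread, not an existing
filesystem type.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Try to mount an optional pseudo-filesystem.
 * Returns 0 if mounted, 1 if the running kernel does not know the
 * filesystem type (ENODEV, e.g. an older kernel), -1 on any other error.
 * mount_optional() is a hypothetical helper for this illustration.
 */
int mount_optional(const char *fstype, const char *target)
{
	/* Pseudo-filesystems ignore the source; reuse the type name for it. */
	if (mount(fstype, target, fstype, 0, NULL) == 0)
		return 0;
	return errno == ENODEV ? 1 : -1;
}
```

An init would call mount_optional("sysctlfs", "/proc/sys") after mounting
/proc and simply carry on when it returns 1, since on such a kernel
/proc/sys is still provided by procfs itself.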

2007-09-02 21:52:36

by Eric W. Biederman

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

Al Viro <[email protected]> writes:

> On Sun, Sep 02, 2007 at 01:56:33PM -0600, Eric W. Biederman wrote:
>> - I think all that is left is superblock handling and some backward
>> compatibility magic. (Using the follow_link trick to automatically
>> mount /proc/sys)
>
> NAK. Let's explicitly mount this stuff in init scripts; it won't break
> on older kernels and there's no excuse for that kind of kludges in the
> kernel.

I don't much care. But we do have the infrastructure for it in the
kernel and NFS uses it. And it seems like a nice way to preserve user
space backwards compatibility, without making the code too nasty.

If we don't mind that bit of change that would make it harder to
upgrade a kernel, I don't mind not doing it. It just looks like an
elegant way to handle that implementation change.

Eric

2007-09-03 08:37:47

by Andi Kleen

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Sun, Sep 02, 2007 at 09:00:10PM +0100, Al Viro wrote:
> On Sun, Sep 02, 2007 at 01:56:33PM -0600, Eric W. Biederman wrote:
> > - I think all that is left is superblock handling and some backward
> > compatibility magic. (Using the follow_link trick to automatically
> > mount /proc/sys)
>
> NAK. Let's explicitly mount this stuff in init scripts; it won't break
> on older kernels and there's no excuse for that kind of kludges in the
> kernel.

That would probably break nearly all init scripts out there.

Can't the filesystem just be mounted together with /proc?

-Andi

2007-09-03 09:17:39

by Al Viro

Subject: Re: [PATCH] sysctl: Deprecate sys_sysctl in a user space visible fashion.

On Mon, Sep 03, 2007 at 10:37:33AM +0200, Andi Kleen wrote:

> That would probably break near all init scripts out there.
>
> Can't the file system not just be mounted with /proc together?

Won't be fun to implement. Really. BTW, I really wonder what will
happen if two processes step on a magic symlink at the same time -
ought to check if NFS use is broken.