2006-10-20 12:43:46

by Michael Tokarev

[permalink] [raw]
Subject: bug reading /proc/sys/kernel/*: only first byte read.

I were debugging a weird problem with busybox, and come across
this chunk of strace output:

open("/proc/sys/kernel/osrelease", O_RDONLY) = 3
read(3, "2", 1) = 1
read(3, "", 1) = 0
close(3) = 0

As you can see, after reading one byte from /proc/sys/kernel/osrelease,
next read() returns 0, which is treated as end-of-file by an application.

Why busybox does this single-byte reads is another question (many
shells does that, in order to be able to stop reading at newline).

But this is definitely a bug in kernel, and should be fixed....

It exists in 2.6.17 and 2.6.18

Thanks.

/mjt


2007-01-30 10:25:23

by Andrew Morton

[permalink] [raw]
Subject: Re: bug reading /proc/sys/kernel/*: only first byte read.

On Fri, 20 Oct 2006 16:43:39 +0400
Michael Tokarev <[email protected]> wrote:

> I were debugging a weird problem with busybox, and come across
> this chunk of strace output:
>
> open("/proc/sys/kernel/osrelease", O_RDONLY) = 3
> read(3, "2", 1) = 1
> read(3, "", 1) = 0
> close(3) = 0
>
> As you can see, after reading one byte from /proc/sys/kernel/osrelease,
> next read() returns 0, which is treated as end-of-file by an application.
>
> Why busybox does this single-byte reads is another question (many
> shells does that, in order to be able to stop reading at newline).
>
> But this is definitely a bug in kernel, and should be fixed....
>
> It exists in 2.6.17 and 2.6.18
>

Well this nearly killed me. kernel-side proc handlers are ghastly things.

Could I have this reviewed please? It surely has a hole in it somewhere.


From: Andrew Morton <[email protected]>

If you try to read things like /proc/sys/kernel/osrelease with single-byte
reads, you get just one byte and then EOF. This is because _proc_do_string()
assumes that the caller is read()ing into a buffer which is large enough to
fit the whole string in a single hit.

Fix.

Cc: "Eric W. Biederman" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

kernel/sysctl.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)

diff -puN kernel/sysctl.c~_proc_do_string-fix-short-reads kernel/sysctl.c
--- a/kernel/sysctl.c~_proc_do_string-fix-short-reads
+++ a/kernel/sysctl.c
@@ -1682,8 +1682,7 @@ static int _proc_do_string(void* data, i
char __user *p;
char c;

- if (!data || !maxlen || !*lenp ||
- (*ppos && !write)) {
+ if (!data || !maxlen || !*lenp) {
*lenp = 0;
return 0;
}
@@ -1705,18 +1704,38 @@ static int _proc_do_string(void* data, i
((char *) data)[len] = 0;
*ppos += *lenp;
} else {
- len = strlen(data);
+ loff_t pos = *ppos;
+ const size_t slen = strlen(data);
+
+ /*
+ * len is the amount of data to copy, and becomes the amount of
+ * data which was copied
+ */
+ len = slen;
+ if (pos > len) {
+ *lenp = 0;
+ return 0;
+ }
if (len > maxlen)
len = maxlen;
if (len > *lenp)
len = *lenp;
- if (len)
- if(copy_to_user(buffer, data, len))
- return -EFAULT;
- if (len < *lenp) {
- if(put_user('\n', ((char __user *) buffer) + len))
+ /* Don't copy past the end of the string */
+ if (len > slen - pos)
+ len = slen - pos;
+ data += pos;
+ /* Copy as much of the string as we can */
+ if (len) {
+ if (copy_to_user(buffer, data, len))
return -EFAULT;
- len++;
+ }
+ /* If we copied the whole string, now write a \n */
+ if (len + pos == slen) {
+ if (len + pos < maxlen) {
+ if (put_user('\n', (char __user *)buffer + len))
+ return -EFAULT;
+ len++;
+ }
}
*lenp = len;
*ppos += len;
_

2007-01-30 13:24:39

by Oleg Nesterov

[permalink] [raw]
Subject: Re: bug reading /proc/sys/kernel/*: only first byte read.

On 01/30, Andrew Morton wrote:
>
> @@ -1705,18 +1704,38 @@ static int _proc_do_string(void* data, i
> ((char *) data)[len] = 0;
> *ppos += *lenp;
> } else {
> - len = strlen(data);
> + loff_t pos = *ppos;
> + const size_t slen = strlen(data);
> +
> + /*
> + * len is the amount of data to copy, and becomes the amount of
> + * data which was copied
> + */
> + len = slen;
> + if (pos > len) {
> + *lenp = 0;
> + return 0;
> + }
> if (len > maxlen)
> len = maxlen;
> if (len > *lenp)
> len = *lenp;
> - if (len)
> - if(copy_to_user(buffer, data, len))
> - return -EFAULT;
> - if (len < *lenp) {
> - if(put_user('\n', ((char __user *) buffer) + len))
> + /* Don't copy past the end of the string */
> + if (len > slen - pos)
> + len = slen - pos;
> + data += pos;
> + /* Copy as much of the string as we can */
> + if (len) {
> + if (copy_to_user(buffer, data, len))
> return -EFAULT;
> - len++;
> + }
> + /* If we copied the whole string, now write a \n */
> + if (len + pos == slen) {
> + if (len + pos < maxlen) {
^^^^^^^^^^^^^^^^^^^^^^^
Shouldn't this be
if (len < *lenp)

?

> + if (put_user('\n', (char __user *)buffer + len))
> + return -EFAULT;
> + len++;
> + }
> }
> *lenp = len;
> *ppos += len;

2007-01-30 13:58:14

by Oleg Nesterov

[permalink] [raw]
Subject: Re: bug reading /proc/sys/kernel/*: only first byte read.

On 01/30, Oleg Nesterov wrote:
>
> On 01/30, Andrew Morton wrote:
> >
> > + if (len + pos < maxlen) {
> ^^^^^^^^^^^^^^^^^^^^^^^
> Shouldn't this be
> if (len < *lenp)
>
> ?

On the other hand. If we may assume that original code was correct,
we can make a simpler patch?

Signed-off-by: Oleg Nesterov <[email protected]>

--- 6.19/kernel/sysctl.c~ 2006-11-17 19:42:31.000000000 +0300
+++ 6.19/kernel/sysctl.c 2007-01-30 16:46:00.000000000 +0300
@@ -1683,13 +1683,12 @@ static int _proc_do_string(void* data, i
size_t len;
char __user *p;
char c;
-
- if (!data || !maxlen || !*lenp ||
- (*ppos && !write)) {
+
+ if (!data || !maxlen || !*lenp) {
*lenp = 0;
return 0;
}
-
+
if (write) {
len = 0;
p = buffer;
@@ -1710,6 +1709,15 @@ static int _proc_do_string(void* data, i
len = strlen(data);
if (len > maxlen)
len = maxlen;
+
+ if (*ppos > len) {
+ *lenp = 0;
+ return 0;
+ }
+
+ data += *ppos;
+ len -= *ppos;
+
if (len > *lenp)
len = *lenp;
if (len)

2007-01-30 15:09:07

by Eric W. Biederman

[permalink] [raw]
Subject: Re: bug reading /proc/sys/kernel/*: only first byte read.

Oleg Nesterov <[email protected]> writes:

> On 01/30, Oleg Nesterov wrote:
>>
>> On 01/30, Andrew Morton wrote:
>> >
>> > + if (len + pos < maxlen) {
>> ^^^^^^^^^^^^^^^^^^^^^^^
>> Shouldn't this be
>> if (len < *lenp)
>>
>> ?
>
> On the other hand. If we may assume that original code was correct,
> we can make a simpler patch?

I think there is an issue worth fixing her.

However as far as I can tell the code has this limitation
deliberately for simplicity.

Getting the string side of this fixed even by itself is
worthwhile, although it might be worth teach people
about sys_uname and /bin/uname. It seems is the biggest thing
people look at /proc/sys/ for...

For the non-string data the only way we can do this is to
generate it into a temporary buffer and the read out the
part of the buffer the user is requesting. Isn't there
something in seq_file that will do this?

On the other side I'm fairly certain we can't support
short writes to magic sysctl files on the write side of things.

I do have to say I like Oleg short version of this patch.

Eric

2007-01-30 15:57:36

by Michael Tokarev

[permalink] [raw]
Subject: Re: bug reading /proc/sys/kernel/*: only first byte read.

Eric W. Biederman wrote:
[]
> However as far as I can tell the code has this limitation
> deliberately for simplicity.
>
> Getting the string side of this fixed even by itself is
> worthwhile, although it might be worth teach people
> about sys_uname and /bin/uname. It seems is the biggest thing
> people look at /proc/sys/ for...

I come across this very issue when I discovered another bug,
using un-initialized pipefs structures during early hotplug
calls. I know about /bin/uname, but it requires a pipe, so
wont work. Obvious alternative is /proc/sys/version, which
is alot faster too (no fork+exec overhead).

/mjt