2006-08-20 02:08:55

by Solar Designer

[permalink] [raw]
Subject: [PATCH x7] misc fixes from 2.4.33-ow1

Willy and all,

Attached are 7 small changes from 2.4.33-ow1, as separate patches. I do
not feel that these warrant separate messages.

linux-2.4.33-ow1-BAD_ADDR.diff

This one is a one-liner, in binfmt_elf.c:

-#define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE)
+#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)

I feel that it is more logical to have BAD_ADDR() defined in this way:
indeed, the first kernel-space address is unusable for userspace. I
don't think this change affects anything, and we had it in -ow for a
couple of years.

linux-2.4.33-ow1-create_elf_tables.diff
linux-2.4.33-ow1-elf_core_dump.diff

These two are pieces from my more elaborate version of the coredump
vulnerability fix:

http://www.isec.pl/vulnerabilities/isec-0023-coredump.txt

The first one does for env_end the same thing that the minimal fix did
for arg_end, and the second one makes things more correct for 64-bit
archs. Neither fixes any known security problem, although I would not
be surprised if there's a yet unexplored attack vector that uses env_end.
This has been in -ow for over a year.

linux-2.4.33-ow1-load_elf_library.diff

This rejects ELF libraries with elf_ex.e_phnum == 0 earlier in the code.
Without this change, such libraries would be rejected anyway - after a
kmalloc() and a kernel_read(), both of zero bytes. This has been in -ow
for almost two years.

linux-2.4.33-ow1-mremap-arch-sanity.diff

This makes 32-bit emulation mmap/mremap calls on 64-bit archs refuse to
accept/return addresses beyond the 32-bit user address space. This is
about correctness, not security. Andrea Arcangeli did not like these,
commenting on them like: "supeflous, reject, must BUG if something", to
which I replied:

I don't see how these architecture-specific checks are superfluous.
The non-arch-specific code has only checked against TASK_SIZE, not
against the lower limit present with 32-bit syscall emulation on
64-bit archs. Yes, with other checks in the arch-specific code, my
added checks are pretty much limited to just the special case of
addr == end, but such addr's are invalid under 32-bit syscall
emulation and should not be accepted/returned by syscalls.

BUG() would be wrong, it would introduce a DoS.

(That discussion occurred in December, 2003; I kept the changes in -ow
since then.)

linux-2.4.33-ow1-proc_file_read.diff

This is hardening of proc_file_read() similar to what was recently done
in 2.6 in response to the vulnerability discovery. While 2.4 kernels
were not vulnerable due to differences in the lseek implementation, I
prefer to harden the reads as well. This change is new with 2.4.33-ow1.

linux-2.4.33-ow1-proc-nosuid-noexec-nodev.diff

This attempts to make procfs MS_NOSUID | MS_NOEXEC | MS_NODEV by
default, in response to another recently discovered 2.6-specific
vulnerability (so for 2.4 this is just proactive hardening). Most of
the changes in this patch were in -ow patches for years (in order to
enable the uid= and gid= mount options to work). Only the specific
"s->s_flags |= ..." line is a recent addition, and this one has not yet
been tested to actually make a difference.

Thanks,

Alexander


Attachments:
(No filename) (3.08 kB)
linux-2.4.33-ow1-BAD_ADDR.diff (375.00 B)
linux-2.4.33-ow1-create_elf_tables.diff (509.00 B)
linux-2.4.33-ow1-elf_core_dump.diff (691.00 B)
linux-2.4.33-ow1-load_elf_library.diff (649.00 B)
linux-2.4.33-ow1-mremap-arch-sanity.diff (1.90 kB)
linux-2.4.33-ow1-proc_file_read.diff (739.00 B)
linux-2.4.33-ow1-proc-nosuid-noexec-nodev.diff (2.64 kB)
Download all attachments

2006-08-20 09:16:17

by Willy Tarreau

[permalink] [raw]
Subject: [PATCH] binfmt_elf.c : the BAD_ADDR macro again

On Sun, Aug 20, 2006 at 06:04:17AM +0400, Solar Designer wrote:
> Willy and all,
>
> Attached are 7 small changes from 2.4.33-ow1, as separate patches. I do
> not feel that these warrant separate messages.

OK, I will reply to them one at a time, though, otherwise it will take me
hours to write one single mail !

> linux-2.4.33-ow1-BAD_ADDR.diff
>
> This one is a one-liner, in binfmt_elf.c:
>
> -#define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE)
> +#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
>
> I feel that it is more logical to have BAD_ADDR() defined in this way:
> indeed, the first kernel-space address is unusable for userspace. I
> don't think this change affects anything, and we had it in -ow for a
> couple of years.

I remember being surprized by this macro when I discovered it about one month
ago. But I was working on something else and let it go. I should have checked
more carefully, because binfmt_aout already defines it as >=, and 2.6 has it
fixed too (recenty though). However, it's not enough to fix the macro, there
are two tests which need to be fixed too, as explained there :

http://lkml.org/lkml/2006/6/21/507

The proper fix would then be :

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index b0ad905..249b710 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -77,7 +77,7 @@ static struct linux_binfmt elf_format =
NULL, THIS_MODULE, load_elf_binary, load_elf_library, elf_core_dump, ELF_EXEC_PAGESIZE
};

-#define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE)
+#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)

static int set_brk(unsigned long start, unsigned long end)
{
@@ -345,7 +345,7 @@ static unsigned long load_elf_interp(str
* <= p_memsize so it is only necessary to check p_memsz.
*/
k = load_addr + eppnt->p_vaddr;
- if (k > TASK_SIZE || eppnt->p_filesz > eppnt->p_memsz ||
+ if (BAD_ADDR(k) || eppnt->p_filesz > eppnt->p_memsz ||
eppnt->p_memsz > TASK_SIZE || TASK_SIZE - eppnt->p_memsz < k) {
error = -ENOMEM;
goto out_close;
@@ -772,7 +772,7 @@ #endif
* allowed task size. Note that p_filesz must always be
* <= p_memsz so it is only necessary to check p_memsz.
*/
- if (k > TASK_SIZE || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
+ if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
elf_ppnt->p_memsz > TASK_SIZE ||
TASK_SIZE - elf_ppnt->p_memsz < k) {
/* set_brk can never work. Avoid overflows. */


And even then, I'm not happy with this test :

TASK_SIZE - elf_ppnt->p_memsz < k

because it means that we only raise the error when

k + elf_ppnt->p_memsz > TASK_SIZE

I really think that we want to check this instead :

k + elf_ppnt->p_memsz >= TASK_SIZE

Otherwise we leave a window where this is undetected :

load_addr + eppnt->p_vaddr == TASK_SIZE - eppnt->p_memsz

This will later lead to last_bss getting readjusted to TASK_SIZE, which I
don't think is expected at all :

k = load_addr + eppnt->p_memsz + eppnt->p_vaddr;
if (k > last_bss)
last_bss = k;

Then I think we should change this at both places :

- TASK_SIZE - elf_ppnt->p_memsz < k) {
+ BAD_ADDR(k + elf_ppnt->p_memsz)) {

The same change would also be needed in 2.6 then. I can provide a patch
for both 2.4 and 2.6 if everyone agree.

I cc Chuck Ebbert, Ernie Petrides and Andrew who discussed the subject
in June.

Regards,
Willy

2006-08-20 15:55:26

by Solar Designer

[permalink] [raw]
Subject: Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again

On Sun, Aug 20, 2006 at 11:15:15AM +0200, Willy Tarreau wrote:
> The proper fix would then be :
[...]
> -#define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE)
> +#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
[...]
> - if (k > TASK_SIZE || eppnt->p_filesz > eppnt->p_memsz ||
> + if (BAD_ADDR(k) || eppnt->p_filesz > eppnt->p_memsz ||
[...]
> - if (k > TASK_SIZE || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
> + if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||

Looks OK to me.

> And even then, I'm not happy with this test :
>
> TASK_SIZE - elf_ppnt->p_memsz < k
>
> because it means that we only raise the error when
>
> k + elf_ppnt->p_memsz > TASK_SIZE
>
> I really think that we want to check this instead :
>
> k + elf_ppnt->p_memsz >= TASK_SIZE
>
> Otherwise we leave a window where this is undetected :
>
> load_addr + eppnt->p_vaddr == TASK_SIZE - eppnt->p_memsz
>
> This will later lead to last_bss getting readjusted to TASK_SIZE, which I
> don't think is expected at all :
>
> k = load_addr + eppnt->p_memsz + eppnt->p_vaddr;
> if (k > last_bss)
> last_bss = k;
>
> Then I think we should change this at both places :
>
> - TASK_SIZE - elf_ppnt->p_memsz < k) {
> + BAD_ADDR(k + elf_ppnt->p_memsz)) {

I am not sure about these re-arrangements - I'd need to review them in
full context to make sure that we don't inadvertently change things as
it relates to behavior on integer overflows, which I presently do not
have the time for. I'll trust that you do such a review with integer
overflows and variable type differences (size, signedness) in mind now
that I've mentioned this potential danger. ;-) Alternatively, you can
simply change the comparisons from < to <= and from > to >= rather than
use the BAD_ADDR() macro.

Thanks,

Alexander

2006-08-20 16:24:46

by Willy Tarreau

[permalink] [raw]
Subject: Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again

On Sun, Aug 20, 2006 at 07:51:22PM +0400, Solar Designer wrote:
> On Sun, Aug 20, 2006 at 11:15:15AM +0200, Willy Tarreau wrote:
> > The proper fix would then be :
> [...]
> > -#define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE)
> > +#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
> [...]
> > - if (k > TASK_SIZE || eppnt->p_filesz > eppnt->p_memsz ||
> > + if (BAD_ADDR(k) || eppnt->p_filesz > eppnt->p_memsz ||
> [...]
> > - if (k > TASK_SIZE || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
> > + if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
>
> Looks OK to me.
>
> > And even then, I'm not happy with this test :
> >
> > TASK_SIZE - elf_ppnt->p_memsz < k
> >
> > because it means that we only raise the error when
> >
> > k + elf_ppnt->p_memsz > TASK_SIZE
> >
> > I really think that we want to check this instead :
> >
> > k + elf_ppnt->p_memsz >= TASK_SIZE
> >
> > Otherwise we leave a window where this is undetected :
> >
> > load_addr + eppnt->p_vaddr == TASK_SIZE - eppnt->p_memsz
> >
> > This will later lead to last_bss getting readjusted to TASK_SIZE, which I
> > don't think is expected at all :
> >
> > k = load_addr + eppnt->p_memsz + eppnt->p_vaddr;
> > if (k > last_bss)
> > last_bss = k;
> >
> > Then I think we should change this at both places :
> >
> > - TASK_SIZE - elf_ppnt->p_memsz < k) {
> > + BAD_ADDR(k + elf_ppnt->p_memsz)) {
>
> I am not sure about these re-arrangements - I'd need to review them in
> full context to make sure that we don't inadvertently change things as
> it relates to behavior on integer overflows, which I presently do not
> have the time for. I'll trust that you do such a review with integer
> overflows and variable type differences (size, signedness) in mind now
> that I've mentioned this potential danger. ;-) Alternatively, you can
> simply change the comparisons from < to <= and from > to >= rather than
> use the BAD_ADDR() macro.

Well, I have for a principle that if a change requires too many brain usage
to check for validity when we can avoid it, let's avoid it. I'm fine with
just changing the operator. But before this, I'd like to get comments from
the people who discussed the subject recently.

> Thanks,
>
> Alexander

Regards,
Willy

2006-08-21 20:36:30

by Ernie Petrides

[permalink] [raw]
Subject: Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again

On Sunday, 20-Aug-2006 at 18:23 +0200, Willy Tarreau wrote:

> On Sun, Aug 20, 2006 at 07:51:22PM +0400, Solar Designer wrote:
> > On Sun, Aug 20, 2006 at 11:15:15AM +0200, Willy Tarreau wrote:
> > > The proper fix would then be :
> > [...]
> > > -#define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE)
> > > +#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
> > [...]
> > > - if (k > TASK_SIZE || eppnt->p_filesz > eppnt->p_memsz ||
> > > + if (BAD_ADDR(k) || eppnt->p_filesz > eppnt->p_memsz ||
> > [...]
> > > - if (k > TASK_SIZE || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
> > > + if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
> >
> > Looks OK to me.

These are all correct.



> > > And even then, I'm not happy with this test :
> > >
> > > TASK_SIZE - elf_ppnt->p_memsz < k
> > >
> > > because it means that we only raise the error when
> > >
> > > k + elf_ppnt->p_memsz > TASK_SIZE
> > >
> > > I really think that we want to check this instead :
> > >
> > > k + elf_ppnt->p_memsz >= TASK_SIZE
> > >
> > > Otherwise we leave a window where this is undetected :
> > >
> > > load_addr + eppnt->p_vaddr == TASK_SIZE - eppnt->p_memsz

The reason I did not propose changing these is because these are
end-point checks (as opposed to starting address checks). I think
that the following "equals" condition is conceptually valid:

(starting-address + region-size == TASK_SIZE)



> > > This will later lead to last_bss getting readjusted to TASK_SIZE, which I
> > > don't think is expected at all :
> > >
> > > k = load_addr + eppnt->p_memsz + eppnt->p_vaddr;
> > > if (k > last_bss)
> > > last_bss = k;

This is an interesting case, but I think the error checking works okay.

After the ELF phdr loop, the resulting "last_bss" is used as follows:

/* Map the last of the bss segment */
if (last_bss > elf_bss) {
down_write(&current->mm->mmap_sem);
error = do_brk(elf_bss, last_bss - elf_bss);
up_write(&current->mm->mmap_sem);
if (BAD_ADDR(error))
goto out_close;
}

The variable "last_bss" is used to compute the size argument in the
call to do_brk(). If the section extends beyond TASK_SIZE, then do_brk()
will return -EINVAL. If the do_brk() call succeeds but "elf_bss" is itself
exactly at TASK_SIZE, then the BAD_ADDR() call above will catch it.



> [...] But before this, I'd like to get comments from
> the people who discussed the subject recently.

Thus, I think that both 2.4.33 and 2.6.<latest> are okay without any
further changes.



Cheers. -ernie

2006-08-21 21:23:55

by Willy Tarreau

[permalink] [raw]
Subject: Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again

On Mon, Aug 21, 2006 at 04:35:47PM -0400, Ernie Petrides wrote:
> On Sunday, 20-Aug-2006 at 18:23 +0200, Willy Tarreau wrote:
>
> > On Sun, Aug 20, 2006 at 07:51:22PM +0400, Solar Designer wrote:
> > > On Sun, Aug 20, 2006 at 11:15:15AM +0200, Willy Tarreau wrote:
> > > > The proper fix would then be :
> > > [...]
> > > > -#define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE)
> > > > +#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
> > > [...]
> > > > - if (k > TASK_SIZE || eppnt->p_filesz > eppnt->p_memsz ||
> > > > + if (BAD_ADDR(k) || eppnt->p_filesz > eppnt->p_memsz ||
> > > [...]
> > > > - if (k > TASK_SIZE || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
> > > > + if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
> > >
> > > Looks OK to me.
>
> These are all correct.

OK.

> > > > And even then, I'm not happy with this test :
> > > >
> > > > TASK_SIZE - elf_ppnt->p_memsz < k
> > > >
> > > > because it means that we only raise the error when
> > > >
> > > > k + elf_ppnt->p_memsz > TASK_SIZE
> > > >
> > > > I really think that we want to check this instead :
> > > >
> > > > k + elf_ppnt->p_memsz >= TASK_SIZE
> > > >
> > > > Otherwise we leave a window where this is undetected :
> > > >
> > > > load_addr + eppnt->p_vaddr == TASK_SIZE - eppnt->p_memsz
>
> The reason I did not propose changing these is because these are
> end-point checks (as opposed to starting address checks). I think
> that the following "equals" condition is conceptually valid:
>
> (starting-address + region-size == TASK_SIZE)

Agreed.

> > > > This will later lead to last_bss getting readjusted to TASK_SIZE, which I
> > > > don't think is expected at all :
> > > >
> > > > k = load_addr + eppnt->p_memsz + eppnt->p_vaddr;
> > > > if (k > last_bss)
> > > > last_bss = k;
>
> This is an interesting case, but I think the error checking works okay.
>
> After the ELF phdr loop, the resulting "last_bss" is used as follows:
>
> /* Map the last of the bss segment */
> if (last_bss > elf_bss) {
> down_write(&current->mm->mmap_sem);
> error = do_brk(elf_bss, last_bss - elf_bss);
> up_write(&current->mm->mmap_sem);
> if (BAD_ADDR(error))
> goto out_close;
> }
>
> The variable "last_bss" is used to compute the size argument in the
> call to do_brk(). If the section extends beyond TASK_SIZE, then do_brk()
> will return -EINVAL. If the do_brk() call succeeds but "elf_bss" is itself
> exactly at TASK_SIZE, then the BAD_ADDR() call above will catch it.

OK. Thanks for the explanation.

> > [...] But before this, I'd like to get comments from
> > the people who discussed the subject recently.
>
> Thus, I think that both 2.4.33 and 2.6.<latest> are okay without any
> further changes.

At least 2.4 needs the fix to use the correct BAD_ADDR (which is not
OK in 2.4.33 yet).

> Cheers. -ernie

Thanks Ernie,
Willy

2006-08-21 23:36:53

by Ernie Petrides

[permalink] [raw]
Subject: Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again

On Monday, 21-Aug-2006 at 23:11 +0200, Willy Tarreau wrote:

> > > [...] But before this, I'd like to get comments from
> > > the people who discussed the subject recently.
> >
> > Thus, I think that both 2.4.33 and 2.6.<latest> are okay without any
> > further changes.
>
> At least 2.4 needs the fix to use the correct BAD_ADDR (which is not
> OK in 2.4.33 yet).

Ah, right. (Sorry, I was verifying the change in a RHEL3 tree.)

In that case, I support your patch as posted. But the whole point of
that investigation was to fix an exec() vulnerability with a bad ELF
entry address. This is addressed in the final hunk of the 2.4.33-based
patch below, which includes the changes that you previously posted.

Cheers. -ernie



Signed-off-by: Ernie Petrides <[email protected]>

--- linux-2.4.33/fs/binfmt_elf.c.orig
+++ linux-2.4.33/fs/binfmt_elf.c
@@ -77,7 +77,7 @@ static struct linux_binfmt elf_format =
NULL, THIS_MODULE, load_elf_binary, load_elf_library, elf_core_dump, ELF_EXEC_PAGESIZE
};

-#define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE)
+#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)

static int set_brk(unsigned long start, unsigned long end)
{
@@ -345,7 +345,7 @@ elf_type, total_size);
* <= p_memsize so it is only necessary to check p_memsz.
*/
k = load_addr + eppnt->p_vaddr;
- if (k > TASK_SIZE || eppnt->p_filesz > eppnt->p_memsz ||
+ if (BAD_ADDR(k) || eppnt->p_filesz > eppnt->p_memsz ||
eppnt->p_memsz > TASK_SIZE || TASK_SIZE - eppnt->p_memsz < k) {
error = -ENOMEM;
goto out_close;
@@ -772,7 +772,7 @@ static int load_elf_binary(struct linux_
* allowed task size. Note that p_filesz must always be
* <= p_memsz so it is only necessary to check p_memsz.
*/
- if (k > TASK_SIZE || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
+ if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
elf_ppnt->p_memsz > TASK_SIZE ||
TASK_SIZE - elf_ppnt->p_memsz < k) {
/* set_brk can never work. Avoid overflows. */
@@ -822,10 +822,9 @@ static int load_elf_binary(struct linux_
interpreter,
&interp_load_addr);
if (BAD_ADDR(elf_entry)) {
- printk(KERN_ERR "Unable to load interpreter %.128s\n",
- elf_interpreter);
force_sig(SIGSEGV, current);
- retval = IS_ERR((void *)elf_entry) ? PTR_ERR((void *)elf_entry) : -ENOEXEC;
+ retval = IS_ERR((void *)elf_entry) ?
+ (int)elf_entry : -EINVAL;
goto out_free_dentry;
}
reloc_func_desc = interp_load_addr;
@@ -833,6 +832,12 @@ static int load_elf_binary(struct linux_
allow_write_access(interpreter);
fput(interpreter);
kfree(elf_interpreter);
+ } else {
+ if (BAD_ADDR(elf_entry)) {
+ force_sig(SIGSEGV, current);
+ retval = -EINVAL;
+ goto out_free_dentry;
+ }
}

kfree(elf_phdata);

2006-08-22 03:11:58

by Solar Designer

[permalink] [raw]
Subject: printk()s of user-supplied strings (Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again)

On Mon, Aug 21, 2006 at 07:36:01PM -0400, Ernie Petrides wrote:
> - printk(KERN_ERR "Unable to load interpreter %.128s\n",
> - elf_interpreter);

I'd rather have this message rate-limited, not dropped completely.

Another long-time concern that I had is that we've got some printk()s
of user-supplied string data. What about embedded linefeeds - can this
be used to produce fake kernel messages with arbitrary log level (syslog
priority)? It certainly seems so.

Also, there are terminal controls...

Alexander

2006-08-22 04:49:44

by Willy Tarreau

[permalink] [raw]
Subject: Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again

On Mon, Aug 21, 2006 at 07:36:01PM -0400, Ernie Petrides wrote:
> On Monday, 21-Aug-2006 at 23:11 +0200, Willy Tarreau wrote:
>
> > > > [...] But before this, I'd like to get comments from
> > > > the people who discussed the subject recently.
> > >
> > > Thus, I think that both 2.4.33 and 2.6.<latest> are okay without any
> > > further changes.
> >
> > At least 2.4 needs the fix to use the correct BAD_ADDR (which is not
> > OK in 2.4.33 yet).
>
> Ah, right. (Sorry, I was verifying the change in a RHEL3 tree.)
>
> In that case, I support your patch as posted. But the whole point of
> that investigation was to fix an exec() vulnerability with a bad ELF
> entry address. This is addressed in the final hunk of the 2.4.33-based
> patch below, which includes the changes that you previously posted.
>
> Cheers. -ernie

Thanks Ernie.
However, I will just comment the printk out instead of removing it
for now. I've backported the prink_ratelimit() function from 2.6,
but I've not tested it yet. My goal is to use it there (and at other
places where messages can be triggered at will by a user).

About Solar Designer's concern about the string passed to printk, I'm
wondering if it would not be printk() itself which should strip
non printable chars. It would seem very strange to me that any part
of the kernel needs to send \n or \e via strings sent to printk. Well,
after having seen ugly code in some drivers, anything can be expected,
but we could try anyway.

It would be another patch anyway

Cheers,
Willy

> Signed-off-by: Ernie Petrides <[email protected]>
>
> --- linux-2.4.33/fs/binfmt_elf.c.orig
> +++ linux-2.4.33/fs/binfmt_elf.c
> @@ -77,7 +77,7 @@ static struct linux_binfmt elf_format =
> NULL, THIS_MODULE, load_elf_binary, load_elf_library, elf_core_dump, ELF_EXEC_PAGESIZE
> };
>
> -#define BAD_ADDR(x) ((unsigned long)(x) > TASK_SIZE)
> +#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
>
> static int set_brk(unsigned long start, unsigned long end)
> {
> @@ -345,7 +345,7 @@ elf_type, total_size);
> * <= p_memsize so it is only necessary to check p_memsz.
> */
> k = load_addr + eppnt->p_vaddr;
> - if (k > TASK_SIZE || eppnt->p_filesz > eppnt->p_memsz ||
> + if (BAD_ADDR(k) || eppnt->p_filesz > eppnt->p_memsz ||
> eppnt->p_memsz > TASK_SIZE || TASK_SIZE - eppnt->p_memsz < k) {
> error = -ENOMEM;
> goto out_close;
> @@ -772,7 +772,7 @@ static int load_elf_binary(struct linux_
> * allowed task size. Note that p_filesz must always be
> * <= p_memsz so it is only necessary to check p_memsz.
> */
> - if (k > TASK_SIZE || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
> + if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
> elf_ppnt->p_memsz > TASK_SIZE ||
> TASK_SIZE - elf_ppnt->p_memsz < k) {
> /* set_brk can never work. Avoid overflows. */
> @@ -822,10 +822,9 @@ static int load_elf_binary(struct linux_
> interpreter,
> &interp_load_addr);
> if (BAD_ADDR(elf_entry)) {
> - printk(KERN_ERR "Unable to load interpreter %.128s\n",
> - elf_interpreter);
> force_sig(SIGSEGV, current);
> - retval = IS_ERR((void *)elf_entry) ? PTR_ERR((void *)elf_entry) : -ENOEXEC;
> + retval = IS_ERR((void *)elf_entry) ?
> + (int)elf_entry : -EINVAL;
> goto out_free_dentry;
> }
> reloc_func_desc = interp_load_addr;
> @@ -833,6 +832,12 @@ static int load_elf_binary(struct linux_
> allow_write_access(interpreter);
> fput(interpreter);
> kfree(elf_interpreter);
> + } else {
> + if (BAD_ADDR(elf_entry)) {
> + force_sig(SIGSEGV, current);
> + retval = -EINVAL;
> + goto out_free_dentry;
> + }
> }
>
> kfree(elf_phdata);

2006-08-22 20:23:43

by Ernie Petrides

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings (Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again)

On Tuesday, 22-Aug-2006 at 7:7 +0400, Solar Designer wrote:

> On Mon, Aug 21, 2006 at 07:36:01PM -0400, Ernie Petrides wrote:
> > - printk(KERN_ERR "Unable to load interpreter %.128s\n",
> > - elf_interpreter);
>
> I'd rather have this message rate-limited, not dropped completely.

I consider any printk() that can be arbitrarily triggered by an
unprivileged user to be inappropriate, rate-limited or not. I
recommend that it be removed entirely.


> Another long-time concern that I had is that we've got some printk()s
> of user-supplied string data. What about embedded linefeeds - can this
> be used to produce fake kernel messages with arbitrary log level (syslog
> priority)? It certainly seems so.
>
> Also, there are terminal controls...

These are valid concerns. Allowing the kernel to print user-fabricated
strings is a terrible idea.


Cheers. -ernie

2006-08-22 20:48:19

by Willy Tarreau

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings

Hi Ernie,

On Tue, Aug 22, 2006 at 04:23:17PM -0400, Ernie Petrides wrote:
> On Tuesday, 22-Aug-2006 at 7:7 +0400, Solar Designer wrote:
>
> > On Mon, Aug 21, 2006 at 07:36:01PM -0400, Ernie Petrides wrote:
> > > - printk(KERN_ERR "Unable to load interpreter %.128s\n",
> > > - elf_interpreter);
> >
> > I'd rather have this message rate-limited, not dropped completely.
>
> I consider any printk() that can be arbitrarily triggered by an
> unprivileged user to be inappropriate, rate-limited or not. I
> recommend that it be removed entirely.

Well, we had the same problem with the setuid() call where I proposed a
printk(), and Alan proposed to ratelimit it to prevent local user from
using it to flush the logs for instance, which I found clearly appropriate,
reason why I've backported 2.6's printk_ratelimit() function, citing the
two other printk() in binfmt_elf as good candidates.

> > Another long-time concern that I had is that we've got some printk()s
> > of user-supplied string data. What about embedded linefeeds - can this
> > be used to produce fake kernel messages with arbitrary log level (syslog
> > priority)? It certainly seems so.
> >
> > Also, there are terminal controls...
>
> These are valid concerns. Allowing the kernel to print user-fabricated
> strings is a terrible idea.

I agree. While this printk might have been there for years now, I really
think that it should be fixed for sensible chars, but then restored and
ratelimited to inform the admin that something abnormal is going on.

2.4.33.2 is out with the SCTP fix and this patch now, but with the printk
commented out.

Cheers,
Willy

2006-08-24 16:47:58

by Solar Designer

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings (Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again)

On Tue, Aug 22, 2006 at 04:23:17PM -0400, Ernie Petrides wrote:
> On Tuesday, 22-Aug-2006 at 7:7 +0400, Solar Designer wrote:
> > > - printk(KERN_ERR "Unable to load interpreter %.128s\n",
> > > - elf_interpreter);
> >
> > I'd rather have this message rate-limited, not dropped completely.
>
> I consider any printk() that can be arbitrarily triggered by an
> unprivileged user to be inappropriate, rate-limited or not. I
> recommend that it be removed entirely.

Some of them are quite useful. For example, the presence of this one in
dmesg and the logs typically (although not necessarily) indicates that
there was an OOM condition - and the interpreter name is useful to know
that the message indeed applied to /lib/ld-linux.so.2 rather than to
some user-supplied ELF interpreter (that could simply be non-existent).
Alan has recently suggested that another rate-limited user triggerable
message be introduced for set*uid() failing on transient errors. Then,
what about warnings of emulated unaligned accesses on Alpha? Those were
useful to me on many occasions. There can be many other examples.

Also, for some printk()s it is difficult to say whether they're user
triggerable. A printk() indicating a "machine check" or an ECC error
may well be user triggerable on a particular system.

> > Another long-time concern that I had is that we've got some printk()s
> > of user-supplied string data. What about embedded linefeeds - can this
> > be used to produce fake kernel messages with arbitrary log level (syslog
> > priority)? It certainly seems so.
> >
> > Also, there are terminal controls...
>
> These are valid concerns. Allowing the kernel to print user-fabricated
> strings is a terrible idea.

Yet there are lots of such printk()s - and I suggest that we make a
determination on all of them before we possibly start fixing individual
ones. In fact, perhaps there are too many of them to be fixing any in
2.4, unless we determine to somehow harden printk() itself.

Even current->comm is untrusted user input, but there are at least 58
printk()s of it in 2.4.33. (58 is the number spotted by a grep that
would only match those that have current->comm on the same line with
printk itself.)

Alexander

2006-08-24 17:02:42

by Willy Tarreau

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings (Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again)

On Thu, Aug 24, 2006 at 08:44:25PM +0400, Solar Designer wrote:
> On Tue, Aug 22, 2006 at 04:23:17PM -0400, Ernie Petrides wrote:
> > On Tuesday, 22-Aug-2006 at 7:7 +0400, Solar Designer wrote:
> > > > - printk(KERN_ERR "Unable to load interpreter %.128s\n",
> > > > - elf_interpreter);
> > >
> > > I'd rather have this message rate-limited, not dropped completely.
> >
> > I consider any printk() that can be arbitrarily triggered by an
> > unprivileged user to be inappropriate, rate-limited or not. I
> > recommend that it be removed entirely.
>
> Some of them are quite useful. For example, the presence of this one in
> dmesg and the logs typically (although not necessarily) indicates that
> there was an OOM condition - and the interpreter name is useful to know
> that the message indeed applied to /lib/ld-linux.so.2 rather than to
> some user-supplied ELF interpreter (that could simply be non-existent).
> Alan has recently suggested that another rate-limited user triggerable
> message be introduced for set*uid() failing on transient errors. Then,
> what about warnings of emulated unaligned accesses on Alpha? Those were
> useful to me on many occasions. There can be many other examples.
>
> Also, for some printk()s it is difficult to say whether they're user
> triggerable. A printk() indicating a "machine check" or an ECC error
> may well be user triggerable on a particular system.
>
> > > Another long-time concern that I had is that we've got some printk()s
> > > of user-supplied string data. What about embedded linefeeds - can this
> > > be used to produce fake kernel messages with arbitrary log level (syslog
> > > priority)? It certainly seems so.
> > >
> > > Also, there are terminal controls...
> >
> > These are valid concerns. Allowing the kernel to print user-fabricated
> > strings is a terrible idea.
>
> Yet there are lots of such printk()s - and I suggest that we make a
> determination on all of them before we possibly start fixing individual
> ones. In fact, perhaps there are too many of them to be fixing any in
> 2.4, unless we determine to somehow harden printk() itself.
>
> Even current->comm is untrusted user input, but there are at least 58
> printk()s of it in 2.4.33. (58 is the number spotted by a grep that
> would only match those that have current->comm on the same line with
> printk itself.)

I still fail to imagine a useful case for printk("%s") to output non-printable
chars verbatim. I really think that the right solution would be for printk()
to output all non-printable chars escaped (eg: \n, \r, \xXX). It would
definitely fix the problem without removing useful information. I remember
having had the problem a long time ago with a name extracted from a string
supplied by the BIOS which mangled the dmesg at early boot.

It would be too much work (and too risky) to expect from all drivers to check
their strings for correctness, so the hardening way would be the best one IMHO.

Cheers,
Willy

2006-08-26 02:33:33

by Solar Designer

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings (Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again)

On Thu, Aug 24, 2006 at 06:46:33PM +0200, Willy Tarreau wrote:
> On Thu, Aug 24, 2006 at 08:44:25PM +0400, Solar Designer wrote:
> > Yet there are lots of such printk()s - and I suggest that we make a
> > determination on all of them before we possibly start fixing individual
> > ones. In fact, perhaps there are too many of them to be fixing any in
> > 2.4, unless we determine to somehow harden printk() itself.
> >
> > Even current->comm is untrusted user input, but there are at least 58
> > printk()s of it in 2.4.33. (58 is the number spotted by a grep that
> > would only match those that have current->comm on the same line with
> > printk itself.)
>
> I still fail to imagine a useful case for printk("%s") to output non-printable
> chars verbatim. I really think that the right solution would be for printk()
> to output all non-printable chars escaped (eg: \n, \r, \xXX).

Well, this would be ASCII codes 0 through 0x1f and 0x7f through 0x9f.
Unfortunately, some of the latter correspond to national letters and
other visible characters with weird charsets:

http://en.wikipedia.org/wiki/Windows_code_page

but perhaps it is OK to not support them in kernel logs - syslogd's
printline() would escape them anyway, so we can do it earlier to also
protect the console.

Would you escape backslashes themselves, though? Perhaps not. syslogd
(as maintained in Linux sysklogd) doesn't. Yes, this escaping is not
reliably reversible in that case.

> It would
> definitely fix the problem without removing useful information. I remember
> having had the problem a long time ago with a name extracted from a string
> supplied by the BIOS which mangled the dmesg at early boot.
>
> It would be too much work (and too risky) to expect from all drivers to check
> their strings for correctness, so the hardening way would be the best one IMHO.

I agree.

Alexander

2006-08-26 08:40:38

by Willy Tarreau

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings (Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again)

On Sat, Aug 26, 2006 at 06:29:55AM +0400, Solar Designer wrote:
> On Thu, Aug 24, 2006 at 06:46:33PM +0200, Willy Tarreau wrote:
> > On Thu, Aug 24, 2006 at 08:44:25PM +0400, Solar Designer wrote:
> > > Yet there are lots of such printk()s - and I suggest that we make a
> > > determination on all of them before we possibly start fixing individual
> > > ones. In fact, perhaps there are too many of them to be fixing any in
> > > 2.4, unless we determine to somehow harden printk() itself.
> > >
> > > Even current->comm is untrusted user input, but there are at least 58
> > > printk()s of it in 2.4.33. (58 is the number spotted by a grep that
> > > would only match those that have current->comm on the same line with
> > > printk itself.)
> >
> > I still fail to imagine a useful case for printk("%s") to output non-printable
> > chars verbatim. I really think that the right solution would be for printk()
> > to output all non-printable chars escaped (eg: \n, \r, \xXX).
>
> Well, this would be ASCII codes 0 through 0x1f and 0x7f through 0x9f.
> Unfortunately, some of the latter correspond to national letters and
> other visible characters with weird charsets:
>
> http://en.wikipedia.org/wiki/Windows_code_page
>
> but perhaps it is OK to not support them in kernel logs - syslogd's
> printline() would escape them anyway, so we can do it earlier to also
> protect the console.

I think that we are trying to protect against :
1) local users faking kernel messages. (eg: disk errors, oopses, ...)
2) local users changing console settings (eg: black on black)
3) local users corrupting the log reader's terminal

1) is relatively easy to do, basically if we escape \b, \r and \n, it will
become hard to produce fake logs.

2) should be as easy. I'm not aware of any way to change a local console
setting with non-control chars (>= 0x20)

3) is a problem of interpretation between the program storing the logs on
disk and the one retrieving them and showing them to the user. It's by no
way a kernel problem.

Thus, 1 and 2 could be merged by escaping chars \x00 to \x1F. Sysklogd
prints them as control chars prefixed with the '^' sign, so a linefeed
would appear as '^J'.

> Would you escape backslashes themselves, though? Perhaps not. syslogd
> (as maintained in Linux sysklogd) doesn't. Yes, this escaping is not
> reliably reversible in that case.

I don't think we will have to escape the escape char itself. I know that
this is dirty and will not make it reliable to reverse the string, but
we need to keep in mind that what we are trying to do is not preventing
users from hiding their programs' exact names, but preventing them from
faking logs. Moreover, sysklogd does not escape the escape char either,
so we are not introducing a new weakness by doing this.

Also, we should not prevent the kernel from outputting such chars. For
instance, RAMDISK decompression displays a rotating bar followed by
backspaces (doing dmesg on a serial console on this is somewhat funny).
Such strings are hard-coded, so as long as we only protect the %s interpreter,
we will not break those legitimate uses.

> > It would
> > definitely fix the problem without removing useful information. I remember
> > having had the problem a long time ago with a name extracted from a string
> > supplied by the BIOS which mangled the dmesg at early boot.
> >
> > It would be too much work (and too risky) to expect from all drivers to check
> > their strings for correctness, so the hardening way would be the best one IMHO.
>
> I agree.

I will try to propose something this week-end.

> Alexander

Cheers,
Willy

2006-08-26 23:17:05

by Solar Designer

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings (Re: [PATCH] binfmt_elf.c : the BAD_ADDR macro again)

On Sat, Aug 26, 2006 at 10:22:36AM +0200, Willy Tarreau wrote:
> I think that we are trying to protect against :
> 1) local users faking kernel messages. (eg: disk errors, oopses, ...)
> 2) local users changing console settings (eg: black on black)
> 3) local users corrupting the log reader's terminal
>
> 1) is relatively easy to do, basically if we escape \b, \r and \n, it will
> become hard to produce fake logs.

I think it's just '\n'; other characters that you've mentioned achieve
(1) by exploiting (2).

> 2) should be as easy. I'm not aware of any way to change a local console
> setting with non-control chars (>= 0x20)

...except that there are control chars above 0x20:

http://archives.neohapsis.com/archives/linux/lsap/2000-q2/0151.html

(scroll down to the end). One thing that I did not mention in that older
posting is that, IIRC, according to my vt420 manual, macro recording and
playback may be invoked via the 8-bit controls (in some terminal modes).

This is not limited to real (hardware) terminals; of the 8-bit controls,
at least CSI (0x9b) is typically supported by terminal emulators. Even
our linux/drivers/char/console.c supports it.

Although most terminal emulators don't appear to allow for macro
recording/playback to be requested with control chars, they do support
status requests - to which they respond with certain easy-to-guess
strings (an attack would be to create a script in PATH under that name -
yes, this has multiple prerequisites).

So I think that we should escape chars in the 0x7f to 0x9f range as well.

> 3) is a problem of interpretation between the program storing the logs on
> disk and the one retrieving them and showing them to the user. It's by no
> way a kernel problem.

I'd agree, but what about dmesg? It reads directly from the kernel and
it is commonly run with output to a tty. OK, maybe dmesg (the userspace
program) should be fixed.

> Thus, 1 and 2 could be merged by escaping ...

Yes.

In fact, if you do the escaping right when substituting values for "%s",
then you'll cover (3) as well.

You probably don't want to modify the behavior of vsnprintf() for all of
the kernel; you only want to affect its behavior when called from
printk(). So you might need to be calling a private function instead,
which would accept an additional argument, and make vsnprintf() a
wrapper around that function. Yes, that would be a hack.

Alternatively, you can do the escaping right after the vsnprintf() call,
but then it will affect more than just "%s". I am not sure whether this
is acceptable.

> I don't think we will have to escape the escape char itself. I know that
> this is dirty and will not make it reliable to reverse the string, but
> we need to keep in mind that what we are trying to do is not preventing
> users from hiding their programs' exact names, but preventing them from
> faking logs. Moreover, sysklogd does not escape the escape char either,
> so we are not introducing a new weakness by doing this.

I agree.

Thanks,

Alexander

2006-08-27 20:05:54

by Willy Tarreau

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings

Hi,

On Sun, Aug 27, 2006 at 03:13:14AM +0400, Solar Designer wrote:
> On Sat, Aug 26, 2006 at 10:22:36AM +0200, Willy Tarreau wrote:
> > I think that we are trying to protect against :
> > 1) local users faking kernel messages. (eg: disk errors, oopses, ...)
> > 2) local users changing console settings (eg: black on black)
> > 3) local users corrupting the log reader's terminal
> >
> > 1) is relatively easy to do, basically if we escape \b, \r and \n, it will
> > become hard to produce fake logs.
>
> I think it's just '\n'; other characters that you've mentioned achieve
> (1) by exploiting (2).
>
> > 2) should be as easy. I'm not aware of any way to change a local console
> > setting with non-control chars (>= 0x20)
>
> ...except that there are control chars above 0x20:
>
> http://archives.neohapsis.com/archives/linux/lsap/2000-q2/0151.html
>
> (scroll down to the end). One thing that I did not mention in that older
> posting is that, IIRC, according to my vt420 manual, macro recording and
> playback may be invoked via the 8-bit controls (in some terminal modes).
>
> This is not limited to real (hardware) terminals; of the 8-bit controls,
> at least CSI (0x9b) is typically supported by terminal emulators. Even
> our linux/drivers/char/console.c supports it.

Thanks for the pointers, I didn't even know about this one. I also found
DCS which looks nasty too.

> Although most terminal emulators don't appear to allow for macro
> recording/playback to be requested with control chars, they do support
> status requests - to which they respond with certain easy-to-guess
> strings (an attack would be to create a script in PATH under that name -
> yes, this has multiple prerequisites).
>
> So I think that we should escape chars in the 0x7f to 0x9f range as well.

OK that's what I adopted in my first attempt.

> > 3) is a problem of interpretation between the program storing the logs on
> > disk and the one retrieving them and showing them to the user. It's by no
> > way a kernel problem.
>
> I'd agree, but what about dmesg? It reads directly from the kernel and
> it is commonly run with output to a tty. OK, maybe dmesg (the userspace
> program) should be fixed.

Exactly.

> > Thus, 1 and 2 could be merged by escaping ...
>
> Yes.
>
> In fact, if you do the escaping right when substituting values for "%s",
> then you'll cover (3) as well.
>
> You probably don't want to modify the behavior of vsnprintf() for all of
> the kernel; you only want to affect its behavior when called from
> printk(). So you might need to be calling a private function instead,
> which would accept an additional argument, and make vsnprintf() a
> wrapper around that function. Yes, that would be a hack.

No, I have forked it into vsnprintf_escaped() which is used by printk()
only.

> Alternatively, you can do the escaping right after the vsnprintf() call,
> but then it will affect more than just "%s". I am not sure whether this
> is acceptable.

I didn't want to affect anything but "%s". Unfortunately, it seems that it
was already too much because there are several users in the kernel which
first construct their multi-line strings then call printk("%s") with it :-(
See example below with "\n" escaped with "^J" :

root@pcw:~# dmesg|fgrep '^J'
ACPI-0165: *** Error: No object was returned from [\_SB_.PCI0.UAR2._STA] (Node 79b1aec0), AE_NOT_EXIST^J<6>ACPI: Interpreter enabled
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI ISAPNP enabled^J<6>ttyS00 at 0x03f8 (irq = 4) is a 16550A
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.3.9^J <Adaptec 29160 Ultra160 SCSI adapter>^J aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs^J
(scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit)^J(scsi0:A:2): 20.000MB/s transfers (20.000MHz, offset 16)^J(scsi0:A:2): 19.230MB/s transfers (19.230MHz, offset 16)^J(scsi0:A:3): 20.000MB/s transfers (20.000MHz, offset 7)^J(scsi0:A:4): 10.000MB/s transfers (10.000MHz, offset 32)^J Vendor: COMPAQ
Model: BD01874554 Rev: 3B08
IP Protocols: ICMP, UDP, TCP, IGMP^J<6>IP: routing cache hash table of 8192 buckets, 64Kbytes
sd(8,6):Using r5 hash to sort names^JVFS: Mounted root (reiserfs filesystem) readonly.
root@pcw:~#

ACPI, Serial and Net are among the "clean" users, they correctly start
their next line with the log level prefix. Others such as aic7xxx and
reiserfs are not as clean as you can see above.

So I'm a bit lost now. The only alternative that I can imagine would
become a little bit complicated ; for "%s", do the following :

- replace "\n" by "\\\n" to show that the first line is not complete
- force the log level prefix immediately before first char of next
line, to the same level as for the former message. This is to
prevent user-space programs from hiding their actions by injecting
level prefixes such as <7> which sends everything till EOL to the
debug log, often equivalent to /dev/null on most systems.
- ignore the log level prefix provided after a potential linefeed.
- escape other non-printable characters

One of the problems is that we don't want to print "\" alone at the
end of lines before the final "\n", so there are lots of non-trivial
tests to perform. Also, the vsnprintf_escaped() function has to know
about the current log level (only need to add one argument).

Another idea I had was to add a new format specifier to vsnprintf()
to explicitly escape the string (eg: "%S"). But there are so many
users of printk() to fix then that I'm not sure we would find them
all. However, it would be the real fix and not a hack because what
we're trying to do is to enforce controls on some data type, which
is exactly the point of this solution.

Ideas are very welcome. I Cc Alan since he often has good ideas or
simple solutions to non-trivial problems such as this one.

> Thanks,
>
> Alexander

Regards,
Willy

2006-08-28 01:56:21

by Solar Designer

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings

On Sun, Aug 27, 2006 at 10:04:40PM +0200, Willy Tarreau wrote:
> I didn't want to affect anything but "%s". Unfortunately, it seems that it
> was already too much because there are several users in the kernel which
> first construct their multi-line strings then call printk("%s") with it :-(

I've tried looking up the printk() calls corresponding to the example
output that you provided. Those that I've checked do not use "%s" on
multi-line strings; rather, they merely embed the final '\n' that is meant
to end the line within the last "%s" string. Perhaps we could allow for
this special case (that is, not escape the '\n' if it is the last
character to be output by the printk() call). Yes, there will be a few
printk("%s", ...) calls that we won't protect - namely, those that are
meant to output user-supplied strings in the middle of a line, yet are
invoked separately from the printk() that is meant to complete the line.
Those would need to be spotted and fixed individually.

> Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI ISAPNP enabled^J<6>ttyS00 at 0x03f8 (irq = 4) is a 16550A

The first line (before the ^J) is output with:

printk(KERN_INFO "%s version %s%s (%s) with%s", serial_name,
serial_version, LOCAL_VERSTRING, serial_revdate,
serial_options);

The linefeed is embedded in serial_options.

> IP Protocols: ICMP, UDP, TCP, IGMP^J<6>IP: routing cache hash table of 8192 buckets, 64Kbytes

printk(KERN_INFO "IP Protocols: ");
for (p = inet_protocol_base; p != NULL;) {
struct inet_protocol *tmp = (struct inet_protocol *) p->next;
inet_add_protocol(p);
printk("%s%s",p->name,tmp?", ":"\n");
p = tmp;
}

As you can see, this one is similar - a single line is formed with
multiple printk()s and the linefeed is embedded in the last "%s" string.

> Another idea I had was to add a new format specifier to vsnprintf()
> to explicitly escape the string (eg: "%S"). But there are so many
> users of printk() to fix then that I'm not sure we would find them
> all. However, it would be the real fix and not a hack because what
> we're trying to do is to enforce controls on some data type, which
> is exactly the point of this solution.

Yes, I had this thought, too. This would be the cleanest solution, but
I'm afraid that it will fail in practice. People will continue
introducing printk() calls with plain "%s" where the new "%S" would be
needed. Another problem is that there will be cases where it is
difficult to make a determination whether a given string is untrusted
user input. A solution would be to normally use "%S" and only use
"%s" where "%S" wouldn't work. In that case, we could as well swap "%s"
and "%S", though - hardening the existing "%s" and introducing "%S" for
those callers that depend on the old behavior.

Alexander

2006-08-28 08:23:20

by Willy Tarreau

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings

Hi Alexander,

On Mon, Aug 28, 2006 at 05:52:24AM +0400, Solar Designer wrote:
> On Sun, Aug 27, 2006 at 10:04:40PM +0200, Willy Tarreau wrote:
> > I didn't want to affect anything but "%s". Unfortunately, it seems that it
> > was already too much because there are several users in the kernel which
> > first construct their multi-line strings then call printk("%s") with it :-(
>
> I've tried looking up the printk() calls corresponding to the example
> output that you provided. Those that I've checked do not use "%s" on
> multi-line strings; rather, they merely embed the final '\n' that is meant
> to end the line within the last "%s" string. Perhaps we could allow for
> this special case (that is, not escape the '\n' if it is the last
> character to be output by the printk() call).

That's a good idea. I can say I haven't thought about that.

> Yes, there will be a few
> printk("%s", ...) calls that we won't protect - namely, those that are
> meant to output user-supplied strings in the middle of a line, yet are
> invoked separately from the printk() that is meant to complete the line.
> Those would need to be spotted and fixed individually.
>
> > Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI ISAPNP enabled^J<6>ttyS00 at 0x03f8 (irq = 4) is a 16550A
>
> The first line (before the ^J) is output with:
>
> printk(KERN_INFO "%s version %s%s (%s) with%s", serial_name,
> serial_version, LOCAL_VERSTRING, serial_revdate,
> serial_options);
>
> The linefeed is embedded in serial_options.
>
> > IP Protocols: ICMP, UDP, TCP, IGMP^J<6>IP: routing cache hash table of 8192 buckets, 64Kbytes
>
> printk(KERN_INFO "IP Protocols: ");
> for (p = inet_protocol_base; p != NULL;) {
> struct inet_protocol *tmp = (struct inet_protocol *) p->next;
> inet_add_protocol(p);
> printk("%s%s",p->name,tmp?", ":"\n");
> p = tmp;
> }
>
> As you can see, this one is similar - a single line is formed with
> multiple printk()s and the linefeed is embedded in the last "%s" string.

Thanks for your analysis.

> > Another idea I had was to add a new format specifier to vsnprintf()
> > to explicitly escape the string (eg: "%S"). But there are so many
> > users of printk() to fix then that I'm not sure we would find them
> > all. However, it would be the real fix and not a hack because what
> > we're trying to do is to enforce controls on some data type, which
> > is exactly the point of this solution.
>
> Yes, I had this thought, too. This would be the cleanest solution, but
> I'm afraid that it will fail in practice. People will continue
> introducing printk() calls with plain "%s" where the new "%S" would be
> needed.

Well, I'm not sure about this. Nearly all patches which get merged pass
through a public review first, and when you see how many replies you get
for and 'else' and and 'if' on two different lines, I expect lots of
spontaneous replies such as "use %S for user-supplied strings".

> Another problem is that there will be cases where it is
> difficult to make a determination whether a given string is untrusted
> user input.

If it is difficult to tell this, then the code will be difficult to audit
and will be waiting for an exploit. Providing the tools to the users may
help them write cleaner code too.

> A solution would be to normally use "%S" and only use
> "%s" where "%S" wouldn't work. In that case, we could as well swap "%s"
> and "%S", though - hardening the existing "%s" and introducing "%S" for
> those callers that depend on the old behavior.

I'd rather not change "%s" semantics if we introduce another specifier
which does exactly what we would expect "%s" to do.

Also, if we can get all users to switch to %S for dangerous strings in
the long term, we might be able to remove the special cases on %s.

I will try your proposal to retain the trailing '\n' unescaped.

> Alexander

Cheers,
Willy

2006-08-28 11:17:46

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings

Willy Tarreau <[email protected]> writes:

> Well, I'm not sure about this. Nearly all patches which get merged pass
> through a public review first, and when you see how many replies you get
> for and 'else' and and 'if' on two different lines, I expect lots of
> spontaneous replies such as "use %S for user-supplied strings".

I wouldn't rely on that.

>> A solution would be to normally use "%S" and only use
>> "%s" where "%S" wouldn't work. In that case, we could as well swap "%s"
>> and "%S", though - hardening the existing "%s" and introducing "%S" for
>> those callers that depend on the old behavior.

I think it's the way to go.

> I'd rather not change "%s" semantics if we introduce another specifier
> which does exactly what we would expect "%s" to do.

Both would be equivalent in most cases. It's better to use "%s" for
most cases (either secured or not) and leave "%S" for the bunch of
special cases whose authors better know what are they doing.

> I will try your proposal to retain the trailing '\n' unescaped.

I think with "%s" and "%S" this is no longer needed.
--
Krzysztof Halasa

2006-08-28 17:36:48

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings

On Mon, 28 Aug 2006 05:52:24 +0400, Solar Designer said:

> > Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_P
CI ISAPNP enabled^J<6>ttyS00 at 0x03f8 (irq = 4) is a 16550A
>
> The first line (before the ^J) is output with:
>
> printk(KERN_INFO "%s version %s%s (%s) with%s", serial_name,
> serial_version, LOCAL_VERSTRING, serial_revdate,
> serial_options);
>
> The linefeed is embedded in serial_options.

Gaak. And I suppose that in *addition* to having an embedded trailing \n,
it *also* has a leading blank to make 'with%s' work correctly?

Fortunately, find . | xargs grep 'serial_revdate' comes up empty on
a 2.6.18-rc4-mm3 tree, so somebody's swatted this blecherousness already.


Attachments:
(No filename) (226.00 B)

2006-08-30 06:16:11

by Willy Tarreau

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings

On Mon, Aug 28, 2006 at 01:17:43PM +0200, Krzysztof Halasa wrote:
> Willy Tarreau <[email protected]> writes:
>
> > Well, I'm not sure about this. Nearly all patches which get merged pass
> > through a public review first, and when you see how many replies you get
> > for and 'else' and and 'if' on two different lines, I expect lots of
> > spontaneous replies such as "use %S for user-supplied strings".
>
> I wouldn't rely on that.
>
> >> A solution would be to normally use "%S" and only use
> >> "%s" where "%S" wouldn't work. In that case, we could as well swap "%s"
> >> and "%S", though - hardening the existing "%s" and introducing "%S" for
> >> those callers that depend on the old behavior.
>
> I think it's the way to go.
>
> > I'd rather not change "%s" semantics if we introduce another specifier
> > which does exactly what we would expect "%s" to do.
>
> Both would be equivalent in most cases. It's better to use "%s" for
> most cases (either secured or not) and leave "%S" for the bunch of
> special cases whose authors better know what are they doing.
>
> > I will try your proposal to retain the trailing '\n' unescaped.
>
> I think with "%s" and "%S" this is no longer needed.

Yes it will be for compatibility reasons : we for sure will not fix all
users of "%s" quickly, so we will have to do our best not to break them.
If it was easy to find them all, we could replace "%s" with "%S" everywhere
and make "%S" the escaped one.

But well, I believe that you convinced me that escaping the "%s" and providing
a new "%S" for secure or special usages might be the way to go.

I will propose a patch soon.

> Krzysztof Halasa

thanks,
willy

2006-09-01 14:23:33

by Pavel Machek

[permalink] [raw]
Subject: Re: printk()s of user-supplied strings


> > Another idea I had was to add a new format specifier to vsnprintf()
> > to explicitly escape the string (eg: "%S"). But there are so many
> > users of printk() to fix then that I'm not sure we would find them
> > all. However, it would be the real fix and not a hack because what
> > we're trying to do is to enforce controls on some data type, which
> > is exactly the point of this solution.
>
> Yes, I had this thought, too. This would be the cleanest solution, but
> I'm afraid that it will fail in practice. People will continue

Please go for the cleanest solution. Anything else is not mergeable.

--
Thanks for all the (sleeping) penguins.