2006-10-15 18:34:18

by Ulrich Drepper

[permalink] [raw]
Subject: [PATCH] close mprotect noexec hole

The following patch closes the hole in mprotect discovered during
the noexec mount discussions. Without this the protection is
incomplete and pretty much useless. With it and additional techniques
like SELinux all holes can be plugged in a fine-grained way.

I think it should be in .19 since it's a security problem not to have
it. Tested on x86-64.


Signed-off-by: Ulrich Drepper <[email protected]>

--- mm/mprotect.c 2006-10-01 09:35:14.000000000 -0700
+++ mm/mprotect.c-new 2006-10-11 14:54:55.000000000 -0700
@@ -251,6 +251,10 @@
error = -ENOMEM;
if (!vma)
goto out;
+ error = -EACCES;
+ if ((reqprot & PROT_EXEC) && vma->vm_file &&
+ (vma->vm_file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
+ goto out;
if (unlikely(grows & PROT_GROWSDOWN)) {
if (vma->vm_start >= end)
goto out;


2006-10-15 18:47:48

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] close mprotect noexec hole



On Sun, 15 Oct 2006, Ulrich Drepper wrote:
>
> The following patch closes the hole in mprotect discovered during
> the noexec mount discussions. Without this the protection is
> incomplete and pretty much useless. With it and additional techniques
> like SELinux all holes can be plugged in a fine-grained way.

This patch seems totally buggy.

mprotect() can cover _multiple_ mappings, and this one only checks the
very first one, as far as I can tell.

The place to do this is where we do the "security_file_mprotect()", not
where you did it.

Ie something like this instead. Totally untested, but at least it compiles
with current -git (unlike Uli's version - needs <linux/mount.h>)

Linus
---
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 3b8f3c0..09ed8de 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -21,6 +21,7 @@ #include <linux/personality.h>
#include <linux/syscalls.h>
#include <linux/swap.h>
#include <linux/swapops.h>
+#include <linux/mount.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
#include <asm/cacheflush.h>
@@ -280,9 +281,14 @@ sys_mprotect(unsigned long start, size_t
newflags = vm_flags | (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC));

/* newflags >> 4 shift VM_MAY% in place of VM_% */
- if ((newflags & ~(newflags >> 4)) & (VM_READ | VM_WRITE | VM_EXEC)) {
- error = -EACCES;
+ error = -EACCES;
+ if ((newflags & ~(newflags >> 4)) & (VM_READ | VM_WRITE | VM_EXEC))
goto out;
+
+ if (newflags & VM_EXEC) {
+ struct file *file = vma->vm_file;
+ if (file && (file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
+ goto out;
}

error = security_file_mprotect(vma, reqprot, prot);

2006-10-15 19:15:04

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [PATCH] close mprotect noexec hole

Linus Torvalds wrote:
> Ie something like this instead. Totally untested, but at least it compiles
> with current -git (unlike Uli's version - needs <linux/mount.h>)

This works fine with my test case and is of course more correct.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2006-10-15 19:17:18

by ranjith kumar

[permalink] [raw]
Subject: privilege levels and kernel mode

Hi,
I am using pentium-4 processor. My operating
system is Linux version 2.4.22-1.2199.nptl

I want to measure performance events like number of
memory reads, number of cache misses occured while
running a "C" program. For that I have to wright some
values into "Model specific registers of pentium-4
processor". But those registers can be written ONLY at
privilege level of zero of pentium4 processor.

We know that application programs we write (for
example any C program)are run at privilege
level-3(user mode).

I know how to include assembly instructions in a C
program to wtrite into "Model specific registers". But
that program has to be run at privilege level zero.

How to run a C program at privilege level zero??
Can any one help me?

Thanks in advance.



Send instant messages to your online friends http://uk.messenger.yahoo.com

2006-10-15 19:24:25

by bert hubert

[permalink] [raw]
Subject: Re: privilege levels and kernel mode

On Sun, Oct 15, 2006 at 08:17:16PM +0100, ranjith kumar wrote:

> memory reads, number of cache misses occured while
> running a "C" program. For that I have to wright some
> values into "Model specific registers of pentium-4
> processor". But those registers can be written ONLY at
> privilege level of zero of pentium4 processor.

Look at 'perfmon' and 'perfctr', they give you what you want.


--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services

2006-10-15 19:41:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] close mprotect noexec hole



On Sun, 15 Oct 2006, Ulrich Drepper wrote:

> Linus Torvalds wrote:
> > Ie something like this instead. Totally untested, but at least it compiles
> > with current -git (unlike Uli's version - needs <linux/mount.h>)
>
> This works fine with my test case and is of course more correct.

The thing is, I think even my version is wrong.

Why? Because this whole case _should_ have been handled by mprotect
already.

The way we handle VM_WRITE getting set in mprotect() etc is not by
checking that the file is writable, but checking that VM_MAYWRITE is set.

And that's what we did with VM_EXEC too.

So I think that the _real_ bug is that VM_MAYEXEC is set, even though it
clearly should not be.

In other words, I think the _real_ fix is actually to do this at mmap()
time, something like the following..

This is equally untested as the previous version, but I think this is
really conceptually the Right Thing(tm).

Linus
---
diff --git a/mm/mmap.c b/mm/mmap.c
index eea8eef..497e502 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -900,17 +900,6 @@ unsigned long do_mmap_pgoff(struct file
int accountable = 1;
unsigned long charged = 0, reqprot = prot;

- if (file) {
- if (is_file_hugepages(file))
- accountable = 0;
-
- if (!file->f_op || !file->f_op->mmap)
- return -ENODEV;
-
- if ((prot & PROT_EXEC) &&
- (file->f_vfsmnt->mnt_flags & MNT_NOEXEC))
- return -EPERM;
- }
/*
* Does the application expect PROT_READ to imply PROT_EXEC?
*
@@ -1000,6 +989,16 @@ unsigned long do_mmap_pgoff(struct file
case MAP_PRIVATE:
if (!(file->f_mode & FMODE_READ))
return -EACCES;
+ if (file->f_vfsmnt->mnt_flags & MNT_NOEXEC) {
+ if (vm_flags & VM_EXEC)
+ return -EPERM;
+ vm_flags &= ~VM_MAYEXEC;
+ }
+ if (is_file_hugepages(file))
+ accountable = 0;
+
+ if (!file->f_op || !file->f_op->mmap)
+ return -ENODEV;
break;

default:

2006-10-15 20:02:43

by Jan Engelhardt

[permalink] [raw]
Subject: Re: privilege levels and kernel mode

>Hi,
> I am using pentium-4 processor. My operating
>system is Linux version 2.4.22-1.2199.nptl
>
> I want to measure performance events like number of
>memory reads, number of cache misses occured while
>running a "C" program. For that I have to wright some
>values into "Model specific registers of pentium-4
>processor". But those registers can be written ONLY at
>privilege level of zero of pentium4 processor.
>
> We know that application programs we write (for
>example any C program)are run at privilege
>level-3(user mode).
>
>I know how to include assembly instructions in a C
>program to wtrite into "Model specific registers". But
>that program has to be run at privilege level zero.
>
>How to run a C program at privilege level zero??

The short answer: You can't.

The long answer: You have to write a kernel module to do what you need.

The best answer: Use /dev/cpu/0/msr.

>Can any one help me?
>
>Thanks in advance.
>
>
>
>Send instant messages to your online friends http://uk.messenger.yahoo.com
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>

-`J'
--

2006-10-15 21:37:50

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [PATCH] close mprotect noexec hole

Linus Torvalds wrote:
> This is equally untested as the previous version, but I think this is
> really conceptually the Right Thing(tm).

Works for me here.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2006-10-15 22:35:22

by Alan

[permalink] [raw]
Subject: Re: privilege levels and kernel mode

Ar Sul, 2006-10-15 am 20:17 +0100, ysgrifennodd ranjith kumar:
> I know how to include assembly instructions in a C
> program to wtrite into "Model specific registers". But
> that program has to be run at privilege level zero.

Its more hairy than that because of SMP and handling overflows and
according to whether you need to profile all use or your own task and so
on. Fortunately someone has done all the hard work already - take a look
at the oprofile support in the kernel and see if it will do what you
need.

Alan