2012-06-22 19:24:59

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

When the suid_dumpable sysctl is set to "2", and there is no core
dump pipe defined in the core_pattern sysctl, a local user can cause
core files to be written to root-writable directories, potentially with
user-controlled content. This means an admin can unknowningly reintroduce
a variation of CVE-2006-2451.

$ cat /proc/sys/fs/suid_dumpable
2
$ cat /proc/sys/kernel/core_pattern
core
$ ulimit -c unlimited
$ cd /
$ ls -l core
ls: cannot access core: No such file or directory
$ touch core
touch: cannot touch `core': Permission denied
$ OHAI="evil-string-here" ping localhost >/dev/null 2>&1 &
$ pid=$!
$ sleep 1
$ kill -SEGV $pid
$ ls -l core
-rw------- 1 root kees 458752 Jun 21 11:35 core
$ sudo strings core | grep evil
OHAI=evil-string-here

While cron has been fixed to abort reading a file when there is any
parse error, there are still other sensitive directories that will read
any file present and skip unparsable lines.

This patch introduces suid_dumpable=3 to allow privilege-changed processes
to be dumped only to a pipe handler (and not directly to disk). The value
of suid_dumpable=2 is now historic, and attempting to set this sysctl
value returns -EINVAL.

Signed-off-by: Kees Cook <[email protected]>
Acked-by: Alan Cox <[email protected]>
---
v3:
- use proper sysctl _conv function, fix commit description, suggested by
Eric W. Biederman.
v2:
- switch to mode 3, remove mode 2, suggested by Alan Cox.
---
Documentation/sysctl/fs.txt | 12 +++++-----
fs/exec.c | 49 ++++++++++++++++--------------------------
include/linux/sched.h | 7 +++++-
kernel/sysctl.c | 38 ++++++++++++++++++++++++++++++--
4 files changed, 66 insertions(+), 40 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 13d6166..4302839 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -167,12 +167,12 @@ or otherwise protected/tainted binaries. The modes are
1 - (debug) - all processes dump core when possible. The core dump is
owned by the current user and no security is applied. This is
intended for system debugging situations only. Ptrace is unchecked.
-2 - (suidsafe) - any binary which normally would not be dumped is dumped
- readable by root only. This allows the end user to remove
- such a dump but not access it directly. For security reasons
- core dumps in this mode will not overwrite one another or
- other files. This mode is appropriate when administrators are
- attempting to debug problems in a normal environment.
+2 - (suidsafe) - no longer allowed (returns -EINVAL).
+3 - (pipeforced) - any binary which normally would not be dumped is dumped
+ anyway, but only if a core dump pipe handler is defined (see the
+ "core_pattern" kernel sysctl). This mode is appropriate when
+ administrators are attempting to debug problems in a normal
+ environment.

==============================================================

diff --git a/fs/exec.c b/fs/exec.c
index da27b91..8e0abf2 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1136,7 +1136,7 @@ void setup_new_exec(struct linux_binprm * bprm)
current->sas_ss_sp = current->sas_ss_size = 0;

if (uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))
- set_dumpable(current->mm, 1);
+ set_dumpable(current->mm, SUID_DUMPABLE_ENABLED);
else
set_dumpable(current->mm, suid_dumpable);

@@ -1991,28 +1991,28 @@ static void coredump_finish(struct mm_struct *mm)
* old new | initial interim final
* ---------+-----------------------
* 0 1 | 00 01 01
- * 0 2 | 00 10(*) 11
+ * 0 3 | 00 10(*) 11
* 1 0 | 01 00 00
- * 1 2 | 01 11 11
- * 2 0 | 11 10(*) 00
- * 2 1 | 11 11 01
+ * 1 3 | 01 11 11
+ * 3 0 | 11 10(*) 00
+ * 3 1 | 11 11 01
*
* (*) get_dumpable regards interim value of 10 as 11.
*/
void set_dumpable(struct mm_struct *mm, int value)
{
switch (value) {
- case 0:
+ case SUID_DUMPABLE_DISABLED:
clear_bit(MMF_DUMPABLE, &mm->flags);
smp_wmb();
clear_bit(MMF_DUMP_SECURELY, &mm->flags);
break;
- case 1:
+ case SUID_DUMPABLE_ENABLED:
set_bit(MMF_DUMPABLE, &mm->flags);
smp_wmb();
clear_bit(MMF_DUMP_SECURELY, &mm->flags);
break;
- case 2:
+ case SUID_DUMPABLE_PIPE_ONLY:
set_bit(MMF_DUMP_SECURELY, &mm->flags);
smp_wmb();
set_bit(MMF_DUMPABLE, &mm->flags);
@@ -2025,7 +2025,7 @@ static int __get_dumpable(unsigned long mm_flags)
int ret;

ret = mm_flags & MMF_DUMPABLE_MASK;
- return (ret >= 2) ? 2 : ret;
+ return (ret > SUID_DUMPABLE_ENABLED) ? SUID_DUMPABLE_PIPE_ONLY : ret;
}

int get_dumpable(struct mm_struct *mm)
@@ -2106,10 +2106,8 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
struct core_name cn;
struct mm_struct *mm = current->mm;
struct linux_binfmt * binfmt;
- const struct cred *old_cred;
- struct cred *cred;
+ bool pipeonly = false;
int retval = 0;
- int flag = 0;
int ispipe;
static atomic_t core_dump_count = ATOMIC_INIT(0);
struct coredump_params cprm = {
@@ -2132,25 +2130,16 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
if (!__get_dumpable(cprm.mm_flags))
goto fail;

- cred = prepare_creds();
- if (!cred)
- goto fail;
/*
- * We cannot trust fsuid as being the "true" uid of the
- * process nor do we know its entire history. We only know it
- * was tainted so we dump it as root in mode 2.
+ * We cannot trust the environment dumping a process that has
+ * changed privileges, so only write the dump to a pipe.
*/
- if (__get_dumpable(cprm.mm_flags) == 2) {
- /* Setuid core dump mode */
- flag = O_EXCL; /* Stop rewrite attacks */
- cred->fsuid = GLOBAL_ROOT_UID; /* Dump root private */
- }
+ if (__get_dumpable(cprm.mm_flags) == SUID_DUMPABLE_PIPE_ONLY)
+ pipeonly = true;

retval = coredump_wait(exit_code, &core_state);
if (retval < 0)
- goto fail_creds;
-
- old_cred = override_creds(cred);
+ goto fail;

/*
* Clear any false indication of pending signals that might
@@ -2220,11 +2209,14 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
} else {
struct inode *inode;

+ if (pipeonly)
+ goto fail_unlock;
+
if (cprm.limit < binfmt->min_coredump)
goto fail_unlock;

cprm.file = filp_open(cn.corename,
- O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
+ O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE,
0600);
if (IS_ERR(cprm.file))
goto fail_unlock;
@@ -2268,9 +2260,6 @@ fail_unlock:
kfree(cn.corename);
fail_corename:
coredump_finish(mm);
- revert_creds(old_cred);
-fail_creds:
- put_cred(cred);
fail:
return;
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4059c0f..48a7a8e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -406,10 +406,15 @@ static inline void arch_pick_mmap_layout(struct mm_struct *mm) {}
extern void set_dumpable(struct mm_struct *mm, int value);
extern int get_dumpable(struct mm_struct *mm);

+/* get/set_dumpable() values */
+#define SUID_DUMPABLE_DISABLED 0
+#define SUID_DUMPABLE_ENABLED 1
+#define SUID_DUMPABLE_PIPE_ONLY 3
+
/* mm flags */
/* dumpable bits */
#define MMF_DUMPABLE 0 /* core dump is permitted */
-#define MMF_DUMP_SECURELY 1 /* core file is readable only by root */
+#define MMF_DUMP_SECURELY 1 /* core is readable only by pipe */

#define MMF_DUMPABLE_BITS 2
#define MMF_DUMPABLE_MASK ((1 << MMF_DUMPABLE_BITS) - 1)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4ab1187..d163af3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -174,6 +174,9 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos);
#endif

+static int proc_dointvec_suid_dumpable(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+
#ifdef CONFIG_MAGIC_SYSRQ
/* Note: sysrq code uses it's own private copy */
static int __sysrq_enabled = SYSRQ_DEFAULT_ENABLE;
@@ -1498,9 +1501,7 @@ static struct ctl_table fs_table[] = {
.data = &suid_dumpable,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = &zero,
- .extra2 = &two,
+ .proc_handler = proc_dointvec_suid_dumpable,
},
#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
{
@@ -2009,6 +2010,37 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
do_proc_dointvec_minmax_conv, &param);
}

+/* Allow only the valid suid_dumpable values. */
+static int do_proc_dointvec_suid_dumpable_conv(bool *negp,
+ unsigned long *lvalp, int *valp, int write, void *data)
+{
+ if (write) {
+ int val = *negp ? -*lvalp : *lvalp;
+ if (val != SUID_DUMPABLE_DISABLED &&
+ val != SUID_DUMPABLE_ENABLED &&
+ val != SUID_DUMPABLE_PIPE_ONLY)
+ return -EINVAL;
+ *valp = val;
+ } else {
+ int val = *valp;
+ if (val < 0) {
+ *negp = true;
+ *lvalp = (unsigned long)-val;
+ } else {
+ *negp = false;
+ *lvalp = (unsigned long)val;
+ }
+ }
+ return 0;
+}
+
+static int proc_dointvec_suid_dumpable(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ return do_proc_dointvec(table, write, buffer, lenp, ppos,
+ do_proc_dointvec_suid_dumpable_conv, NULL);
+}
+
static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int write,
void __user *buffer,
size_t *lenp, loff_t *ppos,
--
1.7.0.4


2012-06-22 19:55:55

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

On Fri, 22 Jun 2012 12:24:13 -0700
Kees Cook <[email protected]> wrote:

> The value
> of suid_dumpable=2 is now historic, and attempting to set this sysctl
> value returns -EINVAL.

This sounds a bit harsh - will it not cause existing configurations to
immediately break? If so, would it not be better to retain the =2 mode
for a while, and emit a nice warning when it is set?

>
> ...
>
> +/* Allow only the valid suid_dumpable values. */
> +static int do_proc_dointvec_suid_dumpable_conv(bool *negp,
> + unsigned long *lvalp, int *valp, int write, void *data)
> +{
> + if (write) {
> + int val = *negp ? -*lvalp : *lvalp;
> + if (val != SUID_DUMPABLE_DISABLED &&
> + val != SUID_DUMPABLE_ENABLED &&
> + val != SUID_DUMPABLE_PIPE_ONLY)
> + return -EINVAL;
> + *valp = val;
> + } else {
> + int val = *valp;
> + if (val < 0) {
> + *negp = true;
> + *lvalp = (unsigned long)-val;
> + } else {
> + *negp = false;
> + *lvalp = (unsigned long)val;

Those two typecasts are unneeded.

> + }
> + }
> + return 0;
> +}
> +

2012-06-22 21:09:31

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

On Fri, Jun 22, 2012 at 12:55 PM, Andrew Morton
<[email protected]> wrote:
> On Fri, 22 Jun 2012 12:24:13 -0700
> Kees Cook <[email protected]> wrote:
>
>> The value
>> of suid_dumpable=2 is now historic, and attempting to set this sysctl
>> value returns -EINVAL.
>
> This sounds a bit harsh - will it not cause existing configurations to
> immediately break? ?If so, would it not be better to retain the =2 mode
> for a while, and emit a nice warning when it is set?

I view it as a security vulnerability, so I'd rather see it
eliminated. I see "=1" as a security vulnerability too, but at least
that's well-known to be a bad idea. The "=2" mode has been assumed to
be safe, but it isn't.

-Kees

--
Kees Cook
Chrome OS Security

2012-06-22 21:34:40

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

On Fri, 22 Jun 2012 14:09:28 -0700
Kees Cook <[email protected]> wrote:

> On Fri, Jun 22, 2012 at 12:55 PM, Andrew Morton
> <[email protected]> wrote:
> > On Fri, 22 Jun 2012 12:24:13 -0700
> > Kees Cook <[email protected]> wrote:
> >
> >> The value
> >> of suid_dumpable=2 is now historic, and attempting to set this sysctl
> >> value returns -EINVAL.
> >
> > This sounds a bit harsh - will it not cause existing configurations to
> > immediately break? __If so, would it not be better to retain the =2 mode
> > for a while, and emit a nice warning when it is set?
>
> I view it as a security vulnerability, so I'd rather see it
> eliminated. I see "=1" as a security vulnerability too, but at least
> that's well-known to be a bad idea. The "=2" mode has been assumed to
> be safe, but it isn't.

But what will be the effects of the change? People's initscripts do an
"echo 2" which fails and the error message (if any) won't get logged
anywhere where anyone looks. So now their machine is bumbling along in
the wrong mode and much later on, someone notices that coredumps are
going awry? This is not exactly a user-friendly way of rolling out
kernel API changes!

And how serious is the security vulnerability, in real-world terms?
Serious enough to risk this amount of bustage?

2012-06-22 21:51:57

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

On Fri, Jun 22, 2012 at 2:34 PM, Andrew Morton
<[email protected]> wrote:
> On Fri, 22 Jun 2012 14:09:28 -0700
> Kees Cook <[email protected]> wrote:
>
>> On Fri, Jun 22, 2012 at 12:55 PM, Andrew Morton
>> <[email protected]> wrote:
>> > On Fri, 22 Jun 2012 12:24:13 -0700
>> > Kees Cook <[email protected]> wrote:
>> >
>> >> The value
>> >> of suid_dumpable=2 is now historic, and attempting to set this sysctl
>> >> value returns -EINVAL.
>> >
>> > This sounds a bit harsh - will it not cause existing configurations to
>> > immediately break? __If so, would it not be better to retain the =2 mode
>> > for a while, and emit a nice warning when it is set?
>>
>> I view it as a security vulnerability, so I'd rather see it
>> eliminated. I see "=1" as a security vulnerability too, but at least
>> that's well-known to be a bad idea. The "=2" mode has been assumed to
>> be safe, but it isn't.
>
> But what will be the effects of the change? ?People's initscripts do an
> "echo 2" which fails and the error message (if any) won't get logged
> anywhere where anyone looks. ?So now their machine is bumbling along in
> the wrong mode and much later on, someone notices that coredumps are
> going awry? ?This is not exactly a user-friendly way of rolling out
> kernel API changes!

Well, this is why I wanted to just change the meaning of "2" instead
of introducing "3". It seems much cleaner to me. Just stop "2" from
doing the dangerous thing and carry on.

> And how serious is the security vulnerability, in real-world terms?
> Serious enough to risk this amount of bustage?

If they're running in mode "2" and they do not have a coredump pipe
handler defined, local users can gain root access.

-Kees

--
Kees Cook
Chrome OS Security

2012-06-22 21:57:15

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

On Fri, 22 Jun 2012 14:51:54 -0700
Kees Cook <[email protected]> wrote:

> > And how serious is the security vulnerability, in real-world terms?
> > Serious enough to risk this amount of bustage?
>
> If they're running in mode "2" and they do not have a coredump pipe
> handler defined, local users can gain root access.

But the kernel can detect this case and avoid it? If we do that at the same
time, we can avoid any mode=2 non-back-compatible breakage?

2012-06-22 22:07:47

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

On Fri, Jun 22, 2012 at 2:57 PM, Andrew Morton
<[email protected]> wrote:
> On Fri, 22 Jun 2012 14:51:54 -0700
> Kees Cook <[email protected]> wrote:
>
>> > And how serious is the security vulnerability, in real-world terms?
>> > Serious enough to risk this amount of bustage?
>>
>> If they're running in mode "2" and they do not have a coredump pipe
>> handler defined, local users can gain root access.
>
> But the kernel can detect this case and avoid it? ?If we do that at the same
> time, we can avoid any mode=2 non-back-compatible breakage?

What? Do you mean detect if it's going to disk or to a pipe?

suid core dumps going to disk is not safe. The "mode=2" stuff was
added in an attempt to make it safe, but it has never actually be
safe. Some Linux systems with integrated crash handlers (i.e.
core_pattern with a pipe) want to catch crashes even in suid
processes, so mode=2 makes sense for them since they're handling the
core dump directly, making decisions about it, etc. However, if that
core_pattern is not a pipe, this leads to local users being able to
trick root processes into doing things to give the user root access.

mode=2 to disk _should_ break, is my point. It is not safe. Hence, my
original change to just disallow a mode=2 coredump from going to disk.
It's fine to throw it at the pipe, so leave that as-is.

-Kees

--
Kees Cook
Chrome OS Security

2012-06-22 22:20:32

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

On Fri, 22 Jun 2012 15:07:45 -0700
Kees Cook <[email protected]> wrote:

> mode=2 to disk _should_ break, is my point.

And my point is that we should at least tell people that we broke it.
I don't believe that returning an EINVAL from the write() is
sufficient. Because it introduces a high risk that people will run
misconfigured systems for lengthy periods and it will cause them to
have to do a *lot* of work once they discover that their system is
misbehaving.

So if we really really must instabreak back-compatibility, we should
shout loudly into syslog about it: tell people that their system is
broken and tell them what to do about it.

And we should explain and justify this extraordinary action in the
patch changelog.

2012-06-22 22:26:52

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

On Fri, Jun 22, 2012 at 3:20 PM, Andrew Morton
<[email protected]> wrote:
> On Fri, 22 Jun 2012 15:07:45 -0700
> Kees Cook <[email protected]> wrote:
>
>> mode=2 to disk _should_ break, is my point.
>
> And my point is that we should at least tell people that we broke it.
> I don't believe that returning an EINVAL from the write() is
> sufficient. ?Because it introduces a high risk that people will run
> misconfigured systems for lengthy periods and it will cause them to
> have to do a *lot* of work once they discover that their system is
> misbehaving.
>
> So if we really really must instabreak back-compatibility, we should
> shout loudly into syslog about it: tell people that their system is
> broken and tell them what to do about it.
>
> And we should explain and justify this extraordinary action in the
> patch changelog.

Okay, sounds good. Should mode 3 added with mode 2 removed, or just
drop the dangerous behavior from mode 2? I will be loud in either
situation (e.g. with mode 3, setting mode 2 shouts, or when attempt to
write to disk in mode 2, shout).

-Kees

--
Kees Cook
Chrome OS Security

2012-06-23 07:31:12

by James Morris

[permalink] [raw]
Subject: Re: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

On Fri, 22 Jun 2012, Kees Cook wrote:

> Okay, sounds good. Should mode 3 added with mode 2 removed, or just
> drop the dangerous behavior from mode 2? I will be loud in either
> situation (e.g. with mode 3, setting mode 2 shouts, or when attempt to
> write to disk in mode 2, shout).

I'd prefer to drop the dangerous behavior.



--
James Morris
<[email protected]>

2012-06-23 22:34:40

by Rob Landley

[permalink] [raw]
Subject: Re: [PATCH v3] fs: introduce pipe-only dump mode suid_dumpable=3

On 06/22/2012 02:24 PM, Kees Cook wrote:
> When the suid_dumpable sysctl is set to "2", and there is no core
> dump pipe defined in the core_pattern sysctl, a local user can cause
> core files to be written to root-writable directories, potentially with
> user-controlled content. This means an admin can unknowningly reintroduce
> a variation of CVE-2006-2451.
...
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -167,12 +167,12 @@ or otherwise protected/tainted binaries. The modes are
> 1 - (debug) - all processes dump core when possible. The core dump is
> owned by the current user and no security is applied. This is
> intended for system debugging situations only. Ptrace is unchecked.
> -2 - (suidsafe) - any binary which normally would not be dumped is dumped
> - readable by root only. This allows the end user to remove
> - such a dump but not access it directly. For security reasons
> - core dumps in this mode will not overwrite one another or
> - other files. This mode is appropriate when administrators are
> - attempting to debug problems in a normal environment.
> +2 - (suidsafe) - no longer allowed (returns -EINVAL).

Random comment: a reference to the CVE might be good here.

Rob
--
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation. Pick one.