2021-12-28 17:10:10

by Wander Lairson Costa

[permalink] [raw]
Subject: [PATCH RFC v2 0/4] coredump: mitigate privilege escalation of process coredump

v2
==

Patch 02 conflicted with commit 92307383082d("coredump: Don't perform
any cleanups before dumping core") which I didn't have in my tree. V2
just changes the hunk

+#define PF_SUID 0x00000008

To

+#define PF_SUID 0x01000000

To merge cleanly. Other than that, it is the same patch as v1.

v1
==

A set-uid executable might be a vector to privilege escalation if the
system configures the coredump file name pattern as a relative
directory destiny. The full description of the vulnerability and
a demonstration of how we can exploit it can be found at [1].

This patch series adds a PF_SUID flag to the process in execve if it is
set-[ug]id binary and elevates the new image's privileges.

In the do_coredump function, we check if:

1) We have the SUID_FLAG set
2) We have CAP_SYS_ADMIN (the process might have decreased its
privileges)
3) The current directory is owned by root (the current code already
checks for core_pattern being a relative path).
4) non-privileged users don't have permission to write to the current
directory.

If all four conditions match, we set the need_suid_safe flag.

An alternative implementation (and more elegant IMO) would be saving
the fsuid and fsgid of the process in the task_struct before loading the
new image to the memory. But this approach would add eight bytes to all
task_struct instances where only a tiny fraction of the processes need
it and under a configuration that not all (most?) distributions don't
adopt by default.

Wander Lairson Costa (4):
exec: add a flag indicating if an exec file is a suid/sgid
process: add the PF_SUID flag
coredump: mitigate privilege escalation of process coredump
exec: only set the suid flag if the current proc isn't root

fs/coredump.c | 15 +++++++++++++++
fs/exec.c | 10 ++++++++++
include/linux/binfmts.h | 6 +++++-
include/linux/sched.h | 1 +
kernel/fork.c | 2 ++
5 files changed, 33 insertions(+), 1 deletion(-)

--
2.27.0



2021-12-28 17:10:57

by Wander Lairson Costa

[permalink] [raw]
Subject: [PATCH RFC v2 1/4] exec: add a flag indicating if an exec file is a suid/sgid

We create an additional flag in the struct linux_bprm to indicate a
suid/sgid binary. We will use this information in a later commit to
set the task_struct flags accordingly.

Signed-off-by: Wander Lairson Costa <[email protected]>
---
fs/exec.c | 2 ++
include/linux/binfmts.h | 6 +++++-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index 537d92c41105..ec07b36fdbb4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1619,11 +1619,13 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
if (mode & S_ISUID) {
bprm->per_clear |= PER_CLEAR_ON_SETID;
bprm->cred->euid = uid;
+ bprm->suid_bin = 1;
}

if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
bprm->per_clear |= PER_CLEAR_ON_SETID;
bprm->cred->egid = gid;
+ bprm->suid_bin = 1;
}
}

diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 049cf9421d83..c4b41b9711d2 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -41,7 +41,11 @@ struct linux_binprm {
* Set when errors can no longer be returned to the
* original userspace.
*/
- point_of_no_return:1;
+ point_of_no_return:1,
+ /*
+ * Is this a suid/sgid binary?
+ */
+ suid_bin:1;
#ifdef __alpha__
unsigned int taso:1;
#endif
--
2.27.0


2021-12-28 17:11:35

by Wander Lairson Costa

[permalink] [raw]
Subject: [PATCH RFC v2 2/4] process: add the PF_SUID flag

If the binary file in an execve system call is a suid executable, we add
the PF_SUID flag to the process and all its future new children and
threads.

In a later commit, we will use this information to determine if it is
safe to core dump such a process.

Signed-off-by: Wander Lairson Costa <[email protected]>
---
fs/exec.c | 4 ++++
include/linux/sched.h | 1 +
kernel/fork.c | 2 ++
3 files changed, 7 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index ec07b36fdbb4..81d6ab9a4f64 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1309,6 +1309,10 @@ int begin_new_exec(struct linux_binprm * bprm)

me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
PF_NOFREEZE | PF_NO_SETAFFINITY);
+
+ if (bprm->suid_bin)
+ me->flags |= PF_SUID;
+
flush_thread();
me->personality &= ~bprm->per_clear;

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78c351e35fec..8ec2f907fb89 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1683,6 +1683,7 @@ extern struct pid *cad_pid;
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */
#define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */
+#define PF_SUID 0x01000000 /* The process comes from a suid/sgid binary */
#define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */
#define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */
#define PF_MEMALLOC_PIN 0x10000000 /* Allocation context constrained to zones which allow long term pinning. */
diff --git a/kernel/fork.c b/kernel/fork.c
index 3244cc56b697..f0375d102b57 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2076,6 +2076,8 @@ static __latent_entropy struct task_struct *copy_process(
delayacct_tsk_init(p); /* Must remain after dup_task_struct() */
p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE | PF_NO_SETAFFINITY);
p->flags |= PF_FORKNOEXEC;
+ if (current->flags & PF_SUID)
+ p->flags |= PF_SUID;
INIT_LIST_HEAD(&p->children);
INIT_LIST_HEAD(&p->sibling);
rcu_copy_process(p);
--
2.27.0


2021-12-28 17:12:29

by Wander Lairson Costa

[permalink] [raw]
Subject: [PATCH RFC v2 3/4] coredump: mitigate privilege escalation of process coredump

A set-uid executable might be a vector to a privilege escalation if the
system configures the coredump file name pattern as a relative
directory destiny. The full description of the vulnerability and
a demonstration of how we can exploit it can be found at [1].

We now check if the core dump pattern is relative. If it is, then we
verify if root owns the current directory and if it does, we deny
writing the core file unless the directory is universally writable.

[1] https://www.openwall.com/lists/oss-security/2021/10/20/2

Signed-off-by: Wander Lairson Costa <[email protected]>
---
fs/coredump.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/fs/coredump.c b/fs/coredump.c
index a6b3c196cdef..26bea87af153 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -514,6 +514,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
struct core_name cn;
struct mm_struct *mm = current->mm;
struct linux_binfmt * binfmt;
+ struct inode *pwd_inode;
const struct cred *old_cred;
struct cred *cred;
int retval = 0;
@@ -559,6 +560,20 @@ void do_coredump(const kernel_siginfo_t *siginfo)
need_suid_safe = true;
}

+ /*
+ * If we are a set-uid/gid root process and the current directory is
+ * owned by root but not universally writable, prohibit dumps under
+ * this path.
+ *
+ * Mitigate https://www.openwall.com/lists/oss-security/2021/10/20/2
+ */
+ pwd_inode = current->fs->pwd.dentry->d_inode;
+ if (current->flags & PF_SUID &&
+ capable(CAP_SYS_ADMIN) &&
+ uid_eq(pwd_inode->i_uid, GLOBAL_ROOT_UID) &&
+ !(pwd_inode->i_mode & 0002))
+ need_suid_safe = true;
+
retval = coredump_wait(siginfo->si_signo, &core_state);
if (retval < 0)
goto fail_creds;
--
2.27.0


2021-12-28 17:13:45

by Wander Lairson Costa

[permalink] [raw]
Subject: [PATCH RFC v2 4/4] exec: only set the suid flag if the current proc isn't root

The goal of PF_SUID flag is to check if it is safe to coredump the
process. If the current process is already privileged, there is no
point in performing security checks because the name image is a
set-uid process.

Because of that, we don't set the suid flag if the forked process
already runs as root.

Signed-off-by: Wander Lairson Costa <[email protected]>
---
fs/exec.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index 81d6ab9a4f64..1a3458c6c9b7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1310,7 +1310,11 @@ int begin_new_exec(struct linux_binprm * bprm)
me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
PF_NOFREEZE | PF_NO_SETAFFINITY);

- if (bprm->suid_bin)
+ /*
+ * We set the PF_SUID flags for security reasons. There is no
+ * point in setting it if the parent is root.
+ */
+ if (bprm->suid_bin && !capable(CAP_SYS_ADMIN))
me->flags |= PF_SUID;

flush_thread();
--
2.27.0


2022-01-03 22:11:54

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH RFC v2 0/4] coredump: mitigate privilege escalation of process coredump

Wander Lairson Costa <[email protected]> writes:

Have you seen the discussion at:
https://lkml.kernel.org/r/[email protected]
?

Adding a few people from that conversation.

> v2
> ==
>
> Patch 02 conflicted with commit 92307383082d("coredump: Don't perform
> any cleanups before dumping core") which I didn't have in my tree. V2
> just changes the hunk
>
> +#define PF_SUID 0x00000008
>
> To
>
> +#define PF_SUID 0x01000000
>
> To merge cleanly. Other than that, it is the same patch as v1.
>
> v1
> ==
>
> A set-uid executable might be a vector to privilege escalation if the
> system configures the coredump file name pattern as a relative
> directory destiny. The full description of the vulnerability and
> a demonstration of how we can exploit it can be found at [1].
>
> This patch series adds a PF_SUID flag to the process in execve if it is
> set-[ug]id binary and elevates the new image's privileges.
>
> In the do_coredump function, we check if:
>
> 1) We have the SUID_FLAG set
> 2) We have CAP_SYS_ADMIN (the process might have decreased its
> privileges)
> 3) The current directory is owned by root (the current code already
> checks for core_pattern being a relative path).
> 4) non-privileged users don't have permission to write to the current
> directory.

Which is a slightly different approach than we have discussed
previously.

Something persistent to mark the processes descended from the suid exec
is something commonly agreed upon.

How we can dump core after that (with the least disruption is the
remaining question).

You would always allow coredumps unless the target directory is
problematic. I remember it being suggested that even dumping to a pipe
might not be safe in practice. Can someone remember why?

The other approach we have discussed is simply not allowing coredumps
unless the target process takes appropriate action to reenable them.

Willy posted a patch to that effect.


From a proof of concept perspective PF_SUID and your directory checks look
like fine. From a production implementation standpoint I think we would
want to make them a bit more general. PF_SUID because it is more than
uid changes that can grant privilege during exec. We especially want to
watch out for setcap executables. The directory checks similarly look
very inflexible. I think what we want is to test if the process before
the privilege change of the exec could write to the directory.

Even with your directory test approach you are going to run into
the semi-common idio of becomming root and then starting a daemon
in debugging mode so you can get a core dump.

> If all four conditions match, we set the need_suid_safe flag.
>
> An alternative implementation (and more elegant IMO) would be saving
> the fsuid and fsgid of the process in the task_struct before loading the
> new image to the memory. But this approach would add eight bytes to all
> task_struct instances where only a tiny fraction of the processes need
> it and under a configuration that not all (most?) distributions don't
> adopt by default.

One possibility is to save a struct cred on the mm_struct. If done
carefully I think that would allow commit_creds to avoid the need
for dumpability changes (there would always be enough information to
directly compute it).

I can see that working for detecting dropped privileges. I don't know
if that would work for raised privileges.

Definitely focusing in on the mm_struct for where to save the needed
information seems preferable, as in general it is an mm property not a
per task property.

Eric



> Wander Lairson Costa (4):
> exec: add a flag indicating if an exec file is a suid/sgid
> process: add the PF_SUID flag
> coredump: mitigate privilege escalation of process coredump
> exec: only set the suid flag if the current proc isn't root
>
> fs/coredump.c | 15 +++++++++++++++
> fs/exec.c | 10 ++++++++++
> include/linux/binfmts.h | 6 +++++-
> include/linux/sched.h | 1 +
> kernel/fork.c | 2 ++
> 5 files changed, 33 insertions(+), 1 deletion(-)

2022-01-05 12:30:50

by Wander Costa

[permalink] [raw]
Subject: Re: [PATCH RFC v2 0/4] coredump: mitigate privilege escalation of process coredump

On Mon, Jan 3, 2022 at 7:54 PM Eric W. Biederman <[email protected]> wrote:
>
> Wander Lairson Costa <[email protected]> writes:
>
> Have you seen the discussion at:
> https://lkml.kernel.org/r/[email protected]
> ?
>

No, I wasn't aware of this, thanks.

> Adding a few people from that conversation.
>
> > v2
> > ==
> >
> > Patch 02 conflicted with commit 92307383082d("coredump: Don't perform
> > any cleanups before dumping core") which I didn't have in my tree. V2
> > just changes the hunk
> >
> > +#define PF_SUID 0x00000008
> >
> > To
> >
> > +#define PF_SUID 0x01000000
> >
> > To merge cleanly. Other than that, it is the same patch as v1.
> >
> > v1
> > ==
> >
> > A set-uid executable might be a vector to privilege escalation if the
> > system configures the coredump file name pattern as a relative
> > directory destiny. The full description of the vulnerability and
> > a demonstration of how we can exploit it can be found at [1].
> >
> > This patch series adds a PF_SUID flag to the process in execve if it is
> > set-[ug]id binary and elevates the new image's privileges.
> >
> > In the do_coredump function, we check if:
> >
> > 1) We have the SUID_FLAG set
> > 2) We have CAP_SYS_ADMIN (the process might have decreased its
> > privileges)
> > 3) The current directory is owned by root (the current code already
> > checks for core_pattern being a relative path).
> > 4) non-privileged users don't have permission to write to the current
> > directory.
>
> Which is a slightly different approach than we have discussed
> previously.
>
> Something persistent to mark the processes descended from the suid exec
> is something commonly agreed upon.
>
> How we can dump core after that (with the least disruption is the
> remaining question).
>
> You would always allow coredumps unless the target directory is
> problematic. I remember it being suggested that even dumping to a pipe
> might not be safe in practice. Can someone remember why?
>
> The other approach we have discussed is simply not allowing coredumps
> unless the target process takes appropriate action to reenable them.
>
> Willy posted a patch to that effect.
>
>
> From a proof of concept perspective PF_SUID and your directory checks look
> like fine. From a production implementation standpoint I think we would
> want to make them a bit more general. PF_SUID because it is more than
> uid changes that can grant privilege during exec. We especially want to
> watch out for setcap executables. The directory checks similarly look
> very inflexible. I think what we want is to test if the process before
> the privilege change of the exec could write to the directory.
>
> Even with your directory test approach you are going to run into
> the semi-common idio of becomming root and then starting a daemon
> in debugging mode so you can get a core dump.
>
> > If all four conditions match, we set the need_suid_safe flag.
> >
> > An alternative implementation (and more elegant IMO) would be saving
> > the fsuid and fsgid of the process in the task_struct before loading the
> > new image to the memory. But this approach would add eight bytes to all
> > task_struct instances where only a tiny fraction of the processes need
> > it and under a configuration that not all (most?) distributions don't
> > adopt by default.
>
> One possibility is to save a struct cred on the mm_struct. If done
> carefully I think that would allow commit_creds to avoid the need
> for dumpability changes (there would always be enough information to
> directly compute it).
>
> I can see that working for detecting dropped privileges. I don't know
> if that would work for raised privileges.
>
> Definitely focusing in on the mm_struct for where to save the needed
> information seems preferable, as in general it is an mm property not a
> per task property.
>

After reading the other thread and your comments, I came up with the
following idea:

- Create fields coredump_uid and coredump_gid in the mm_struct
- At fork time, copy the euid and egid to coredump_uid and coredump_gid.
- Only change coredump_uid/coredump_gid when the process changes its euid/egid.
- The do_coredump function already creates a local creds struct.
Change the code to set
its fsuid and fsgid to coredump_uid and coredump_gid.

This solution still has the inconvenience that a suid daemon probably
won't have the permission to
core dump by default. We can fix this by changing the setsid system
call to assign coredump_uid and coredump_gid
to euid and egid.

If it sounds reasonable, I can give this idea a try.


> Eric
>
>
>
> > Wander Lairson Costa (4):
> > exec: add a flag indicating if an exec file is a suid/sgid
> > process: add the PF_SUID flag
> > coredump: mitigate privilege escalation of process coredump
> > exec: only set the suid flag if the current proc isn't root
> >
> > fs/coredump.c | 15 +++++++++++++++
> > fs/exec.c | 10 ++++++++++
> > include/linux/binfmts.h | 6 +++++-
> > include/linux/sched.h | 1 +
> > kernel/fork.c | 2 ++
> > 5 files changed, 33 insertions(+), 1 deletion(-)
>