The coredump code always calls set_dumpable(0) when it starts (even
if RLIMIT_CORE prevents any core from being dumped). The effect of
this (via task_dumpable) is to make /proc/pid/* files owned by root
instead of the user, so the user can no longer examine his own
process--in a case where there was never any privileged data to
protect. This affects e.g. auxv, environ, fd; in Fedora (execshield)
kernels, also maps. In practice, you can only notice this when a
debugger has requested PTRACE_EVENT_EXIT tracing.
As far as I know, set_dumpable was only used in do_coredump for
synchronization and not intended for any security purpose.
(It doesn't secure anything that wasn't already unsecured when a
process dies by SIGTERM instead of SIGQUIT.)
This changes do_coredump to use a separate bit for its
synchronization, so the "dumpable" bits remain the same.
Signed-off-by: Roland McGrath <[email protected]>
---
fs/exec.c | 4 ++--
include/linux/sched.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 073b0b8..7f1e355 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1727,7 +1727,8 @@ int do_coredump(long signr, int exit_code, struct pt_regs * regs)
if (!binfmt || !binfmt->core_dump)
goto fail;
down_write(&mm->mmap_sem);
- if (!get_dumpable(mm)) {
+ if (!get_dumpable(mm) ||
+ test_and_set_bit(MMF_DUMP_STARTED, &mm->flags)) {
up_write(&mm->mmap_sem);
goto fail;
}
@@ -1741,7 +1742,6 @@ int do_coredump(long signr, int exit_code, struct pt_regs * regs)
flag = O_EXCL; /* Stop rewrite attacks */
current->fsuid = 0; /* Dump root private */
}
- set_dumpable(mm, 0);
retval = coredump_wait(exit_code);
if (retval < 0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1f274c2..33676ad 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -367,6 +367,8 @@ extern int get_dumpable(struct mm_struct *mm);
#define MMF_DUMP_FILTER_DEFAULT \
((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED))
+#define MMF_DUMP_STARTED 16 /* some thread entered do_coredump already */
+
struct mm_struct {
struct vm_area_struct * mmap; /* list of VMAs */
struct rb_root mm_rb;
On 10/10, Roland McGrath wrote:
>
> As far as I know, set_dumpable was only used in do_coredump for
> synchronization and not intended for any security purpose.
I don't understand why do_coredump() clears DUMPABLE too.
> This changes do_coredump to use a separate bit for its
> synchronization, so the "dumpable" bits remain the same.
> ...
>
> @@ -1727,7 +1727,8 @@ int do_coredump(long signr, int exit_code, struct pt_regs * regs)
> if (!binfmt || !binfmt->core_dump)
> goto fail;
> down_write(&mm->mmap_sem);
> - if (!get_dumpable(mm)) {
> + if (!get_dumpable(mm) ||
> + test_and_set_bit(MMF_DUMP_STARTED, &mm->flags)) {
> up_write(&mm->mmap_sem);
> goto fail;
Do we really need the new MMF_DUMP_STARTED flag? We are holding ->mmap_sem,
can't we check "mm->core_waiters != 0" instead?
Hmm. Actually, do we need any check? If another thread (or CLONE_VM task)
already started do_coredump(), we must see SIGNAL_GROUP_EXIT when we take
->mmap_sem. In that case coredump_wait() does nothing but returns -EAGAIN.
> - set_dumpable(mm, 0);
Looks like this change is all we need, no? The only problem is that we
return -EAGAIN if we race with another thread (the current code returns 0),
but nobody checks the returned value.
What do you think?
Oleg.
> Do we really need the new MMF_DUMP_STARTED flag? We are holding ->mmap_sem,
> can't we check "mm->core_waiters != 0" instead?
Yes, I think you're right. I gave the old use of dumpable the benefit of
the doubt as being needed for synchronization, but that would be enough too.
> Hmm. Actually, do we need any check? If another thread (or CLONE_VM task)
> already started do_coredump(), we must see SIGNAL_GROUP_EXIT when we take
> ->mmap_sem. In that case coredump_wait() does nothing but returns -EAGAIN.
You're right again here. I'm sure the old use predates this checking in
zap_threads, so it's understandable that something different was needed for
this synchronization before.
> > - set_dumpable(mm, 0);
>
> Looks like this change is all we need, no? The only problem is that we
> return -EAGAIN if we race with another thread (the current code returns 0),
> but nobody checks the returned value.
I concur. It merits a comment at the coredump_wait call that this takes
care of all synchronization issues including races to enter do_coredump.
Thanks,
Roland
On 10/11, Roland McGrath wrote:
>
> > Hmm. Actually, do we need any check? If another thread (or CLONE_VM task)
> > already started do_coredump(), we must see SIGNAL_GROUP_EXIT when we take
> > ->mmap_sem. In that case coredump_wait() does nothing but returns -EAGAIN.
>
> You're right again here. I'm sure the old use predates this checking in
> zap_threads, so it's understandable that something different was needed for
> this synchronization before.
Ugh, sorry, I was wrong. coredump_wait() initializes mm->core_done and sets
mm->core_startup_done before checking SIGNAL_GROUP_EXIT. This means we have
to check ->core_waiters or SIGNAL_GROUP_EXIT before calling coredump_wait().
Oleg.
> Ugh, sorry, I was wrong. coredump_wait() initializes mm->core_done and sets
> mm->core_startup_done before checking SIGNAL_GROUP_EXIT. This means we have
> to check ->core_waiters or SIGNAL_GROUP_EXIT before calling coredump_wait().
Ah. Checking core_waiters seems like a fine thing to do.
Thanks,
Roland
The coredump code always calls set_dumpable(0) when it starts (even
if RLIMIT_CORE prevents any core from being dumped). The effect of
this (via task_dumpable) is to make /proc/pid/* files owned by root
instead of the user, so the user can no longer examine his own
process--in a case where there was never any privileged data to
protect. This affects e.g. auxv, environ, fd; in Fedora (execshield)
kernels, also maps. In practice, you can only notice this when a
debugger has requested PTRACE_EVENT_EXIT tracing.
set_dumpable was only used in do_coredump for synchronization and not
intended for any security purpose. (It doesn't secure anything that wasn't
already unsecured when a process dies by SIGTERM instead of SIGQUIT.)
This changes do_coredump to check the core_waiters count as the means of
synchronization, which is sufficient. Now we leave the "dumpable" bits alone.
Signed-off-by: Roland McGrath <[email protected]>
---
fs/exec.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 2c942e2..3f70c3e 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1692,7 +1692,10 @@ int do_coredump(long signr, int exit_code, struct pt_regs * regs)
if (!binfmt || !binfmt->core_dump)
goto fail;
down_write(&mm->mmap_sem);
- if (!get_dumpable(mm)) {
+ /*
+ * If another thread got here first, or we are not dumpable, bail out.
+ */
+ if (mm->core_waiters || !get_dumpable(mm)) {
up_write(&mm->mmap_sem);
goto fail;
}
@@ -1706,7 +1709,6 @@ int do_coredump(long signr, int exit_code, struct pt_regs * regs)
flag = O_EXCL; /* Stop rewrite attacks */
current->fsuid = 0; /* Dump root private */
}
- set_dumpable(mm, 0);
retval = coredump_wait(exit_code);
if (retval < 0)