Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1034162AbcJ0P5H (ORCPT ); Thu, 27 Oct 2016 11:57:07 -0400 Received: from out02.mta.xmission.com ([166.70.13.232]:39100 "EHLO out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S936559AbcJ0P5C (ORCPT ); Thu, 27 Oct 2016 11:57:02 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: Cyrill Gorcunov Cc: Kees Cook , Andrey Vagin , LKML , Pavel Emelyanov , Linux Containers , Jann Horn References: <20161024105959.GQ1847@uranus.lan> <8760oh8tbp.fsf@xmission.com> <20161024202925.GS1847@uranus.lan> <8760oh737b.fsf@xmission.com> <20161025090213.GX1847@uranus.lan> Date: Thu, 27 Oct 2016 10:54:34 -0500 In-Reply-To: <20161025090213.GX1847@uranus.lan> (Cyrill Gorcunov's message of "Tue, 25 Oct 2016 12:02:13 +0300") Message-ID: <87d1ilrdmt.fsf_-_@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1bzn35-000542-Hz;;;mid=<87d1ilrdmt.fsf_-_@xmission.com>;;;hst=in01.mta.xmission.com;;;ip=75.170.125.99;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX183l/CAz33Zz9gBpQKxjJY2IJg+QPtHXUo= X-SA-Exim-Connect-IP: 75.170.125.99 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.0 TVD_RCVD_IP Message was received from an IP address * 1.5 TR_Symld_Words too many words that have symbols inside * 0.7 XMSubLong Long Subject * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa08 1397; Body=1 Fuz1=1 Fuz2=1] * 0.1 XMSolicitRefs_0 Weightloss drug X-Spam-DCC: XMission; sa08 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: **;Cyrill Gorcunov X-Spam-Relay-Country: X-Spam-Timing: total 704 ms - load_scoreonly_sql: 0.05 (0.0%), signal_user_changed: 7 (1.1%), b_tie_ro: 6 (0.9%), parse: 1.21 (0.2%), extract_message_metadata: 19 (2.7%), get_uri_detail_list: 2.6 (0.4%), tests_pri_-1000: 13 (1.9%), tests_pri_-950: 6 (0.8%), tests_pri_-900: 0.87 (0.1%), tests_pri_-400: 70 (9.9%), check_bayes: 69 (9.8%), b_tokenize: 31 (4.4%), b_tok_get_all: 20 (2.8%), b_comp_prob: 2.6 (0.4%), b_tok_touch_all: 4.6 (0.7%), b_finish: 0.70 (0.1%), tests_pri_0: 580 (82.4%), check_dkim_signature: 0.56 (0.1%), check_dkim_adsp: 2.7 (0.4%), tests_pri_500: 3.4 (0.5%), rewrite_mail: 0.00 (0.0%) Subject: [REVIEW][PATCH v2] mm: Add a user_ns owner to mm_struct and fix ptrace permission checks X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6452 Lines: 198 During exec dumpable is cleared if the file that is being executed is not readable by the user executing the file. A bug in ptrace_may_access allows reading the file if the executable happens to enter into a subordinate user namespace (aka clone(CLONE_NEWUSER), unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER). This problem is fixed with only necessary userspace breakage by adding a user namespace owner to mm_struct, captured at the time of exec, so it is clear in which user namespace CAP_SYS_PTRACE must be present in to be able to safely give read permission to the executable. The function ptrace_may_access is modified to verify that the ptracer has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns. This ensures that if the task changes it's cred into a subordinate user namespace it does not become ptraceable. The function ptrace_attach is modified to only set PT_PTRACE_CAP when CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of PT_PTRACE_CAP is to be a flag to note that whatever permission changes the task might go through the tracer has sufficient permissions for it not to be an issue. task->cred->user_ns is always the same as or descendent of mm->user_ns. Which guarantees that having CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks credentials. Cc: stable@vger.kernel.org Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces") Signed-off-by: "Eric W. Biederman" --- Jann or anyone who looked at my last version of this that I thought was ready to go, I have change the test in ptrace_may_access from: if (!mm || ((get_dumpable(mm) != SUID_DUMP_USER) && !ptrace_has_cap(mm->user_ns, mode))) return -EPERM; to: if (mm && ((get_dumpable(mm) != SUID_DUMP_USER) && !ptrace_has_cap(mm->user_ns, mode))) return -EPERM; As enforcing the dumpablity check without an mm caused a regression for Cyrill (he could not read task->exit code of a zomebie even with a full set of caps). Further I have moved the setting of PT_PTRACE_CAP up in ptrace_attach so that I can easily use the mm->user_ns. I can't imagine either of these changes making a practical difference to anyone but I am calling them out in case someone can. include/linux/mm_types.h | 1 + kernel/fork.c | 9 ++++++--- kernel/ptrace.c | 26 +++++++++++--------------- mm/init-mm.c | 2 ++ 4 files changed, 20 insertions(+), 18 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 4a8acedf4b7d..08d947fc4c59 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -473,6 +473,7 @@ struct mm_struct { */ struct task_struct __rcu *owner; #endif + struct user_namespace *user_ns; /* store ref to file /proc//exe symlink points to */ struct file __rcu *exe_file; diff --git a/kernel/fork.c b/kernel/fork.c index 623259fc794d..fd85c68c2791 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -742,7 +742,8 @@ static void mm_init_owner(struct mm_struct *mm, struct task_struct *p) #endif } -static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) +static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, + struct user_namespace *user_ns) { mm->mmap = NULL; mm->mm_rb = RB_ROOT; @@ -782,6 +783,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) if (init_new_context(p, mm)) goto fail_nocontext; + mm->user_ns = get_user_ns(user_ns); return mm; fail_nocontext: @@ -827,7 +829,7 @@ struct mm_struct *mm_alloc(void) return NULL; memset(mm, 0, sizeof(*mm)); - return mm_init(mm, current); + return mm_init(mm, current, current_user_ns()); } /* @@ -842,6 +844,7 @@ void __mmdrop(struct mm_struct *mm) destroy_context(mm); mmu_notifier_mm_destroy(mm); check_mm(mm); + put_user_ns(mm->user_ns); free_mm(mm); } EXPORT_SYMBOL_GPL(__mmdrop); @@ -1123,7 +1126,7 @@ static struct mm_struct *dup_mm(struct task_struct *tsk) memcpy(mm, oldmm, sizeof(*mm)); - if (!mm_init(mm, tsk)) + if (!mm_init(mm, tsk, mm->user_ns)) goto fail_nomem; err = dup_mmap(mm, oldmm); diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 2a99027312a6..44a25a1e6e83 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -220,7 +220,7 @@ static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode) static int __ptrace_may_access(struct task_struct *task, unsigned int mode) { const struct cred *cred = current_cred(), *tcred; - int dumpable = 0; + struct mm_struct *mm; kuid_t caller_uid; kgid_t caller_gid; @@ -271,16 +271,11 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode) return -EPERM; ok: rcu_read_unlock(); - smp_rmb(); - if (task->mm) - dumpable = get_dumpable(task->mm); - rcu_read_lock(); - if (dumpable != SUID_DUMP_USER && - !ptrace_has_cap(__task_cred(task)->user_ns, mode)) { - rcu_read_unlock(); - return -EPERM; - } - rcu_read_unlock(); + mm = task->mm; + if (mm && + ((get_dumpable(mm) != SUID_DUMP_USER) && + !ptrace_has_cap(mm->user_ns, mode))) + return -EPERM; return security_ptrace_access_check(task, mode); } @@ -331,6 +326,11 @@ static int ptrace_attach(struct task_struct *task, long request, task_lock(task); retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS); + if (!retval) { + struct mm_struct *mm = task->mm; + if (mm && ns_capable(mm->user_ns, CAP_SYS_PTRACE)) + flags |= PT_PTRACE_CAP; + } task_unlock(task); if (retval) goto unlock_creds; @@ -344,10 +344,6 @@ static int ptrace_attach(struct task_struct *task, long request, if (seize) flags |= PT_SEIZED; - rcu_read_lock(); - if (ns_capable(__task_cred(task)->user_ns, CAP_SYS_PTRACE)) - flags |= PT_PTRACE_CAP; - rcu_read_unlock(); task->ptrace = flags; __ptrace_link(task, current); diff --git a/mm/init-mm.c b/mm/init-mm.c index a56a851908d2..975e49f00f34 100644 --- a/mm/init-mm.c +++ b/mm/init-mm.c @@ -6,6 +6,7 @@ #include #include +#include #include #include @@ -21,5 +22,6 @@ struct mm_struct init_mm = { .mmap_sem = __RWSEM_INITIALIZER(init_mm.mmap_sem), .page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock), .mmlist = LIST_HEAD_INIT(init_mm.mmlist), + .user_ns = &init_user_ns, INIT_MM_CONTEXT(init_mm) }; -- 2.10.1