Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp1885304imm; Thu, 11 Oct 2018 01:10:48 -0700 (PDT) X-Google-Smtp-Source: ACcGV60wSHNnPFvYOqdVCiJa+AmeQP4kEFtDIxdn76bKDzRbw0tbzeZL7a1k7EnjGihDUX56gUay X-Received: by 2002:a63:f80a:: with SMTP id n10-v6mr514650pgh.57.1539245448234; Thu, 11 Oct 2018 01:10:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539245448; cv=none; d=google.com; s=arc-20160816; b=RvEXVCvU9z+FSOox/TPDTuCOc4wm1SmGiPnULylzpu5nNeCOvCevQAwEwtrE1eIldB DKLmXniRuli5nyT9mvo3z8QBoMyZm9Ls560WkDudcXVbJGHWPD2KqypGy0GfI2Kf9seE avOH2nEYcTOf3nYNE/3pIR5z/mNHXCn7d6FaucVeN0ru9yhcv+9gsGSoralS79cpGMDr hop0JPdHYpLcTr2wUxUtwkkA3J/xbqnN4zbhxAGnc9XDXIfQPyUKx72HHI1ym8LV9KZf 145fkqa10M4oYR/suT6MhTadklev8BjmNgLz6lwsxVgQABNF9g9OH4j20O6jm01imJu3 ypxA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :subject:user-agent:references:in-reply-to:message-id:date:cc:to :from:dkim-signature; bh=O0/z/U2Trlu01OLg2n4mHVQp0lUlbLhSGLB6UOuQPeo=; b=PTYia3W+e/2yLFFXBIc3Ktwg5Ug2E+KEcbEFh1gYMmN47XF/uhjZp2mcX5k0hTNw5G G9V8THKzcL3vX/fQxh84V0J8Z+vMtUjeVhUSYTXZvrKnQxTysEVIzZwF/E2K7WplPmeT JDJ8KZT36OHRbzt9ZNokSPGgSZpT3nEJpPfv1oifJudWXs8/zDxOHWUsohjk7CFHMuUf lNRLjO4du0qHbpzhX7QXNEAT+p3p7VTcoAnrJikRlsOB6HrTJnBgfxIzuWPJJ4/6APwW GfFYtr7FjXgE2OWMNz9PsCq+LJncigUyh8z7DzOSCVhJ+DIk21AHtABUCj0nsCRFTIgx UMiw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@paul-moore-com.20150623.gappssmtp.com header.s=20150623 header.b=c+JDwpdb; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g7-v6si28236479plb.167.2018.10.11.01.10.33; Thu, 11 Oct 2018 01:10:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@paul-moore-com.20150623.gappssmtp.com header.s=20150623 header.b=c+JDwpdb; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727452AbeJKOui (ORCPT + 99 others); Thu, 11 Oct 2018 10:50:38 -0400 Received: from mail-it1-f193.google.com ([209.85.166.193]:52656 "EHLO mail-it1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726008AbeJKOui (ORCPT ); Thu, 11 Oct 2018 10:50:38 -0400 Received: by mail-it1-f193.google.com with SMTP id 134-v6so12018914itz.2 for ; Thu, 11 Oct 2018 00:24:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=paul-moore-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:date:message-id:in-reply-to:references:user-agent :subject:mime-version:content-transfer-encoding; bh=O0/z/U2Trlu01OLg2n4mHVQp0lUlbLhSGLB6UOuQPeo=; b=c+JDwpdbqM5v1hriepC0LM9H5C3dpYhL0D01PNSd6gABhBEH6aGi+gYzU9j5xhX7e6 IyL/uKWtP/CsHaMXDLTp7i2+h1/SGwGnEXQ9mmqcy5CCruCj4P5RfSk12Yhf22DSP8yY SXrZjQlHA5UhNGngfi8vHuPrx2LoNRjCMDy6Z3Up/4XJtV4WK7VvGQWB1Fy1czHFrrEz xZ0BfOT7KW0nJGdaQgMj1vRiRCAXI1cyaN5RMphPqO2RNr0xuwT5DerdOzF2FKBrYDJS O1wDSiK9jiDGrU2vxecznQCT/NAg48JckH2KSD+PbLjHdRXUDJPzYe5LEBTnYoU13JPc dnlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:date:message-id:in-reply-to :references:user-agent:subject:mime-version :content-transfer-encoding; bh=O0/z/U2Trlu01OLg2n4mHVQp0lUlbLhSGLB6UOuQPeo=; b=hieF6rNtciyZA0czzijy+nkIIeIh0eLTHNUwT2y1BeCXCLKfMjBRvljYuDtTyNfFso xsmwMchRLncT+HhS09vJnKJerx8PPS4sVgZ9RjYQGRhR+UHKTG3VH5w1vk/vXvudyBCV FTiBoqFymUWFYlgJugVHl09CVKP3KUTuBuyo+Ehvk1NK/FR5BGFLWvudGHF8QoMhI9BJ tXbCQtHIylkJxUSkIWTLr3R3pw64us4jjwt4b10IugrAaPwl2RotNnUpLaJbNLw5QXE6 y9V2KHQTPOnCSyeHqGidKs7zlpnOj6HD5QCPKuf3nyiVANwN8FYV21RVrQafEaZQGu6I 54oA== X-Gm-Message-State: ABuFfohH1cDqMAegTXgx26A75IJEozIcsr1t9JmgrbV80JiGvtaZjxFu mcWKIGTrATixn34Bd8BDZb++ X-Received: by 2002:a05:660c:88d:: with SMTP id o13mr584228itk.34.1539242679026; Thu, 11 Oct 2018 00:24:39 -0700 (PDT) Received: from [10.20.3.205] ([12.226.92.2]) by smtp.gmail.com with ESMTPSA id g73-v6sm8519940itg.37.2018.10.11.00.24.36 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 11 Oct 2018 00:24:37 -0700 (PDT) From: Paul Moore To: Jann Horn CC: , Tycho Andersen , Kees Cook , Linux API , , , Oleg Nesterov , kernel list , "Eric W. Biederman" , , Christian Brauner , Andy Lutomirski , "linux-security-module" , , Stephen Smalley , Eric Paris Date: Thu, 11 Oct 2018 03:24:34 -0400 Message-ID: <16662034750.2781.85c95baa4474aabc7814e68940a78392@paul-moore.com> In-Reply-To: References: <20180927151119.9989-1-tycho@tycho.ws> <20180927151119.9989-4-tycho@tycho.ws> <20181008151629.hkgzzsluevwtuclw@brauner.io> <20181008162147.ubfxxsv2425l2zsp@brauner.io> <20181008181815.pwnqxngj22mhm2vj@brauner.io> <20181009132850.fp6yne2vgmfpi27k@brauner.io> User-Agent: AquaMail/1.16.1-1284 (build: 101600100) Subject: Re: [PATCH v7 3/6] seccomp: add a way to get a listener fd from ptrace MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On October 10, 2018 11:34:11 AM Jann Horn wrote: > On Wed, Oct 10, 2018 at 5:32 PM Paul Moore wrote: >> On Tue, Oct 9, 2018 at 9:36 AM Jann Horn wrote: >>> +cc selinux people explicitly, since they probably have opinions on thi= s >> >> I just spent about twenty minutes working my way through this thread, >> and digging through the containers archive trying to get a good >> understanding of what you guys are trying to do, and I'm not quite >> sure I understand it all. However, from what I have seen, this >> approach looks very ptrace-y to me (I imagine to others as well based >> on the comments) and because of this I think ensuring the usual ptrace >> access controls are evaluated, including the ptrace LSM hooks, is the >> right thing to do. > > Basically the problem is that this new ptrace() API does something > that doesn't just influence the target task, but also every other task > that has the same seccomp filter. So the classic ptrace check doesn't > work here. Due to some rather unfortunate events today I'm suddenly without easy acces= s to the kernel code, but would it be possible to run the LSM ptrace access= control checks against all of the affected tasks? If it is possible, how = painful would it be? > >> If I've missed something, or I'm thinking about this wrong, please >> educate me; just a heads-up that I'm largely offline for most of this >> week so responses on my end are going to be delayed much more than >> usual. >> >>> On Tue, Oct 9, 2018 at 3:29 PM Christian Brauner = wrote: >>>> On Tue, Oct 09, 2018 at 02:39:53PM +0200, Jann Horn wrote: >>>>> On Mon, Oct 8, 2018 at 8:18 PM Christian Brauner wrote: >>>>>> On Mon, Oct 08, 2018 at 06:42:00PM +0200, Jann Horn wrote: >>>>>>> On Mon, Oct 8, 2018 at 6:21 PM Christian Brauner wrote: >>>>>>> > On Mon, Oct 08, 2018 at 05:33:22PM +0200, Jann Horn wrote: >>>>>>> > > On Mon, Oct 8, 2018 at 5:16 PM Christian Brauner wrote: >>>>>>> > > > On Thu, Sep 27, 2018 at 09:11:16AM -0600, Tycho Andersen wrot= e: >>>>>>> > > > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c >>>>>>> > > > > index 44a31ac8373a..17685803a2af 100644 >>>>>>> > > > > --- a/kernel/seccomp.c >>>>>>> > > > > +++ b/kernel/seccomp.c >>>>>>> > > > > @@ -1777,4 +1777,35 @@ static struct file *init_listener(st= ruct task_struct *task, >>>>>>> > > > > >>>>>>> > > > > return ret; >>>>>>> > > > > } >>>>>>> > > > > + >>>>>>> > > > > +long seccomp_new_listener(struct task_struct *task, >>>>>>> > > > > + unsigned long filter_off) >>>>>>> > > > > +{ >>>>>>> > > > > + struct seccomp_filter *filter; >>>>>>> > > > > + struct file *listener; >>>>>>> > > > > + int fd; >>>>>>> > > > > + >>>>>>> > > > > + if (!capable(CAP_SYS_ADMIN)) >>>>>>> > > > > + return -EACCES; >>>>>>> > > > >>>>>>> > > > I know this might have been discussed a while back but why ex= actly do we >>>>>>> > > > require CAP_SYS_ADMIN in init_userns and not in the target us= erns? What >>>>>>> > > > if I want to do a setns()fd, CLONE_NEWUSER) to the target pro= cess and >>>>>>> > > > use ptrace from in there? >>>>>>> > > >>>>>>> > > See https://lore.kernel.org/lkml/CAG48ez3R+ZJ1vwGkDfGzKX2mz6f= =3DjjJWsO5pCvnH68P+RKO8Ow@mail.gmail.com/ >>>>>>> > > . Basically, the problem is that this doesn't just give you cap= ability >>>>>>> > > over the target task, but also over every other task that has t= he same >>>>>>> > > filter installed; you need some sort of "is the caller capable = over >>>>>>> > > the filter and anyone who uses it" check. >>>>>>> > >>>>>>> > Thanks. >>>>>>> > But then this new ptrace feature as it stands is imho currently b= roken. >>>>>>> > If you can install a seccomp filter with SECCOMP_RET_USER_NOTIF i= f you >>>>>>> > are ns_cpabable(CAP_SYS_ADMIN) and also get an fd via seccomp() i= tself >>>>>>> > if you are ns_cpabable(CAP_SYS_ADMIN) >>>>> >>>>> Actually, you don't need CAP_SYS_ADMIN for seccomp() at all as long a= s >>>>> you enable the NNP flag, I think? >>>> >>>> Yes, if you turn on NNP you don't even need sys_admin. >>>> >>>>> >>>>>>> > then either the new ptrace() api >>>>>>> > extension should be fixed to allow for this too or the seccomp() = way of >>>>>>> > retrieving the pid - which I really think we want - needs to be f= ixed to >>>>>>> > require capable(CAP_SYS_ADMIN) too. >>>>>>> > The solution where both require ns_capable(CAP_SYS_ADMIN) is - im= ho - >>>>>>> > the preferred way to solve this. >>>>>>> > Everything else will just be confusing. >>>>>>> >>>>>>> First you say "broken", then you say "confusing". Which one do you = mean? >>>>>> >>>>>> Both. It's broken in so far as it places a seemingly unnecessary >>>>>> restriction that could be fixed. You outlined one possible fix yours= elf >>>>>> in the link you provided. >>>>> >>>>> If by "possible fix" you mean "check whether the seccomp filter is >>>>> only attached to a single task": That wouldn't fundamentally change >>>>> the situation, it would only add an additional special case. >>>>> >>>>>> And it's confusing in so far as there is a way >>>>>> via seccomp() to get the fd without said requirement. >>>>> >>>>> I don't find it confusing at all. seccomp() and ptrace() are very >>>> >>>> Fine, then that's a matter of opinion. I find it counterintuitive that >>>> you can get an fd without privileges via one interface but not via >>>> another. >>>> >>>>> different situations: When you use seccomp(), infrastructure is >>>> >>>> Sure. Note, that this is _one_ of the reasons why I want to make sure = we >>>> keep the native seccomp() only based way of getting an fd without >>>> forcing userspace to switching to a differnet kernel api. >>>> >>>>> already in place for ensuring that your filter is only applied to >>>>> processes over which you are capable, and propagation is limited by >>>>> inheritance from your task down. When you use ptrace(), you need a >>>>> pretty different sort of access check that checks whether you're >>>>> privileged over ancestors, siblings and so on of the target task. >>>> >>>> So, don't get me wrong I'm not arguing against the ptrace() interface = in >>>> general. If this is something that people find useful, fine. But, I >>>> would like to have a simple single-syscall pure-seccomp() based way of >>>> getting an fd, i.e. what we have in patch 1 of this series. >>> >>> Yeah, I also prefer the seccomp() one. >>> >>>>> But thinking about it more, I think that CAP_SYS_ADMIN over the saved >>>>> current->mm->user_ns of the task that installed the filter (stored as >>>>> a "struct user_namespace *" in the filter) should be acceptable. >>>> >>>> Hm... Why not CAP_SYS_PTRACE? >>> >>> Because LSMs like SELinux add extra checks that apply even if you have >>> CAP_SYS_PTRACE, and this would subvert those. The only capability I >>> know of that lets you bypass LSM checks by design (if no LSM blocks >>> the capability itself) is CAP_SYS_ADMIN. >>> >>>> One more thing. Citing from [1] >>>> >>>>> I think there's a security problem here. Imagine the following scenar= io: >>>>> >>>>> 1. task A (uid=3D=3D0) sets up a seccomp filter that uses SECCOMP_RET= _USER_NOTIF >>>>> 2. task A forks off a child B >>>>> 3. task B uses setuid(1) to drop its privileges >>>>> 4. task B becomes dumpable again, either via prctl(PR_SET_DUMPABLE, 1= ) >>>>> or via execve() >>>>> 5. task C (the attacker, uid=3D=3D1) attaches to task B via ptrace >>>>> 6. task C uses PTRACE_SECCOMP_NEW_LISTENER on task B >>>> >>>> Sorry, to be late to the party but would this really pass >>>> __ptrace_may_access() in ptrace_attach()? It doesn't seem obvious to m= e >>>> that it would... Doesn't look like it would get past: >>>> >>>> tcred =3D __task_cred(task); >>>> if (uid_eq(caller_uid, tcred->euid) && >>>> uid_eq(caller_uid, tcred->suid) && >>>> uid_eq(caller_uid, tcred->uid) && >>>> gid_eq(caller_gid, tcred->egid) && >>>> gid_eq(caller_gid, tcred->sgid) && >>>> gid_eq(caller_gid, tcred->gid)) >>>> goto ok; >>>> if (ptrace_has_cap(tcred->user_ns, mode)) >>>> goto ok; >>>> rcu_read_unlock(); >>>> return -EPERM; >>>> ok: >>>> rcu_read_unlock(); >>>> mm =3D task->mm; >>>> if (mm && >>>> ((get_dumpable(mm) !=3D SUID_DUMP_USER) && >>>> !ptrace_has_cap(mm->user_ns, mode))) >>>> return -EPERM; >>> >>> Which specific check would prevent task C from attaching to task B? If >>> the UIDs match, the first "goto ok" executes; and you're dumpable, so >>> you don't trigger the second "return -EPERM". >>> >>>>> 7. because the seccomp filter is shared by task A and task B, task C >>>>> is now able to influence syscall results for syscalls performed by >>>>> task A >>>> >>>> [1]: https://lore.kernel.org/lkml/CAG48ez3R+ZJ1vwGkDfGzKX2mz6f=3DjjJWs= O5pCvnH68P+RKO8Ow@mail.gmail.com/ >> >> >> >> -- >> paul moore >> www.paul-moore.com