From: Christian Brauner
Date: Sat, 11 May 2019 01:36:37 +0200
Subject: Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters
To: Jann Horn
Cc: Andy Lutomirski, Aleksa Sarai, Al Viro, Jeff Layton,
    "J. Bruce Fields", Arnd Bergmann, David Howells, Eric Biederman,
    Andrew Morton, Alexei Starovoitov, Kees Cook, Tycho Andersen,
    David Drysdale, Chanho Min, Oleg Nesterov, Aleksa Sarai,
    Linus Torvalds, Linux Containers, linux-fsdevel, Linux API,
    kernel list, linux-arch
In-Reply-To: <20190510225527.GA59914@google.com>
References: <20190506165439.9155-1-cyphar@cyphar.com>
    <20190506165439.9155-6-cyphar@cyphar.com>
    <20190506191735.nmzf7kwfh7b6e2tf@yavin>
    <20190510204141.GB253532@google.com>
    <20190510225527.GA59914@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, May 11, 2019 at 12:55 AM Jann Horn wrote:
>
> On Fri, May 10, 2019 at 02:20:23PM -0700, Andy Lutomirski wrote:
> > On Fri, May 10, 2019 at 1:41 PM Jann Horn wrote:
> > >
> > > On Tue, May 07, 2019 at 05:17:35AM +1000, Aleksa Sarai wrote:
> > > > On 2019-05-06, Jann Horn wrote:
> > > > > In my opinion, CVE-2019-5736 points out two different problems:
> > > > >
> > > > > The big problem: The __ptrace_may_access() logic has a special-case
> > > > > short-circuit for "introspection" that you can't opt out of; this
> > > > > makes it possible to open things in procfs that are related to the
> > > > > current process even if the credentials of the process wouldn't permit
> > > > > accessing another process like it. I think the proper fix to deal with
> > > > > this would be to add a prctl() flag for "set whether introspection is
> > > > > allowed for this process", and if userspace has manually un-set that
> > > > > flag, any introspection special-case logic would be skipped.
> > > >
> > > > We could do PR_SET_DUMPABLE=3 for this, I guess?
> > >
> > > Hmm... I'd make it a new prctl() command, since introspection is
> > > somewhat orthogonal to dumpability. Also, dumpability is per-mm, and I
> > > think the introspection flag should be per-thread.
> >
> > I've lost track of the context here, but it seems to me that
> > mitigating attacks involving accidental following of /proc links
> > shouldn't depend on dumpability. What's the actual problem this is
> > trying to solve again?
>
> The one actual security problem that I've seen related to this is
> CVE-2019-5736. There is a write-up of it at
>
> under "Successful approach", but it goes more or less as follows:
>
> A container is running that doesn't use user namespaces (because for
> some reason I don't understand, apparently some people still do that).
> An evil process is running inside the container with UID 0 (as in,
> GLOBAL_ROOT_UID); so if the evil process inside the container was able
> to reach root-owned files on the host filesystem, it could write into
> them.
>
> The container engine wants to spawn a new process inside the container.
> It forks off a child that joins the container's namespaces (including
> PID and mount namespaces), and then the child calls execve() on some
> path in the container.
> The attacker replaces the executable in the container with a symlink
> to /proc/self/exe and replaces a library inside the container with a
> malicious one.
> When the container engine calls execve(), intending to run an executable
> inside the container, it instead goes through ptrace_may_access() using
> the introspection short-circuit and re-executes its own executable
> through the jumped symlink /proc/self/exe (which is normally unreachable
> for the container). After the execve(), the process loads an evil
> library from inside the container and is under the control of the
> container.
> Now the container controls a process whose /proc/self/exe is a jumped
> symlink to a host executable, and the container can write into it.
>
> Some container engines are now using an extremely ugly hack to work
> around this - whenever they want to enter a container, they copy the
> host binary into a new memfd and execute that to avoid exposing the
> original host binary to containers:
>
>
>
> In my opinion, the problems here are:
>
> - Apparently some people run untrusted containers without user
>   namespaces. It would be really nice if people could not do that.
>   (Probably the biggest problem here.)

I know I sound like a broken record, since I've been going on about
this forever together with a lot of other people, but honestly, the
fact that people are running untrusted workloads in privileged
containers is the real issue here.

Aleksa is a good friend of mine and we have discussed this a lot, so I
hope he doesn't hate me for saying this again: it is crazy that there
are container runtimes out there that promise (or at least do not state
the opposite) that containers without user namespaces, or containers
with user namespaces that allow mapping the host root id to anything,
can be safe. They cannot.

Even if this /proc/*/exe thing is somehow blocked, there are other ways
of escaping from a privileged container. We (i.e. LXC) literally do not
accept CVEs for privileged containers because we do not consider them
safe by design.

It seems to me to be heading in the wrong direction to keep up the
illusion that with enough effort we can make this all nice and safe.
Yes, the userspace memfd hack we came up with is as ugly as a security
patch can be, but if you make promises you can't keep, you'd better be
prepared to pay the price when things start to fall apart.

So if this part of the patch is just needed to handle this, do we
really want to do all that tricky work, or is there more to gain from
this that makes it worth it?

Christian