Subject: Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters
From: Andy Lutomirski
Date: Sat, 11 May 2019 10:00:47 -0700
To: Jann Horn
Cc: Andy Lutomirski, Aleksa Sarai, Al Viro, Jeff Layton,
    "J. Bruce Fields", Arnd Bergmann, David Howells, Eric Biederman,
    Andrew Morton, Alexei Starovoitov, Kees Cook, Christian Brauner,
    Tycho Andersen, David Drysdale, Chanho Min, Oleg Nesterov,
    Aleksa Sarai, Linus Torvalds, Linux Containers, linux-fsdevel,
    Linux API, kernel list, linux-arch
In-Reply-To: <20190510225527.GA59914@google.com>
References: <20190506165439.9155-1-cyphar@cyphar.com>
    <20190506165439.9155-6-cyphar@cyphar.com>
    <20190506191735.nmzf7kwfh7b6e2tf@yavin>
    <20190510204141.GB253532@google.com>
    <20190510225527.GA59914@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On May 10, 2019, at 3:55 PM, Jann Horn wrote:
>
>> On Fri, May 10, 2019 at 02:20:23PM -0700, Andy Lutomirski wrote:
>>> On Fri, May 10, 2019 at 1:41 PM Jann Horn wrote:
>>>
>>>> On Tue, May 07, 2019 at 05:17:35AM +1000, Aleksa Sarai wrote:
>>>>> On 2019-05-06, Jann Horn wrote:
>>>>> In my opinion, CVE-2019-5736 points out two different problems:
>>>>>
>>>>> The big problem: The __ptrace_may_access() logic has a special-case
>>>>> short-circuit for "introspection" that you can't opt out of; this
>>>>> makes it possible to open things in procfs that are related to the
>>>>> current process even if the credentials of the process wouldn't permit
>>>>> accessing another process like it. I think the proper fix to deal with
>>>>> this would be to add a prctl() flag for "set whether introspection is
>>>>> allowed for this process", and if userspace has manually un-set that
>>>>> flag, any introspection special-case logic would be skipped.
>>>>
>>>> We could do PR_SET_DUMPABLE=3 for this, I guess?
>>>
>>> Hmm... I'd make it a new prctl() command, since introspection is
>>> somewhat orthogonal to dumpability. Also, dumpability is per-mm, and I
>>> think the introspection flag should be per-thread.
>>
>> I've lost track of the context here, but it seems to me that
>> mitigating attacks involving accidental following of /proc links
>> shouldn't depend on dumpability. What's the actual problem this is
>> trying to solve again?
>
> The one actual security problem that I've seen related to this is
> CVE-2019-5736. There is a write-up of it at
>
> under "Successful approach", but it goes more or less as follows:
>
> A container is running that doesn't use user namespaces (because for
> some reason I don't understand, apparently some people still do that).
> An evil process is running inside the container with UID 0 (as in,
> GLOBAL_ROOT_UID); so if the evil process inside the container was able
> to reach root-owned files on the host filesystem, it could write into
> them.
>
> The container engine wants to spawn a new process inside the container.
> It forks off a child that joins the container's namespaces (including
> PID and mount namespaces), and then the child calls execve() on some
> path in the container.

I think that, at this point, the task should be considered owned by the
container. Maybe we should have a better API than execve() to execute a
program in a safer way, but fiddling with dumpability seems like a
band-aid. In fact, the process is arguably pwned even *before* execve.

A better “spawn” API should fix this. In the mean time, I think it
should be assumed that, if you join a container’s namespaces, you are at
its mercy.

> The attacker replaces the executable in the container with a symlink
> to /proc/self/exe and replaces a library inside the container with a
> malicious one.

Cute.

> When the container engine calls execve(), intending to run an executable
> inside the container, it instead goes through ptrace_may_access() using
> the introspection short-circuit and re-executes its own executable
> through the jumped symlink /proc/self/exe (which is normally unreachable
> for the container). After the execve(), the process loads an evil
> library from inside the container and is under the control of the
> container.
> Now the container controls a process whose /proc/self/exe is a jumped
> symlink to a host executable, and the container can write into it.
>
> Some container engines are now using an extremely ugly hack to work
> around this - whenever they want to enter a container, they copy the
> host binary into a new memfd and execute that to avoid exposing the
> original host binary to containers:
>
>
> In my opinion, the problems here are:
>
> - Apparently some people run untrusted containers without user
>   namespaces. It would be really nice if people could not do that.
>   (Probably the biggest problem here.)
> - ptrace_may_access() has a short-circuit that permits a process to
>   unintentionally look at itself even if it has dropped privileges -
>   here, it permits the execve("/proc/self/exe", ...) that would
>   normally be blocked by the check for CAP_SYS_PTRACE if the process
>   is nondumpable.

I don’t see this as a problem. Dumpable is about protecting a task from
others, not about protecting a task against itself.

> - You can use /proc/*/exe to get a writable fd.

This is IMO the real bug.
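
For reference, the memfd workaround quoted above ("copy the host binary
into a new memfd and execute that") can be sketched roughly as follows.
This is an illustration only, not the code any container engine actually
ships: the function names clone_self_to_sealed_memfd() and copy_all(),
the memfd name "self-exe-copy", and the particular set of seals are all
assumptions made for the example. It needs Linux with memfd_create()
(glibc 2.27 or later).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from src to dst.
 * Returns 0 on success, -1 on error (errno is left set). */
static int copy_all(int src, int dst)
{
	char buf[65536];

	for (;;) {
		ssize_t n = read(src, buf, sizeof(buf));

		if (n == 0)
			return 0;	/* end of file */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(dst, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			assert(w <= n - off);	/* never more than requested */
			off += w;
		}
	}
}

/* Hypothetical helper: copy the running binary into an anonymous memfd
 * and seal it so that nobody can modify the copy afterwards.
 * Returns the memfd on success, -1 on error. */
int clone_self_to_sealed_memfd(void)
{
	int src, fd;

	src = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
	if (src < 0)
		return -1;

	fd = memfd_create("self-exe-copy", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0) {
		close(src);
		return -1;
	}

	/* Copy first, then seal: once F_SEAL_WRITE is set, write() fails. */
	if (copy_all(src, fd) < 0 ||
	    fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW |
				   F_SEAL_WRITE | F_SEAL_SEAL) < 0) {
		close(src);
		close(fd);
		return -1;
	}

	close(src);
	return fd;
}
```

A caller would then pass the returned descriptor to fexecve() (or
execveat() with AT_EMPTY_PATH), so that /proc/self/exe in the re-executed
process refers to the sealed in-memory copy rather than to the host
binary. It only hides the host binary; it does nothing about /proc/*/exe
handing out a writable fd in the first place.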