Date: Sat, 11 May 2019 00:55:27 +0200
From: Jann Horn
To: Andy Lutomirski
Cc: Aleksa Sarai, Al Viro, Jeff Layton, "J. Bruce Fields",
    Arnd Bergmann, David Howells, Eric Biederman, Andrew Morton,
    Alexei Starovoitov, Kees Cook, Christian Brauner, Tycho Andersen,
    David Drysdale, Chanho Min, Oleg Nesterov, Aleksa Sarai,
    Linus Torvalds, Linux Containers, linux-fsdevel, Linux API,
    kernel list, linux-arch
Subject: Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters
Message-ID: <20190510225527.GA59914@google.com>
References: <20190506165439.9155-1-cyphar@cyphar.com>
    <20190506165439.9155-6-cyphar@cyphar.com>
    <20190506191735.nmzf7kwfh7b6e2tf@yavin>
    <20190510204141.GB253532@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 10, 2019 at 02:20:23PM -0700, Andy Lutomirski wrote:
> On Fri, May 10, 2019 at 1:41 PM Jann Horn wrote:
> >
> > On Tue, May 07, 2019 at 05:17:35AM +1000, Aleksa Sarai wrote:
> > > On 2019-05-06, Jann Horn wrote:
> > > > In my opinion, CVE-2019-5736 points out two different problems:
> > > >
> > > > The big problem: The __ptrace_may_access() logic has a special-case
> > > > short-circuit for "introspection" that you can't opt out of; this
> > > > makes it possible to open things in procfs that are related to the
> > > > current process even if the credentials of the process wouldn't permit
> > > > accessing another process like it. I think the proper fix to deal with
> > > > this would be to add a prctl() flag for "set whether introspection is
> > > > allowed for this process", and if userspace has manually un-set that
> > > > flag, any introspection special-case logic would be skipped.
> > >
> > > We could do PR_SET_DUMPABLE=3 for this, I guess?
> >
> > Hmm... I'd make it a new prctl() command, since introspection is
> > somewhat orthogonal to dumpability. Also, dumpability is per-mm, and I
> > think the introspection flag should be per-thread.
>
> I've lost track of the context here, but it seems to me that
> mitigating attacks involving accidental following of /proc links
> shouldn't depend on dumpability. What's the actual problem this is
> trying to solve again?

The one actual security problem that I've seen related to this is
CVE-2019-5736. There is a write-up of it at
under "Successful approach", but it goes more or less as follows:

A container is running that doesn't use user namespaces (because for
some reason I don't understand, apparently some people still do that).
An evil process is running inside the container with UID 0 (as in,
GLOBAL_ROOT_UID); so if the evil process inside the container was able
to reach root-owned files on the host filesystem, it could write into
them.

The container engine wants to spawn a new process inside the container.
It forks off a child that joins the container's namespaces (including
PID and mount namespaces), and then the child calls execve() on some
path in the container.
The attacker replaces the executable in the container with a symlink to
/proc/self/exe and replaces a library inside the container with a
malicious one. When the container engine calls execve(), intending to
run an executable inside the container, it instead goes through
ptrace_may_access() using the introspection short-circuit and
re-executes its own executable through the jumped symlink
/proc/self/exe (which is normally unreachable for the container). After
the execve(), the process loads an evil library from inside the
container and is under the control of the container.
Now the container controls a process whose /proc/self/exe is a jumped
symlink to a host executable, and the container can write into it.

Some container engines are now using an extremely ugly hack to work
around this - whenever they want to enter a container, they copy the
host binary into a new memfd and execute that to avoid exposing the
original host binary to containers:

In my opinion, the problems here are:

 - Apparently some people run untrusted containers without user
   namespaces. It would be really nice if people could not do that.
   (Probably the biggest problem here.)
 - ptrace_may_access() has a short-circuit that permits a process to
   unintentionally look at itself even if it has dropped privileges -
   here, it permits the execve("/proc/self/exe", ...) that would
   normally be blocked by the check for CAP_SYS_PTRACE if the process
   is nondumpable.
 - You can use /proc/*/exe to get a writable fd.