From: Daniel Colascione
Date: Wed, 20 Mar 2019 12:40:46 -0700
Subject: Re: pidfd design
To: Christian Brauner
Cc: Andy Lutomirski, Joel Fernandes, Suren Baghdasaryan, Steven Rostedt,
    Sultan Alsawaf, Tim Murray, Michal Hocko, Greg Kroah-Hartman,
    Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Ingo Molnar, Peter Zijlstra,
    LKML, "open list:ANDROID DRIVERS", linux-mm, kernel-team, Oleg Nesterov,
    "Serge E. Hallyn", Kees Cook
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 20, 2019 at 12:14 PM Christian Brauner wrote:
>
> On Wed, Mar 20, 2019 at 11:58:57AM -0700, Andy Lutomirski wrote:
> > On Wed, Mar 20, 2019 at 11:52 AM Christian Brauner wrote:
> > >
> > > You're misunderstanding. Again, I said in my previous mails it should
> > > accept pidfds optionally as arguments, yes. But I don't want it to
> > > return the status fds that you previously wanted pidfd_wait() to return.
> > > I really want to see Joel's pidfd_wait() patchset and have more people
> > > review the actual code.
> >
> > Just to make sure that no one is forgetting a material security consideration:
>
> Andy, thanks for commenting!
>
> >
> > $ ls /proc/self
> > attr             exe        mountinfo      projid_map    status
> > autogroup        fd         mounts         root          syscall
> > auxv             fdinfo     mountstats     sched         task
> > cgroup           gid_map    net            schedstat     timers
> > clear_refs       io         ns             sessionid     timerslack_ns
> > cmdline          latency    numa_maps      setgroups     uid_map
> > comm             limits     oom_adj        smaps         wchan
> > coredump_filter  loginuid   oom_score      smaps_rollup
> > cpuset           map_files  oom_score_adj  stack
> > cwd              maps       pagemap        stat
> > environ          mem        personality    statm
> >
> > A bunch of this stuff makes sense to make accessible through a syscall
> > interface that we expect to be used even in sandboxes. But a bunch of
> > it does not. For example, *_map, mounts, mountstats, and net are all
> > namespace-wide things that certain policies expect to be unavailable.
> > stack, for example, is a potential attack surface. Etc.

If you can access these files via open(2) on /proc/, you should be able
to access them via a pidfd. If you can't, you shouldn't. Which /proc?
The one you'd get by mounting procfs. I don't see how pidfd makes any
material changes to anyone's security. As far as I'm concerned, if a
sandbox can't mount /proc at all, it's just a broken and unsupported
configuration.

An actual threat model and real thought paid to access capabilities
would help. Almost everything around the interaction of Linux kernel
namespaces and security feels like a jumble of ad-hoc patches added as
afterthoughts in response to random objections.

> > All these new APIs either need to
> > return something more restrictive than a proc dirfd or they need to
> > follow the same rules.

What's wrong with the latter?

> > And I'm afraid that the latter may be a
> > nonstarter if you expect these APIs to be used in libraries.

What's special about libraries?
How is a library any worse off using openat(2) on a pidfd than it would
be just opening the file called "/proc/$apid"?

> > Yes, this is unfortunate, but it is indeed the current situation. I
> > suppose that we could return magic restricted dirfds, or we could
> > return things that aren't dirfds and all and have some API that gives
> > you the dirfd associated with a procfd but only if you can see
> > /proc/PID.
>
> What would be your opinion to having a
> /proc//handle
> file instead of having a dirfd. Essentially, what I initially proposed
> at LPC. The change on what we currently have in master would be:
> https://gist.github.com/brauner/59eec91550c5624c9999eaebd95a70df

And how do you propose, given one of these handle objects, getting a
process's current priority, or its current oom score, or its list of
memory maps? As I mentioned in my original email (a point nobody has
addressed), if you don't use a dirfd as your process handle and you
don't provide an easy way to get one of these proc directory FDs, you
need to duplicate a lot of metadata access interfaces.
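
To make the dirfd point concrete, here is a rough userspace sketch of
what I mean. It is an illustration only, not any proposed kernel
interface. It assumes the process handle is simply a directory fd for a
/proc entry (here obtained with a plain open of /proc/self, standing in
for whatever would hand out a pidfd), and the helper names
open_proc_dirfd() and read_relative() are made up for the example.
Python's os.open(..., dir_fd=...) is its wrapper for openat(2).

```python
import os


def open_proc_dirfd(pid="self"):
    """Open /proc/<pid> as a directory fd (a stand-in for a dirfd-style pidfd)."""
    return os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)


def read_relative(dirfd, name):
    """Read a whole file relative to dirfd, i.e. openat(dirfd, name, O_RDONLY)."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dirfd)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:  # empty read means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks).decode(errors="replace")
    finally:
        os.close(fd)


# Only runs where procfs is mounted; the helpers work on any directory fd.
if __name__ == "__main__" and os.path.isdir("/proc/self"):
    dirfd = open_proc_dirfd()
    try:
        # The same handle reaches every per-process metadata file.
        print("oom_score:", read_relative(dirfd, "oom_score").strip())
        print("mappings:", len(read_relative(dirfd, "maps").splitlines()))
    finally:
        os.close(dirfd)
```

With a handle like this, existing files such as oom_score and maps are
reachable through the same access checks that open(2) on the /proc path
would apply, and nothing needs a new per-field syscall. A handle that is
not a dirfd would need some equivalent for each of these.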