From: Andy Lutomirski
Date: Wed, 20 Mar 2019 11:58:57 -0700
Subject: Re: pidfd design
To: Christian Brauner
Cc: Daniel Colascione, Joel Fernandes, Suren Baghdasaryan, Steven Rostedt,
    Sultan Alsawaf, Tim Murray, Michal Hocko, Greg Kroah-Hartman,
    Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Ingo Molnar, Peter Zijlstra,
    LKML, "open list:ANDROID DRIVERS", linux-mm, kernel-team, Oleg Nesterov,
    "Serge E. Hallyn", Kees Cook
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 20, 2019 at 11:52 AM Christian Brauner wrote:
>
> You're misunderstanding. Again, I said in my previous mails it should
> accept pidfds optionally as arguments, yes. But I don't want it to
> return the status fds that you previously wanted pidfd_wait() to return.
> I really want to see Joel's pidfd_wait() patchset and have more people
> review the actual code.

Just to make sure that no one is forgetting a material security
consideration:

$ ls /proc/self
attr             exe        mountinfo      projid_map    status
autogroup        fd         mounts         root          syscall
auxv             fdinfo     mountstats     sched         task
cgroup           gid_map    net            schedstat     timers
clear_refs       io         ns             sessionid     timerslack_ns
cmdline          latency    numa_maps      setgroups     uid_map
comm             limits     oom_adj        smaps         wchan
coredump_filter  loginuid   oom_score      smaps_rollup
cpuset           map_files  oom_score_adj  stack
cwd              maps       pagemap        stat
environ          mem        personality    statm

A bunch of this stuff makes sense to make accessible through a syscall
interface that we expect to be used even in sandboxes. But a bunch of
it does not. For example, *_map, mounts, mountstats, and net are all
namespace-wide things that certain policies expect to be unavailable.
stack, for example, is a potential attack surface. Etc.

As it stands, if you create a fresh userns and mountns and try to
mount /proc, there are some really awful and hideous rules that are
checked for security reasons. All these new APIs either need to
return something more restrictive than a proc dirfd or they need to
follow the same rules. And I'm afraid that the latter may be a
nonstarter if you expect these APIs to be used in libraries.

Yes, this is unfortunate, but it is indeed the current situation. I
suppose that we could return magic restricted dirfds, or we could
return things that aren't dirfds at all and have some API that gives
you the dirfd associated with a procfd but only if you can see
/proc/PID.

--Andy
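
The dirfd concern above can be sketched roughly as follows. This is a
minimal illustration only, not kernel code and not any proposed pidfd
API: the helper name reachable_entries is hypothetical, and it relies
only on Python's os.open() with dir_fd (openat(2) underneath). The
point it tries to show is that once a caller holds a directory fd,
every entry below it can be opened relative to that fd, subject only to
the usual permission checks.

```python
import os


def reachable_entries(dir_path, names):
    """Open dir_path once, then try to open each name relative to that fd.

    Hypothetical helper for illustration: returns {name: True/False},
    where True means the entry could be opened read-only via the dirfd.
    """
    dirfd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    result = {}
    try:
        for name in names:
            try:
                # Equivalent to openat(dirfd, name, O_RDONLY).
                fd = os.open(name, os.O_RDONLY, dir_fd=dirfd)
            except OSError:
                # Missing entry, or access denied by permissions/policy.
                result[name] = False
            else:
                os.close(fd)
                result[name] = True
    finally:
        os.close(dirfd)
    return result
```

On a Linux system with /proc mounted, calling
reachable_entries("/proc/self", ["status", "mounts", "net", "stack"])
reports which of those entries the holder of the dirfd can open; the
result depends on the caller's privileges and on any sandbox policy in
effect.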