Date: Sun, 22 Dec 2019 21:15:58 +0100
From: Christian Brauner
To: Sargun Dhillon
Cc: Emilio Cobos Álvarez, Arnd Bergmann, Jann Horn, Gian-Carlo Pascutto,
    Linux API, Linux Containers, Jed Davis, LKML, Oleg Nesterov, Al Viro,
    Linux FS-devel Mailing List, Andy Lutomirski
Subject: Re: [PATCH v5 2/3] pid: Introduce pidfd_getfd syscall
Message-ID: <20191222201556.zcjceuwpel26jo37@wittgenstein>
References: <20191220232810.GA20233@ircssh-2.c.rugged-nimbus-611.internal>
    <20191222124756.o2v2zofseypnqg3t@wittgenstein>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Dec 22, 2019 at 10:36:42AM -0800, Sargun Dhillon wrote:
> On Sun, Dec 22, 2019 at 4:48 AM Christian Brauner
> wrote:
> >
> > On Fri, Dec 20, 2019 at 11:28:13PM +0000, Sargun Dhillon wrote:
> > > This syscall allows for the retrieval of file descriptors from other
> > > processes, based on their pidfd. This is possible using ptrace, and
> > > injection of parasitic code along with using SCM_RIGHTS to move
> > > file descriptors between a tracee and a tracer. Unfortunately, ptrace
> > > comes with a high cost of requiring the process to be stopped, and
> > > breaks debuggers. This does not require stopping the process under
> > > manipulation.
> > >
> > > One reason to use this is to allow sandboxers to take actions on file
> > > descriptors on the behalf of another process. For example, this can be
> > > combined with seccomp-bpf's user notification to do on-demand fd
> > > extraction and take privileged actions. For example, it can be used
> > > to bind a socket to a privileged port.
> > >
> > > /* prototype */
> > > /*
> > >  * pidfd_getfd_options is an extensible struct which can have options
> > >  * added to it. If options is NULL, size will be ignored; otherwise,
> > >  * size should be set to sizeof(*options). If options is newer than
> > >  * the current kernel version, E2BIG will be returned.
> > >  */
> > > struct pidfd_getfd_options {};
> > > long pidfd_getfd(int pidfd, int fd, unsigned int flags,
> > >                  struct pidfd_getfd_options *options, size_t size);
> That's embarrassing. This was supposed to read:
> long pidfd_getfd(int pidfd, int fd, struct pidfd_get_options *options,
>                  size_t size);
>
> >
> > The prototype advertises a flags argument but the actual
> >
> > +SYSCALL_DEFINE4(pidfd_getfd, int, pidfd, int, fd,
> > +                struct pidfd_getfd_options __user *, options, size_t, usize)
> >
> > does not have a flags argument...
> >
> > I think having a flags argument makes a lot of sense.
> >
> > I'm not sure what to think about the struct. I agree with Aleksa that
> > having an empty struct is not a great idea. From a design perspective it
> > seems very out of place. If we do a struct at all, putting at least a
> > single reserved field in there might make more sense.
> >
> > In general, I think we need to have a _concrete_ reason for putting a
> > struct versioned by size as an argument to this syscall.
> > That means we need to have at least a concrete example for a new feature
> > for this syscall where a flag would not convey enough information.
> I can think of at least two reasons we need flags:
> * Clearing cgroup flags
> * Closing the process under manipulation's FD when we fetch it.
>
> The original reason for wanting to have two places where we can put
> flags was to have a different field for fd flags vs. call flags. I'm not sure
> there are any flags you'd want to set.
>
> Given this, if we want to go down the route of a syscall, we should just
> leave it as a __u64 flags, and drop the pointer to the struct, if we're

I think it needs to be an unsigned int. Having a 64-bit register arg is
really messy on 32-bit and means you need to have a compat syscall
implementation which handles this.

Christian
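
For reference, the "struct versioned by size" rule described in the quoted
prototype comment can be sketched in plain userspace C roughly as follows.
This is only an illustration under stated assumptions, not the code from
the patch: getfd_options_sketch and copy_versioned_struct are hypothetical
names, and the single reserved field is just the placeholder suggested
above. As far as I know, the kernel's copy_struct_from_user() helper
applies the same two rules to a __user pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical options struct with one reserved field (illustration only). */
struct getfd_options_sketch {
	unsigned int reserved;
};

/*
 * Copy a caller-supplied struct of usize bytes into a ksize-byte buffer.
 * Returns 0 on success, or -E2BIG if the caller set bytes that this side
 * does not know about.
 */
static int copy_versioned_struct(void *dst, size_t ksize,
				 const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	const unsigned char *p = src;
	size_t i;

	/* Newer caller: every byte past ksize must be zero. */
	for (i = ksize; i < usize; i++)
		if (p[i] != 0)
			return -E2BIG;

	/* Older caller: fields it did not supply default to zero. */
	memset(dst, 0, ksize);
	if (size)
		memcpy(dst, src, size);
	return 0;
}
```

A caller built against a larger struct still works as long as the extra
bytes are zero, and a caller that passes no struct at all (size 0) gets
all-zero defaults. A plain unsigned int flags argument needs none of this.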