Date: Mon, 15 Apr 2019 12:25:42 -0400
From: Joel Fernandes
To: Christian Brauner
Cc: Oleg Nesterov, torvalds@linux-foundation.org, viro@zeniv.linux.org.uk,
    jannh@google.com, dhowells@redhat.com, linux-api@vger.kernel.org,
    linux-kernel@vger.kernel.org, serge@hallyn.com, luto@kernel.org,
    arnd@arndb.de, ebiederm@xmission.com, keescook@chromium.org,
    tglx@linutronix.de, mtk.manpages@gmail.com, akpm@linux-foundation.org,
    cyphar@cyphar.com, dancol@google.com
Subject: Re: [PATCH 2/4] clone: add CLONE_PIDFD
Message-ID: <20190415162542.GA246478@google.com>
References: <20190414201436.19502-1-christian@brauner.io>
    <20190414201436.19502-3-christian@brauner.io>
    <20190415105209.GA22204@redhat.com>
    <20190415114204.ydczeuwmi74wfsuv@brauner.io>
    <20190415132416.GB22204@redhat.com>
    <20190415135246.d6pvyf3pkt3sbh6t@brauner.io>
In-Reply-To: <20190415135246.d6pvyf3pkt3sbh6t@brauner.io>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 15, 2019 at 03:52:48PM +0200, Christian Brauner wrote:
> On Mon, Apr 15, 2019 at 03:24:16PM +0200, Oleg Nesterov wrote:
> > On 04/15, Christian Brauner wrote:
> > >
> > > > CLONE_PARENT_SETTID doesn't look very useful, so what if we add
> > > >
> > > >         if ((clone_flags & (CLONE_PIDFD|CLONE_PARENT_SETTID)) ==
> > > >                         (CLONE_PIDFD|CLONE_PARENT_SETTID))
> > > >                 return ERR_PTR(-EINVAL);
> > > >
> > > > at the start of copy_process() ?
> > > >
> > > > Then it can do
> > > >
> > > >         if (clone_flags & CLONE_PIDFD) {
> > > >                 retval = pidfd_create(pid, &pidfdf);
> > > >                 if (retval < 0)
> > > >                         goto bad_fork_free_pid;
> > > >                 retval = put_user(retval, parent_tidptr);
> > > >                 if (retval < 0)
> > > >                         goto bad_fork_free_pid;
> > > >         }
> > >
> > > Uhhh Oleg, that is nifty. I have to say I like that a lot. This would
> > > let us return the pid and the pidfd in one go and we can also start
> > > pidfd numbering at 0.
> >
> > Christian, sorry if it was already discussed, but I can't force myself to
> > read all the previous discussions ;)
> >
> > If we forget about CONFIG_PROC_FS, why do we really want to create a file?
> >
> >
> > Suppose we add a global u64 counter incremented by copy_process and reported
> > in /proc/$pid/status. Suppose that clone(CLONE_PIDFD) writes this counter to
> > *parent_tidptr. Let's denote this counter as UNIQ_PID.
> >
> > Now, if you want to (say) safely kill a task and you have its UNIQ_PID, you
> > can do
> >
> >         kill_by_pid_uniq(int pid, u64 uniq_pid)
> >         {
> >                 pidfd = open("/proc/$pid", O_DIRECTORY);
> >
> >                 status = openat(pidfd, "status");
> >                 u64 this_uniq_pid = ... read UNIQ_PID from status ...;
> >
> >                 if (uniq_pid != this_uniq_pid)
> >                         return;
> >
> >                 pidfd_send_signal(pidfd);
> >         }
> >
> > Why else do we want pidfd?
>
> I think this was thrown around at one point but this is rather
> inelegant imho. It basically makes a process unique by using a
> combination of two identifiers. You end up with a similar concept but
> you make it way less flexible and extensible imho. With pidfds you can
> not care about pids at all if you don't want to. The UNIQ_PID thing
> would require you to always juggle two identifiers.
>
> Your example would also only work if CONFIG_PROC_FS is set (Not sure if
> that's what you meant by "forget about CONFIG_PROC_FS")? Say, you get
> a pid from clone() and your UNIQ_PID thing. Then you still can't
> reliably kill a process because pidfd_send_signal() is not useable since
> you can't get pidfds. And if you go the kill way you end up with the same
> problem. Yes, you could solve this by probably extending syscalls to
> take a UNIQ_PID argument but that seems very inelegant.
>
> The UNIQ_PID implementation would also require being tracked in the
> kernel either in task_struct or struct pid potentially and thus would
> probably add more infrastructure in the kernel. We don't need any of
> that if we simply rely on pidfds.
>
> Most of all, the pidfd concept allows one way more flexibility in
> extending it. For example, Joel is working on a patchset to make pidfds
> pollable so you can get information about a process death by polling
> them. We also want to be able to potentially wait on a process with
> waitid(W_PIDFD) or similar as suggested by Linus in earlier threads. At
> that point you end up in a similar situation as tgkill() where you pass
> a tgid and a pid already to make sure that the pid you pass has the tgid
> as thread-group leader. That is all way simpler with pidfds.

I agree the pidfd file descriptor approach is simpler than dealing with
2 pids and is needed for the poll notification support I posted. Also in
the future it allows for a pidfd to be sent over IPC to another process
using binder or unix domain sockets.

thanks,

 - Joel
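
For anyone following along, here is a small userspace sketch of the flag check
Oleg proposed in the quoted text: reject a clone() call that asks for both
CLONE_PIDFD and CLONE_PARENT_SETTID, since both would want to write through
parent_tidptr. The EX_ prefixed constants and the helper names
check_pidfd_flags() and check_pidfd_flags_selftest() are made up for this
illustration and are not kernel symbols; the bit values are assumed to match
the uapi sched.h definitions. It is not the patch itself, just the
mutual-exclusion logic in isolation.

```c
#include <assert.h>
#include <errno.h>

/*
 * Illustrative flag bits (assumption: these mirror the values of
 * CLONE_PIDFD and CLONE_PARENT_SETTID in the uapi headers). The EX_
 * prefix avoids clashing with the real macros from <sched.h>.
 */
#define EX_CLONE_PIDFD          0x00001000UL
#define EX_CLONE_PARENT_SETTID  0x00100000UL

/*
 * Hypothetical helper: returns -EINVAL when both flags are set at once,
 * 0 otherwise. Masking with the combined bits and comparing against the
 * same mask is true only when every bit in the mask is present.
 */
int check_pidfd_flags(unsigned long clone_flags)
{
	const unsigned long both = EX_CLONE_PIDFD | EX_CLONE_PARENT_SETTID;

	if ((clone_flags & both) == both)
		return -EINVAL;

	return 0;
}

/* Hypothetical self-check exercising the four flag combinations. */
void check_pidfd_flags_selftest(void)
{
	assert(check_pidfd_flags(0) == 0);
	assert(check_pidfd_flags(EX_CLONE_PIDFD) == 0);
	assert(check_pidfd_flags(EX_CLONE_PARENT_SETTID) == 0);
	assert(check_pidfd_flags(EX_CLONE_PIDFD | EX_CLONE_PARENT_SETTID) == -EINVAL);
}
```

Either flag alone passes and only the combination is refused, which is what
frees parent_tidptr to carry the pidfd when CLONE_PIDFD is set.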