From: Christian Brauner
Date: Mon, 29 Apr 2019 22:52:50 +0200
Subject: Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD]
To: Florian Weimer
Cc: Jann Horn, Kevin Easton, Andy Lutomirski, Aleksa Sarai,
    "Enrico Weigelt, metux IT consult", Linus Torvalds, Al Viro,
    David Howells, Linux API, LKML, "Serge E. Hallyn", Arnd Bergmann,
    "Eric W. Biederman", Kees Cook, Thomas Gleixner, Michael Kerrisk,
    Andrew Morton, Oleg Nesterov, Joel Fernandes, Daniel Colascione
In-Reply-To: <87v9ywbkp8.fsf@oldenburg2.str.redhat.com>
References: <20190414201436.19502-1-christian@brauner.io>
    <20190415195911.z7b7miwsj67ha54y@yavin>
    <20190420071406.GA22257@ip-172-31-15-78>
    <87v9ywbkp8.fsf@oldenburg2.str.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 29, 2019 at 10:50 PM Florian Weimer wrote:
>
> * Jann Horn:
>
> >> int clone_temporary(int (*fn)(void *arg), void *arg, pid_t *child_pid,
> >> )
> >>
> >> and then you'd use it like this to fork off a child process:
> >>
> >> int spawn_shell_subprocess_(void *arg) {
> >>   char *cmdline = arg;
> >>   execl("/bin/sh", "sh", "-c", cmdline);
> >>   return -1;
> >> }
> >> pid_t spawn_shell_subprocess(char *cmdline) {
> >>   pid_t child_pid;
> >>   int res = clone_temporary(spawn_shell_subprocess_, cmdline,
> >> &child_pid, [...]);
> >>   if (res == 0) return child_pid;
> >>   return res;
> >> }
> >>
> >> clone_temporary() could be implemented roughly as follows by the libc
> >> (or other userspace code):
> >>
> >> sigset_t sigset, sigset_old;
> >> sigfillset(&sigset);
> >> sigprocmask(SIG_SETMASK, &sigset, &sigset_old);
> >> int child_pid;
> >> int result = 0;
> >> /* starting here, use inline assembly to ensure that no stack
> >> allocations occur */
> >> long child = syscall(__NR_clone,
> >> CLONE_VM|CLONE_CHILD_SETTID|CLONE_CHILD_CLEARTID|SIGCHLD, $RSP -
> >> ABI_STACK_REDZONE_SIZE, NULL, &child_pid, 0);
> >> if (child == -1) { result = -1; goto reset_sigmask; }
> >> if (child == 0) {
> >>   result = fn(arg);
> >>   syscall(__NR_exit, 0);
> >> }
> >> futex(&child_pid, FUTEX_WAIT, child, NULL);
> >> /* end of no-stack-allocations zone */
> >> reset_sigmask:
> >> sigprocmask(SIG_SETMASK, &sigset_old, NULL);
> >> return result;
> >
> > ... I guess that already has a name, and it's called vfork(). (Well,
> > except that the Linux vfork() isn't a real vfork().)
> >
> > So I guess my question is: Why not vfork()?
>
> Mainly because some users want access to the clone flags, and that's not
> possible with the current userspace wrappers.  The stack setup for the
> undocumented clone wrapper is also cumbersome, and the ia64 peculiarity
> annoying.
>
> For the stack sharing, the callback-based interface looks like the
> absolutely right thing to do to me.  It enforces the notion that you can
> safely return on the child path from a function calling vfork.
>
> > And if vfork() alone isn't flexible enough, alternatively: How about
> > an API that forks a new child in the same address space, and then
> > allows the parent to invoke arbitrary syscalls in the context of the
> > child?
>
> As long as it's not an eBPF script …

You shouldn't even joke about this (I'm serious).
I'm very certain there are people who'd think this is a good idea.

>
> > You could also build that in userspace if you wanted, I think - just
> > let the child run an assembly loop that reads registers from a unix
> > seqpacket socket, invokes the syscall instruction, and writes the
> > value of the result register back into the seqpacket socket. As long
> > as you use CLONE_VM, you don't have to worry about moving the pointer
> > targets of syscalls. The user-visible API could look like this:
>
> People already use a variant of this, execve'ing twice.  See
> jspawnhelper.
>
> Thanks,
> Florian
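
For readers following the thread, here is a rough sketch of what a
callback-based, vfork-like spawn can look like with the glibc clone()
wrapper that exists today. It is not the clone_temporary() proposed
above and not code from any patch in this series. The names
clone_callback(), run_shell(), spawn_and_wait() and CHILD_STACK_SIZE are
made up for illustration, and the sketch assumes Linux with glibc and a
stack that grows downwards. Unlike the quoted proposal it gives the
child its own small stack instead of sharing the parent's, which is the
"cumbersome" stack setup mentioned above.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024) /* arbitrary size chosen for this sketch */

/* Hypothetical helper: run fn(arg) in a child that shares our address space. */
static pid_t clone_callback(int (*fn)(void *), void *arg)
{
    void *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Assumes a downward-growing stack, so pass the top of the mapping. */
    pid_t pid = clone(fn, (char *)stack + CHILD_STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, arg);

    /* With CLONE_VFORK the parent resumes only after the child has called
     * execve() or exited, so the child no longer uses this stack. */
    munmap(stack, CHILD_STACK_SIZE);
    return pid;
}

/* Child callback: exec a shell command. */
static int run_shell(void *arg)
{
    execl("/bin/sh", "sh", "-c", (char *)arg, (char *)NULL);
    return 127; /* reached only if execl() failed; becomes the exit status */
}

/* Spawn "sh -c cmdline" and return its exit status, or -1 on error. */
int spawn_and_wait(const char *cmdline)
{
    pid_t pid = clone_callback(run_shell, (void *)cmdline);
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would use it as spawn_and_wait("exit 7"). The sketch leaves out
the signal blocking that the quoted clone_temporary() does around the
clone call; real code sharing the address space with CLONE_VM would
probably want that as well.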