Received: by 2002:a25:d7c1:0:0:0:0:0 with SMTP id o184csp3816668ybg; Mon, 28 Oct 2019 20:13:07 -0700 (PDT) X-Google-Smtp-Source: APXvYqzLnhPMhhmtcA4/rpRbCntb3ZGcyJXR22D+KRHSmAM38sq3AT1Ie29B5k/W4GRZKA7k+DaA X-Received: by 2002:a17:906:aad9:: with SMTP id kt25mr1051396ejb.129.1572318787003; Mon, 28 Oct 2019 20:13:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1572318786; cv=none; d=google.com; s=arc-20160816; b=d/bOdcEyXK4heAbe+N+x98ZwHqjvkOMxKc5TWBqJ1xNPCO0K+Cc7DprnqmQJ0XzRwl 9HR487PZf4vyh004AmMihivZFKhBbyhflNg2755aV8EYy+0NvHTBN9gsKSv//PcK8kcf lcXt1ePiVl2d+ndmuX0sPgyY1AXOmb6vU4Zaz/BFAUC56+YYH7At8hpZtUX4j8NWdOYc Hy7gZCBe+VU5KHAa45QY95aV8iiZznYxbNsxQqAU8PCrTqH7kdKrQCQ1ffy/LjypvLCS qTHpxwJnyDacbF8bb+1bghNzRyg7PS+rIx8ZfPBBF9eP88YNl89wQabbdNK2sx3uIpPs EgXg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=wXxhu/tmc9YP1XTfcQJ3HjAvQlCmqGJP256WPTj2XSc=; b=wphbWpA84wiNDdB6aOYH0idLqXnPzDjdztLogX5ghsnhkprhbhDBx42b/RZhLn8j6e PFCY7jhxQd3NTKGM7DIIDKNIkK2wKIJA9JG1ZGgg4H3zd9r2XpVvwdEBfBgJE5tMKAST TUqTmjjBroTsiAUQeSsUe19ijNuyZ7N5gaiMbQeMhkf/t3BTXCXuRO55/laJ/KIjFWY1 BSTuXMNhpRRbeWdezYtnag5FUqGN5b4u9hXxOrD2v88Wi42ZL6aTXN0Z9jQ+oZw7F2xw cO8I9r5qkgpDSVvJWdOMZwTJZqKlZaS5U7+uSZdDltGszCEVGGrw7tBeWYyVU1/QWhl6 6k0A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=GJ40qGJD; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d7si9340004eda.429.2019.10.28.20.12.43; Mon, 28 Oct 2019 20:13:06 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=GJ40qGJD; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390564AbfJ1PMi (ORCPT + 99 others); Mon, 28 Oct 2019 11:12:38 -0400 Received: from mail-oi1-f195.google.com ([209.85.167.195]:43009 "EHLO mail-oi1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731098AbfJ1PMi (ORCPT ); Mon, 28 Oct 2019 11:12:38 -0400 Received: by mail-oi1-f195.google.com with SMTP id s5so6139930oie.10 for ; Mon, 28 Oct 2019 08:12:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=wXxhu/tmc9YP1XTfcQJ3HjAvQlCmqGJP256WPTj2XSc=; b=GJ40qGJD7TtbIgudDFm2N7LHOQ6aSoOljrrxXlqEik+9bXO7yJallt9amumZdHCmjG 27n00ClQMzIia2m9uAlnK3x919OrGWy4dvA3Xp1pJJUzxWBIIZwvxmBpryE3+luDzYyq b8x8CaYNpg/Ya0dz6dBgKfc41xtBqIwIUMNVHhp1ncWHdN0rxm/9FplrQ64sV3AsycdO AkAmBf3pHs9hUKE9iRHFo5EEM1vKHq/iolqALxq8JXvSyZB1rgKMDHS+teXambVzbsnp 7zLsscjgNQF6ZVIJu3TXD+HhwdAbPJfVw1QH9JGup9MKa3Ac8UhSRm3Yn7ZX6LRxdZYj 8L7A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=wXxhu/tmc9YP1XTfcQJ3HjAvQlCmqGJP256WPTj2XSc=; b=OAoZ7RF9cl6h24eI7bNTlpE85kYIWdpf9kRbRn4kluGeIXZpiNkEvfmTgon8aaZ4Uz r1pqu5sSNXhb0nBDFRvwkF4l5i2HWRWUN8hTCbMhUs6hTe7QNMpgD9Ot9/e579nB8h8t 9xIBGHhIN8SaP2g+04X3mC8WHmdpk/WjNj4xS9KbFYO+ppbkx41Uxd9iKBwAsn6w9rMA bJ1P2bpjP0GGteTUiEYfszD4ssIKqSjQ/JEpH6nneI3mo7pYTxb+miCvdJPjuWjFSkqF WQlmzIPDhUVUh3VLOLGKqrELeOlPKGgFs9OTwL8uJTkfmGV63N56/MfJFD3GZlMxuMpX Z5TA== X-Gm-Message-State: APjAAAXMN6bx5WEM+tZY0yOCsY6LGrh/1pKYBB10AJV4f3o22j0qIw7G 98kzXCrjaDAoDpM2wab3+02xuUpj6d0KOqTDSo6HaA== X-Received: by 2002:a05:6808:b30:: with SMTP id t16mr367143oij.157.1572275556022; Mon, 28 Oct 2019 08:12:36 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Jann Horn Date: Mon, 28 Oct 2019 16:12:09 +0100 Message-ID: Subject: Re: For review: documentation of clone3() system call To: Michael Kerrisk-manpages Cc: Christian Brauner , lkml , linux-man , Kees Cook , Florian Weimer , Oleg Nesterov , Arnd Bergmann , David Howells , Pavel Emelyanov , Andrew Morton , Adrian Reber , Andrei Vagin , Linux API Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Oct 25, 2019 at 6:59 PM Michael Kerrisk (man-pages) wrote: > I've made a first shot at adding documentation for clone3(). You can > see the diff here: > https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=3D= faa0e55ae9e490d71c826546bbdef954a1800969 [...] > clone3() > The clone3() system call provides a superset of the functionality > of the older clone() interface. It also provides a number of API > improvements, including: space for additional flags bits; cleaner > separation in the use of various arguments; and the ability to > specify the size of the child's stack area. You might want to note somewhere that its flags can't be seccomp-filtered because they're stored in memory, making it inappropriate to use in heavily sandboxed processes. > struct clone_args { > u64 flags; /* Flags bit mask */ > u64 pidfd; /* Where to store PID file descriptor > (int *) */ > u64 child_tid; /* Where to store child TID, > in child's memory (int *) */ > u64 parent_tid; /* Where to store child TID, > in parent's memory (int *) */ > u64 exit_signal; /* Signal to deliver to parent on > child termination */ > u64 stack; /* Pointer to lowest byte of stack */ > u64 stack_size; /* Size of stack */ > u64 tls; /* Location of new TLS */ > }; > > The size argument that is supplied to clone3() should be initial= =E2=80=90 > ized to the size of this structure. (The existence of the size > argument permits future extensions to the clone_args structure.) > > The stack for the child process is specified via cl_args.stack, > which points to the lowest byte of the stack area, and Here and in the comment in the struct above, you say that .stack "points to the lowest byte of the stack area", but isn't that architecture-dependent? For most architectures, I think it should instead be "is the initial stack pointer", with the exception of IA64 (and maybe others, I'm not sure). For example, on X86, when launching a thread with an initially empty stack, it points directly *after* the end of the stack area. > cl_args.stack_size, which specifies the size of the stack in > bytes. In the case where the CLONE_VM flag (see below) is speci= =E2=80=90 stack_size is ignored on most architectures. > fied, a stack must be explicitly allocated and specified. Other= =E2=80=90 > wise, these two fields can be specified as NULL and 0, which > causes the child to use the same stack area as the parent (in the > child's own virtual address space). [...] > Equivalence between clone() and clone3() arguments > Unlike the older clone() interface, where arguments are passed > individually, in the newer clone3() interface the arguments are > packaged into the clone_args structure shown above. This struc= =E2=80=90 > ture allows for a superset of the information passed via the > clone() arguments. > > The following table shows the equivalence between the arguments of > clone() and the fields in the clone_args argument supplied to > clone3(): > > clone() clone(3) Notes > cl_args field > flags & ~0xff flags > parent_tid pidfd See CLONE_PIDFD > child_tid child_tid See CLONE_CHILD_SETTID > parent_tid parent_tid See CLONE_PARENT_SETTID > flags & 0xff exit_signal > stack stack > > --- stack_size (except that on ia64, stack_size also exists in clone2(), and if you're not on ia64, stack_size doesn't do anything, at least on X86, so showing them side by side like this doesn't really make sense)