From: Linus Torvalds
Date: Mon, 29 Apr 2019 19:16:11 -0700
Subject: Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD]
To: Jann Horn
Cc: Kevin Easton, Andy Lutomirski, Christian Brauner, Aleksa Sarai,
 "Enrico Weigelt, metux IT consult", Al Viro, David Howells, Linux API,
 LKML, "Serge E. Hallyn", Arnd Bergmann, "Eric W. Biederman", Kees Cook,
 Thomas Gleixner, Michael Kerrisk, Andrew Morton, Oleg Nesterov,
 Joel Fernandes, Daniel Colascione
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 29, 2019 at 5:39 PM Jann Horn wrote:
>
> ... uuuh, whoops. Turns out I don't know what I'm talking about.

Well, apparently there's some odd libc issue according to Florian, so
there *might* be something to it.

> Nevermind.
> For some reason I thought vfork() was just
> CLONE_VFORK|SIGCHLD, but now I see I got that completely wrong.

Well, inside the kernel, that's actually *very* close to what vfork() is:

    SYSCALL_DEFINE0(vfork)
    {
            return _do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, 0,
                            0, NULL, NULL, 0);
    }

but that's just an internal implementation detail. It's a real vfork()
and should act as the traditional BSD "share everything" without any
address space copying. The CLONE_VFORK flag is what does the "wait for
child to exit or execve" magic.

Note that vfork() is "exciting" for the compiler in much the same way
"setjmp/longjmp()" is, because of the shared stack use in the child
and the parent. It is *very* easy to get this wrong and cause massive
and subtle memory corruption issues because the parent returns to
something that has been messed up by the child.

That may be why some libc might end up just using "fork()", because
it ends up avoiding bugs in user space.

(In fact, if I recall correctly, the _reason_ we have an explicit
'vfork()' entry point rather than using clone() with magic parameters
was that the lack of arguments meant that you didn't have to
save/restore any registers in user space, which made the whole stack
issue simpler. But it's been two decades, so my memory is bitrotting).

Also, particularly if you have a big address space, vfork()+execve()
can be quite a bit faster than fork()+execve(). Linux fork() is pretty
efficient, but if you have gigabytes of VM space to copy, it's going
to take time even if you do it fairly well.

               Linus
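A minimal C sketch of the conventional vfork()+execve() pattern discussed above, assuming Linux with glibc. The helper name run_with_vfork() is hypothetical, not an existing libc or kernel API. The child does nothing except call execve() or _exit(), which is the usual way to stay clear of the shared-stack corruption the message warns about: it never returns from the function, never touches the parent's state, and uses _exit() rather than exit() so atexit handlers and stdio buffers shared with the parent are left alone.

```c
#define _DEFAULT_SOURCE           /* expose vfork() under strict -std= modes */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;            /* caller's environment, passed to execve() */

/*
 * Hypothetical helper (not a libc API): run the program at `path` with
 * argument vector `argv` via vfork()+execve(). Returns the program's exit
 * status, 127 if execve() failed, or -1 if vfork()/waitpid() failed or
 * the child did not exit normally (e.g. it was killed by a signal).
 */
int run_with_vfork(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = vfork();
    if (pid < 0)
        return -1;                /* vfork() itself failed */

    if (pid == 0) {
        /*
         * Child: it borrows the parent's memory and stack, so the only
         * safe actions are execve() or _exit(). Do not return from this
         * function, modify variables, or call exit(), which would run
         * atexit handlers and flush stdio buffers shared with the parent.
         */
        execve(path, argv, environ);
        _exit(127);               /* reached only if execve() failed */
    }

    /* Parent: was suspended until the child called execve() or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With argv = { "true", NULL }, run_with_vfork("/bin/true", argv) returns 0, and a nonexistent path returns 127 through the _exit() fallback. Because CLONE_VFORK already holds the parent until the child execs or exits, the waitpid() here only waits for the new program to finish.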