Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp3054680ybi; Sun, 26 May 2019 13:30:10 -0700 (PDT) X-Google-Smtp-Source: APXvYqwVKtE2AtRT6RFJKD+tA+6r6ZT4KnM/64Ulf06OLelIfAy4PDlaThcdX4WZbVp+J/aqXDJn X-Received: by 2002:a17:90a:de14:: with SMTP id m20mr26284619pjv.36.1558902610283; Sun, 26 May 2019 13:30:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1558902610; cv=none; d=google.com; s=arc-20160816; b=z4yc3BWQnm/GUcpdIcbZbEk4CBeVuR2OumudOdHgQ7qvqDp0RIIlPWzdm8p4SF7PNu y261SCeU55q534S82S64ciscD4OY7LNmuS8B0b1OTGXyNdIMLKGUYXiJku4DwHC05YS4 IhJBsLulFHgM429cF2uIPx5VHXOR64ynw+y+Zt0BVdh/Z3elRa8C3sz8Z/GL/Tk8iU05 Jgy74faTiU6OA7+kySSZXSN5UZFNsd/FF7rSQYe4zqu7g776zoIHCs4ZfXV1n0kXgfeQ cQDu/aM4zpgh+qvilqWuF888lCcevOWxBx3fhgTwchk+G1pDViwZRYJvFF7N4J65QVze esww== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=98xmIfwNSpIj947R0jyy5BYF7kjSbLHVRnaOgM93vw8=; b=UBOJ3PUEnhpgFniHlL25iG/dTQo/CvXZEnKgfQnZL/tSrODvrawLZR3Qx7FwoDm0Cq fm2CPhVim3PEB7v51TCjtqFFP9TZmRsQY3bkWznjS/WC6RNZTr4NBZcvqXsMa7dj+a3Y 7z+aORSHSAs8adpsSLugQDmsYZDF4oTDD0zpKp6lEkGmTxQZdWOFmI2HR5C1D4LiaXqR t+d+XVV8fWyk/g5WoQI5pRtHpAG036U/V17bEC8bLpFks/7TV16hRX7+dYqUg5UR6MMz 8T9754DOMjkEFHhJQA4Yw16rMNj5qk+WaLKnOMUYxBdbtcVR6iDzDN6VhAnPk8Axe5CQ MQww== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h5si15246028pgg.526.2019.05.26.13.29.54; Sun, 26 May 2019 13:30:10 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726096AbfEZU1r (ORCPT + 99 others); Sun, 26 May 2019 16:27:47 -0400 Received: from port70.net ([81.7.13.123]:59088 "EHLO port70.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725616AbfEZU1q (ORCPT ); Sun, 26 May 2019 16:27:46 -0400 X-Greylist: delayed 420 seconds by postgrey-1.27 at vger.kernel.org; Sun, 26 May 2019 16:27:44 EDT Received: by port70.net (Postfix, from userid 1002) id 64F7EABEC0BA; Sun, 26 May 2019 22:20:42 +0200 (CEST) Date: Sun, 26 May 2019 22:20:42 +0200 From: Szabolcs Nagy To: Christian Brauner Cc: viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, torvalds@linux-foundation.org, fweimer@redhat.com, jannh@google.com, oleg@redhat.com, tglx@linutronix.de, arnd@arndb.de, shuah@kernel.org, dhowells@redhat.com, tkjos@android.com, ldv@altlinux.org, miklos@szeredi.hu, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, x86@kernel.org Subject: Re: [PATCH v2 1/2] open: add close_range() Message-ID: <20190526202041.GO16415@port70.net> References: <20190523154747.15162-1-christian@brauner.io> <20190523154747.15162-2-christian@brauner.io> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190523154747.15162-2-christian@brauner.io> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Christian Brauner [2019-05-23 17:47:46 +0200]: > This adds the close_range() syscall. It allows to efficiently close a range > of file descriptors up to all file descriptors of a calling task. > > The syscall came up in a recent discussion around the new mount API and > making new file descriptor types cloexec by default. During this > discussion, Al suggested the close_range() syscall (cf. [1]). Note, a > syscall in this manner has been requested by various people over time. > > First, it helps to close all file descriptors of an exec()ing task. This > can be done safely via (quoting Al's example from [1] verbatim): > > /* that exec is sensitive */ > unshare(CLONE_FILES); > /* we don't want anything past stderr here */ > close_range(3, ~0U); > execve(....); this does not work in a hosted c implementation unless the libc guarantees not to use libc internal fds (e.g. in execve). (the libc cannot easily abstract fds, so the syscall abi layer fd semantics is necessarily visible to user code.) i think this is a new constraint for userspace runtimes. (not entirely unreasonable though) > The code snippet above is one way of working around the problem that file > descriptors are not cloexec by default. This is aggravated by the fact that > we can't just switch them over without massively regressing userspace. For > a whole class of programs having an in-kernel method of closing all file > descriptors is very helpful (e.g. demons, service managers, programming > language standard libraries, container managers etc.). was cloexec_range(a,b) considered? > (Please note, unshare(CLONE_FILES) should only be needed if the calling > task is multi-threaded and shares the file descriptor table with another > thread in which case two threads could race with one thread allocating > file descriptors and the other one closing them via close_range(). For the > general case close_range() before the execve() is sufficient.) assuming there is no unblocked signal handler that may open fds. a syscall that tramples on fds not owned by the caller is ugly (not generally safe to use and may break things if it gets used), i don't have a better solution for fd leaks or missing cloexec, but i think it needs more analysis how it can be used.