From: Stephen Kitt
To: linux-man@vger.kernel.org, Alejandro Colomar, Michael Kerrisk
Cc: Christian Brauner, Giuseppe Scrivano, linux-kernel@vger.kernel.org, Stephen Kitt
Subject: [PATCH v5] close_range.2: new page documenting close_range(2)
Date: Mon, 21 Dec 2020 20:46:56 +0100
Message-Id: <20201221194656.22111-1-steve@sk2.org>

This documents close_range(2) based on information in
278a5fbaed89dacd04e9d052f4594ffd0e0585de,
60997c3d45d9a67daf01c56d805ae4fec37e0bd8, and
582f1fb6b721facf04848d2ca57f34468da1813e.

Signed-off-by: Stephen Kitt
---
V5: clarification of the open/close_range/execve sequence

V4: sort flags alphabetically
    move commit references inside the corresponding section
    more semantic newlines
    unformat numeric constants
    more formatting for function references
    escape C backslashes
    C99 loop indices

V3: fix synopsis overflow
    copy notes from membarrier.2 re the lack of wrapper
    semantic newlines
    drop non-standard "USE CASES" section heading
    add code example

V2: unsigned int to match the kernel declarations
    groff and grammar tweaks
    CLOSE_RANGE_UNSHARE unshares *and* closes
    Explain that EMFILE and ENOMEM can occur with C_R_U
    "Conforming to" phrasing
    Detailed explanation of CLOSE_RANGE_UNSHARE
    Reading /proc isn't common

 man2/close_range.2 | 267 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 267 insertions(+)
 create mode 100644 man2/close_range.2

diff --git a/man2/close_range.2 b/man2/close_range.2
new file mode 100644
index 000000000..0677a9bf9
--- /dev/null
+++ b/man2/close_range.2
@@ -0,0 +1,267 @@
+.\" Copyright (c) 2020 Stephen Kitt
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
+.SH NAME
+close_range \- close all file descriptors in a given range
+.SH SYNOPSIS
+.nf
+.B #include <linux/close_range.h>
+.PP
+.BI "int close_range(unsigned int " first ", unsigned int " last ,
+.BI "                unsigned int " flags );
+.fi
+.PP
+.IR Note :
+There is no glibc wrapper for this system call; see NOTES.
+.SH DESCRIPTION
+The
+.BR close_range ()
+system call closes all open file descriptors from
+.I first
+to
+.I last
+(inclusive).
+.PP
+Errors closing a given file descriptor are currently ignored.
+.PP
+.I flags
+can be 0, or the bitwise OR of one or both of the following:
+.TP
+.BR CLOSE_RANGE_CLOEXEC " (since Linux 5.11)"
+Set the close-on-exec flag on the specified file descriptors,
+rather than immediately closing them.
+.TP
+.B CLOSE_RANGE_UNSHARE
+Unshare the specified file descriptors from any other processes
+before closing them,
+avoiding races with other threads sharing the file descriptor table.
+.SH RETURN VALUE
+On success,
+.BR close_range ()
+returns 0.
+On error, \-1 is returned and
+.I errno
+is set to indicate the cause of the error.
+.SH ERRORS
+.TP
+.B EINVAL
+.I flags
+is not valid, or
+.I first
+is greater than
+.IR last .
+.PP
+The following can occur with
+.B CLOSE_RANGE_UNSHARE
+(when constructing the new descriptor table):
+.TP
+.B EMFILE
+The per-process limit on the number of open file descriptors has been reached
+(see the description of
+.B RLIMIT_NOFILE
+in
+.BR getrlimit (2)).
+.TP
+.B ENOMEM
+Insufficient kernel memory was available.
+.SH VERSIONS
+.BR close_range ()
+first appeared in Linux 5.9.
+.SH CONFORMING TO
+.BR close_range ()
+is a nonstandard function that is also present on FreeBSD.
+.SH NOTES
+Glibc does not provide a wrapper for this system call; call it using
+.BR syscall (2).
+.SS Closing all open file descriptors
+.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
+To avoid blindly closing every file descriptor
+in the range of possible file descriptors,
+closing all open file descriptors is sometimes implemented (on Linux)
+by listing the open file descriptors in
+.I /proc/self/fd/
+and calling
+.BR close (2)
+on each one.
+.BR close_range ()
+does the same job within a single system call
+and without requiring
+.IR /proc ,
+which provides significant performance benefits.
+.SS Closing file descriptors before exec
+.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
+File descriptors can be closed safely using
+.PP
+.in +4n
+.EX
+/* we don't want anything past stderr here */
+close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
+execve(....);
+.EE
+.in
+.PP
+.B CLOSE_RANGE_UNSHARE
+is conceptually equivalent to
+.PP
+.in +4n
+.EX
+unshare(CLONE_FILES);
+close_range(first, last, 0);
+.EE
+.in
+.PP
+but can be more efficient:
+if the unshared range extends past
+the current maximum number of file descriptors allocated
+in the caller's file descriptor table
+(the common case when
+.I last
+is ~0U),
+the kernel copies only the descriptors below
+.I first
+into the caller's new file descriptor table.
+This avoids subsequent close calls entirely;
+the whole operation is complete once the table is unshared.
+.SS Closing file descriptors on \fBexec\fP
+.\" 582f1fb6b721facf04848d2ca57f34468da1813e
+Deferring the closure of file descriptors until exec
+is particularly useful in cases where multiple
+.RB pre- exec
+setup steps risk conflicting with each other.
+For example, setting up a
+.BR seccomp (2)
+profile can conflict with a
+.BR close_range ()
+call:
+if the file descriptors are closed before the
+.BR seccomp (2)
+profile is set up,
+the profile setup can't use them itself,
+or control their closure;
+if the file descriptors are closed afterwards,
+the seccomp profile can't block the
+.BR close_range ()
+call or any fallbacks.
+Using
+.B CLOSE_RANGE_CLOEXEC
+avoids this:
+the descriptors can be marked before the
+.BR seccomp (2)
+profile is set up,
+and the profile can control access to
+.BR close_range ()
+without affecting the calling process.
+.SH EXAMPLES
+The following program is designed to be execed by the second program
+below.
+It lists its open file descriptors:
+.PP
+.in +4n
+.EX
+/* listopen.c */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+
+int
+main(int argc, char *argv[])
+{
+    struct stat buf;
+
+    for (int i = 0; i < 100; i++) {
+        if (!fstat(i, &buf))
+            printf("FD %d is open.\en", i);
+    }
+
+    exit(EXIT_SUCCESS);
+}
+.EE
+.in
+.PP
+This program executes the command given on its command line,
+after opening the files listed after the command
+and then using
+.BR close_range ()
+to close them:
+.PP
+.in +4n
+.EX
+/* close_range.c */
+
+#include <fcntl.h>
+#include <linux/close_range.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+int
+main(int argc, char *argv[])
+{
+    char *newargv[] = { NULL, NULL };
+    char *newenviron[] = { NULL };
+
+    if (argc < 3) {
+        fprintf(stderr, "Usage: %s command file...\en", argv[0]);
+        exit(EXIT_FAILURE);
+    }
+
+    for (int i = 2; i < argc; i++) {
+        if (open(argv[i], O_RDONLY) == -1) {
+            perror(argv[i]);
+            exit(EXIT_FAILURE);
+        }
+    }
+
+    if (syscall(__NR_close_range, 3, ~0U, CLOSE_RANGE_UNSHARE) == -1) {
+        perror("close_range");
+        exit(EXIT_FAILURE);
+    }
+
+    newargv[0] = argv[1];    /* pass the command name as argv[0] */
+    execve(argv[1], newargv, newenviron);
+    perror("execve");
+    exit(EXIT_FAILURE);
+}
+.EE
+.in
+.PP
+We can use the second program to exec the first as follows:
+.PP
+.in +4n
+.EX
+.RB "$" " make listopen close_range"
+.RB "$" " ./close_range ./listopen /dev/null /dev/zero"
+FD 0 is open.
+FD 1 is open.
+FD 2 is open.
+.EE
+.in
+.PP
+Removing the call to
+.BR close_range ()
+shows different output:
+the file descriptors for the named files remain open.
+.SH SEE ALSO
+.BR close (2)

base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
-- 
2.20.1
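The NOTES subsection on closing all open file descriptors compares scanning /proc/self/fd with a single close_range() call. The sketch below combines the two approaches. It assumes a Linux system whose headers define __NR_close_range when the kernel supports it. The helper name close_fds_from() is hypothetical and is not part of this patch or of any library.

The helper calls close_range() through syscall(2), as NOTES describes, passing flags 0 and last set to ~0U. With those arguments, a failure should only mean that the call is unavailable. For example, kernels before 5.9 return ENOSYS, and a seccomp filter may also reject the call. In either case the helper falls back to the /proc scan.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): close every file
 * descriptor >= first.  Prefer close_range(2), invoked via syscall(2)
 * because glibc provides no wrapper; if that fails, fall back to
 * listing /proc/self/fd and calling close(2) on each entry.
 */
static int
close_fds_from(unsigned int first)
{
#ifdef __NR_close_range
    /* With flags == 0 and last == ~0U, failure should only mean the
       system call is unavailable (ENOSYS) or filtered out. */
    if (syscall(__NR_close_range, first, ~0U, 0) == 0)
        return 0;
#endif

    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;              /* no /proc: nothing more we can do here */

    int dfd = dirfd(dir);       /* descriptor used by the scan itself */
    int closed_any;

    /* Closing entries while reading the directory may perturb the
       listing, so rescan until a full pass closes nothing. */
    do {
        struct dirent *ent;

        closed_any = 0;
        rewinddir(dir);
        while ((ent = readdir(dir)) != NULL) {
            char *end;
            long fd = strtol(ent->d_name, &end, 10);

            if (end == ent->d_name || *end != '\0')
                continue;       /* skip "." and ".." */
            if ((unsigned long) fd >= first && fd != dfd) {
                close((int) fd);
                closed_any = 1;
            }
        }
    } while (closed_any);

    closedir(dir);
    return 0;
}
```

For example, a child process about to call execve(2) could call close_fds_from(3) to keep only standard input, output, and error. Unlike CLOSE_RANGE_UNSHARE, this sketch does nothing about other threads that share the file descriptor table.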