2020-12-09 22:05:31

by Stephen Kitt

[permalink] [raw]
Subject: [PATCH v2] close_range.2: new page documenting close_range(2)

This documents close_range(2) based on information in
278a5fbaed89dacd04e9d052f4594ffd0e0585de and
60997c3d45d9a67daf01c56d805ae4fec37e0bd8.

Signed-off-by: Stephen Kitt <[email protected]>
---
V2: unsigned int to match the kernel declarations
groff and grammar tweaks
CLOSE_RANGE_UNSHARE unshares *and* closes
Explain that EMFILE and ENOMEM can occur with C_R_U
"Conforming to" phrasing
Detailed explanation of CLOSE_RANGE_UNSHARE
Reading /proc isn't common

man2/close_range.2 | 138 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 138 insertions(+)
create mode 100644 man2/close_range.2

diff --git a/man2/close_range.2 b/man2/close_range.2
new file mode 100644
index 000000000..403142b33
--- /dev/null
+++ b/man2/close_range.2
@@ -0,0 +1,138 @@
+.\" Copyright (c) 2020 Stephen Kitt <[email protected]>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
+.SH NAME
+close_range \- close all file descriptors in a given range
+.SH SYNOPSIS
+.nf
+.B #include <linux/close_range.h>
+.PP
+.BI "int close_range(unsigned int " first ", unsigned int " last ", unsigned int " flags );
+.fi
+.SH DESCRIPTION
+The
+.BR close_range ()
+system call closes all open file descriptors from
+.I first
+to
+.I last
+(included).
+.PP
+Errors closing a given file descriptor are currently ignored.
+.PP
+.I flags
+can be set to
+.B CLOSE_RANGE_UNSHARE
+to unshare the range of file descriptors from any other processes,
+before closing them, avoiding races with other threads sharing the
+file descriptor table.
+.SH RETURN VALUE
+On success,
+.BR close_range ()
+returns 0.
+On error, \-1 is returned and
+.I errno
+is set to indicate the cause of the error.
+.SH ERRORS
+.TP
+.B EINVAL
+.I flags
+is not valid, or
+.I first
+is greater than
+.IR last .
+.PP
+The following can occur with
+.B CLOSE_RANGE_UNSHARE
+(when constructing the new descriptor table):
+.TP
+.B EMFILE
+The per-process limit on the number of open file descriptors has been reached
+(see the description of
+.B RLIMIT_NOFILE
+in
+.BR getrlimit (2)).
+.TP
+.B ENOMEM
+Insufficient kernel memory was available.
+.SH VERSIONS
+.BR close_range ()
+first appeared in Linux 5.9.
+.SH CONFORMING TO
+.BR close_range ()
+is a nonstandard function that is also present on FreeBSD.
+.SH NOTES
+Currently, there is no glibc wrapper for this system call; call it using
+.BR syscall (2).
+.PP
+.B CLOSE_RANGE_UNSHARE
+is conceptually equivalent to
+.PP
+.in +4n
+.EX
+unshare(CLONE_FILES);
+close_range(first, last, 0);
+.EE
+.in
+.PP
+but can be more efficient: if the unshared range extends past the
+current maximum number of file descriptors allocated in the caller's
+file descriptor table (the common case when
+.I last
+is
+.BR ~0U ),
+the kernel will unshare a new file descriptor
+table for the caller up to
+.IR first .
+This avoids subsequent close calls entirely; the whole operation is
+complete once the table is unshared.
+.SH USE CASES
+.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
+.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
+.SS Closing file descriptors before exec
+File descriptors can be closed safely using
+.PP
+.in +4n
+.EX
+/* we don't want anything past stderr here */
+close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
+execve(....);
+.EE
+.in
+.SS Closing all open file descriptors
+To avoid blindly closing file descriptors in the range of possible
+file descriptors, this is sometimes implemented (on Linux) by listing
+open file descriptors in
+.I /proc/self/fd/
+and calling
+.BR close (2)
+on each one.
+.BR close_range ()
+can take care of this without requiring
+.I /proc
+and with a single system call, which provides significant performance
+benefits.
+.SH SEE ALSO
+.BR close (2)

base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
--
2.20.1


2020-12-10 00:29:19

by Alejandro Colomar

[permalink] [raw]
Subject: Re: [PATCH v2] close_range.2: new page documenting close_range(2)

Hi Stephen,

A few more comments below.

Michael, please have a look at them too.

Christian, do you have any program that you used to test the syscall
that could be added as an example program to the page?

Thanks,

Alex

On 12/9/20 11:00 PM, Stephen Kitt wrote:
> This documents close_range(2) based on information in
> 278a5fbaed89dacd04e9d052f4594ffd0e0585de and
> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8.
>
> Signed-off-by: Stephen Kitt <[email protected]>
> ---
> V2: unsigned int to match the kernel declarations
> groff and grammar tweaks
> CLOSE_RANGE_UNSHARE unshares *and* closes
> Explain that EMFILE and ENOMEM can occur with C_R_U
> "Conforming to" phrasing
> Detailed explanation of CLOSE_RANGE_UNSHARE
> Reading /proc isn't common
>
> man2/close_range.2 | 138 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 138 insertions(+)
> create mode 100644 man2/close_range.2
>
> diff --git a/man2/close_range.2 b/man2/close_range.2
> new file mode 100644
> index 000000000..403142b33
> --- /dev/null
> +++ b/man2/close_range.2
> @@ -0,0 +1,138 @@
> +.\" Copyright (c) 2020 Stephen Kitt <[email protected]>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date. The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein. The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +close_range \- close all file descriptors in a given range
> +.SH SYNOPSIS
> +.nf
> +.B #include <linux/close_range.h>
> +.PP
> +.BI "int close_range(unsigned int " first ", unsigned int " last ", unsigned int " flags );

This line overflows an 80-col terminal. Fix:

.BI "int close_range(unsigned int " first ", unsigned int " last ,
.BI " unsigned int " flags );


> +.fi

Please, add a note here that there is no wrapper for this syscall,
as in other syscalls without wrapper (see membarrier(2) as an example).
That way it's easier to grep, if all pages have the same notice.

> +.SH DESCRIPTION
> +The
> +.BR close_range ()
> +system call closes all open file descriptors from
> +.I first
> +to
> +.I last
> +(included).
> +.PP
> +Errors closing a given file descriptor are currently ignored.
> +.PP
> +.I flags
> +can be set to
> +.B CLOSE_RANGE_UNSHARE
> +to unshare the range of file descriptors from any other processes,
> +before closing them, avoiding races with other threads sharing the
> +file descriptor table.

Please use semantic newlines.
See man-pages(7)::STYLE GUIDE::Use semantic newlines

> +.SH RETURN VALUE
> +On success,
> +.BR close_range ()
> +returns 0.
> +On error, \-1 is returned and
> +.I errno
> +is set to indicate the cause of the error.
> +.SH ERRORS
> +.TP
> +.B EINVAL
> +.I flags
> +is not valid, or
> +.I first
> +is greater than
> +.IR last .
> +.PP
> +The following can occur with
> +.B CLOSE_RANGE_UNSHARE
> +(when constructing the new descriptor table):
> +.TP
> +.B EMFILE
> +The per-process limit on the number of open file descriptors has been reached
> +(see the description of
> +.B RLIMIT_NOFILE
> +in
> +.BR getrlimit (2)).
> +.TP
> +.B ENOMEM
> +Insufficient kernel memory was available.
> +.SH VERSIONS
> +.BR close_range ()
> +first appeared in Linux 5.9.
> +.SH CONFORMING TO
> +.BR close_range ()
> +is a nonstandard function that is also present on FreeBSD.
> +.SH NOTES
> +Currently, there is no glibc wrapper for this system call; call it using
> +.BR syscall (2).

I can see that this notice is also present on a few pages,
but the one in membarrier(2) is more extended.

Please, copy the notices from membarrier(2).
There's one in SYNOPSIS, and one in NOTES.

> +.PP
> +.B CLOSE_RANGE_UNSHARE
> +is conceptually equivalent to
> +.PP
> +.in +4n
> +.EX
> +unshare(CLONE_FILES);
> +close_range(first, last, 0);
> +.EE
> +.in
> +.PP
> +but can be more efficient: if the unshared range extends past the
> +current maximum number of file descriptors allocated in the caller's
> +file descriptor table (the common case when
> +.I last
> +is
> +.BR ~0U ),
> +the kernel will unshare a new file descriptor
> +table for the caller up to
> +.IR first .
> +This avoids subsequent close calls entirely; the whole operation is
> +complete once the table is unshared.

I think the above is more suitable for the DESCRIPTION, but what are
your thoughts, mtk?

> +.SH USE CASES

This section is unconventional. Please move that text to one of the
traditional sections. I think DESCRIPTION would be the best place for this.

For a list of the traditional sections,
see man-pages(7)::DESCRIPTION::Sections within a manual page

> +.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
> +.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
> +.SS Closing file descriptors before exec
> +File descriptors can be closed safely using
> +.PP
> +.in +4n
> +.EX
> +/* we don't want anything past stderr here */
> +close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
> +execve(....);
> +.EE
> +.in
> +.SS Closing all open file descriptors
> +To avoid blindly closing file descriptors in the range of possible
> +file descriptors, this is sometimes implemented (on Linux) by listing
> +open file descriptors in
> +.I /proc/self/fd/
> +and calling
> +.BR close (2)
> +on each one.
> +.BR close_range ()
> +can take care of this without requiring
> +.I /proc
> +and with a single system call, which provides significant performance
> +benefits.
> +.SH SEE ALSO
> +.BR close (2)
>
> base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
>

--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es

2020-12-10 23:11:12

by Stephen Kitt

[permalink] [raw]
Subject: Re: [PATCH v2] close_range.2: new page documenting close_range(2)

Hi Alejandro,

On Thu, 10 Dec 2020 01:24:28 +0100, "Alejandro Colomar (man-pages)"
<[email protected]> wrote:
> A few more comments below.

Thanks for all the detailed reviews! (Same goes to everyone reviewing this
yesterday, thanks for helping turn my not-so-great attempt into something
really useful.)

> Michael, please have a look at them too.
>
> Christian, do you have any program that you used to test the syscall
> that could be added as an example program to the page?

I have some example code that I could turn into an example program, but I
imagine Christian has better code than me ;-).

> > +.BI "int close_range(unsigned int " first ", unsigned int " last ",
> > unsigned int " flags );
>
> This line overflows an 80-col terminal. Fix:
>
> .BI "int close_range(unsigned int " first ", unsigned int " last ,
> .BI " unsigned int " flags );

Noted, thanks.

> > +.fi
>
> Please, add a note here that there is no wrapper for this syscall,
> as in other syscalls without wrapper (see membarrier(2) as an example).
> That way it's easier to grep, if all pages have the same notice.

Ah, right, I used pidfd_open(2) as an example, where it's mentioned later.

> Please use semantic newlines.
> See man-pages(7)::STYLE GUIDE::Use semantic newlines

Noted, thanks.

> > +.SH NOTES
> > +Currently, there is no glibc wrapper for this system call; call it using
> > +.BR syscall (2).
>
> I can see that this notice is also present on a few pages,
> but the one in membarrier(2) is more extended.
>
> Please, copy the notices from membarrier(2).
> There's one in SYNOPSIS, and one in NOTES.

Will do!

> > +.SH USE CASES
>
> This section is unconventional. Please move that text to one of the
> traditional sections. I think DESCRIPTION would be the best place for this.
>
> For a list of the traditional sections,
> see man-pages(7)::DESCRIPTION::Sections within a manual page

Indeed; it grew out of pidfd_open's "Use cases" section but making all this
part of the description would be much better.

I'll wait and see if there are other comments, and send a v3 addressing all
the above.

Regards,

Stephen


Attachments:
(No filename) (849.00 B)
OpenPGP digital signature
Subject: Re: [PATCH v2] close_range.2: new page documenting close_range(2)

On 12/10/20 1:24 AM, Alejandro Colomar (man-pages) wrote:
> Hi Stephen,
>
> A few more comments below.
>
> Michael, please have a look at them too.
>
> Christian, do you have any program that you used to test the syscall
> that could be added as an example program to the page?
>
> Thanks,
>
> Alex
>
> On 12/9/20 11:00 PM, Stephen Kitt wrote:
>> This documents close_range(2) based on information in
>> 278a5fbaed89dacd04e9d052f4594ffd0e0585de and
>> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8.
>>
>> Signed-off-by: Stephen Kitt <[email protected]>
>> ---
>> V2: unsigned int to match the kernel declarations
>> groff and grammar tweaks
>> CLOSE_RANGE_UNSHARE unshares *and* closes
>> Explain that EMFILE and ENOMEM can occur with C_R_U
>> "Conforming to" phrasing
>> Detailed explanation of CLOSE_RANGE_UNSHARE
>> Reading /proc isn't common
>>
>> man2/close_range.2 | 138 +++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 138 insertions(+)
>> create mode 100644 man2/close_range.2
>>
>> diff --git a/man2/close_range.2 b/man2/close_range.2
>> new file mode 100644
>> index 000000000..403142b33
>> --- /dev/null
>> +++ b/man2/close_range.2

[...]

>> +.SH USE CASES
>
> This section is unconventional. Please move that text to one of the
> traditional sections. I think DESCRIPTION would be the best place for this.

Actually, I'd just drop this SH line, and keep the
subsections where they are in NOTES.

Thanks,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2020-12-13 15:08:31

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v2] close_range.2: new page documenting close_range(2)

On Thu, Dec 10, 2020 at 01:24:28AM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Stephen,
>
> A few more comments below.
>
> Michael, please have a look at them too.
>
> Christian, do you have any program that you used to test the syscall
> that could be added as an example program to the page?

As with all non-trivial feature patchsets going through my tree we've
added selftests. :)

tools/testing/selftests/core/close_range_test.c

It should be fairly simply to extract a tinified example from there.

Thanks!
Christian