LinuxLists.cc - For review: pidfd

2019-09-24 16:57:19

Subject: For review: pidfd_open(2) manual page

Hello Christian and all,

Below, I have the rendered version of the current draft of
the pidfd_open(2) manual page that I have written.
The page source can be found in a Git branch at:
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd

I would be pleased to receive corrections and notes on any
details that should be added. (For example, are there error
cases that I have missed?)

Would you be able to review please?

Thanks,

Michael

NAME
pidfd_open - obtain a file descriptor that refers to a process

SYNOPSIS
int pidfd_open(pid_t pid, unsigned int flags);

DESCRIPTION
The pidfd_open() system creates a file descriptor that refers to
the process whose PID is specified in pid. The file descriptor is
returned as the function result; the close-on-exec flag is set on
the file descriptor.

The flags argument is reserved for future use; currently, this
argument must be specified as 0.

RETURN VALUE
On success, pidfd_open() returns a nonnegative file descriptor.
On success, -1 is returned and errno is set to indicate the cause
of the error.

ERRORS
EINVAL flags is not 0.

EINVAL pid is not valid.

ESRCH The process specified by pid does not exist.

VERSIONS
pidfd_open() first appeared in Linux 5.3.

CONFORMING TO
pidfd_open() is Linux specific.

NOTES
Currently, there is no glibc wrapper for this system call; call it
using syscall(2).

The pidfd_send_signal(2) system call can be used to send a signal
to the process referred to by a PID file descriptor.

A PID file descriptor can be monitored using poll(2), select(2),
and epoll(7). When the process that it refers to terminates, the
file descriptor indicates as readable. Note, however, that in the
current implementation, nothing can be read from the file descrip‐
tor.

The pidfd_open() system call is the preferred way of obtaining a
PID file descriptor. The alternative is to obtain a file descrip‐
tor by opening a /proc/[pid] directory. However, the latter tech‐
nique is possible only if the proc(5) file system is mounted; fur‐
thermore, the file descriptor obtained in this way is not pol‐
lable.

See also the discussion of the CLONE_PIDFD flag in clone(2).

EXAMPLE
The program below opens a PID file descriptor for the process
whose PID is specified as its command-line argument. It then mon‐
itors the file descriptor for readability (POLLIN) using poll(2).
When the process with the specified by PID terminates, poll(2)
returns, and indicates that the file descriptor is readable.

Program source

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>
#include <poll.h>
#include <stdlib.h>
#include <stdio.h>

#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif

static
int pidfd_open(pid_t pid, unsigned int flags)
{
return syscall(__NR_pidfd_open, pid, flags);
}

int
main(int argc, char *argv[])
{
struct pollfd pollfd;
int pidfd, ready;

if (argc != 2) {
fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
exit(EXIT_SUCCESS);
}

pidfd = pidfd_open(atoi(argv[1]), 0);
if (pidfd == -1) {
perror("pidfd_open");
exit(EXIT_FAILURE);
}

pollfd.fd = pidfd;
pollfd.events = POLLIN;

ready = poll(&pollfd, 1, -1);
if (ready == -1) {
perror("poll");
exit(EXIT_FAILURE);
}

printf("Events (0x%x): POLLIN is %sset\n", pollfd.revents,
(pollfd.revents & POLLIN) ? "" : "not ");

exit(EXIT_SUCCESS);
}

SEE ALSO
clone(2), kill(2), pidfd_send_signal(2), poll(2), select(2),
epoll(7)

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2019-09-25 11:43:59

by Florian Weimer

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

* Michael Kerrisk:

> SYNOPSIS
> int pidfd_open(pid_t pid, unsigned int flags);

Should this mention <sys/types.h> for pid_t?

> ERRORS
> EINVAL flags is not 0.
>
> EINVAL pid is not valid.
>
> ESRCH The process specified by pid does not exist.

Presumably, EMFILE and ENFILE are also possible errors, and so is
ENOMEM.

> A PID file descriptor can be monitored using poll(2), select(2),
> and epoll(7). When the process that it refers to terminates, the
> file descriptor indicates as readable. Note, however, that in the
> current implementation, nothing can be read from the file descrip‐
> tor.

“is indicated as readable” or “becomes readable”? Will reading block?

> The pidfd_open() system call is the preferred way of obtaining a
> PID file descriptor. The alternative is to obtain a file descrip‐
> tor by opening a /proc/[pid] directory. However, the latter tech‐
> nique is possible only if the proc(5) file system is mounted; fur‐
> thermore, the file descriptor obtained in this way is not pol‐
> lable.

One question is whether the glibc wrapper should fall back back to the
/proc subdirectory if it is not available. Probably not.

> static
> int pidfd_open(pid_t pid, unsigned int flags)
> {
> return syscall(__NR_pidfd_open, pid, flags);
> }

Please call this function something else (not pidfd_open), so that the
example continues to work if glibc provides the system call wrapper.

2019-09-25 11:58:21

by Daniel Colascione

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

On Mon, Sep 23, 2019 at 3:53 AM Florian Weimer <[email protected]> wrote:
>
> * Michael Kerrisk:
>
> > SYNOPSIS
> > int pidfd_open(pid_t pid, unsigned int flags);
>
> Should this mention <sys/types.h> for pid_t?
>
> > ERRORS
> > EINVAL flags is not 0.
> >
> > EINVAL pid is not valid.
> >
> > ESRCH The process specified by pid does not exist.
>
> Presumably, EMFILE and ENFILE are also possible errors, and so is
> ENOMEM.
>
> > A PID file descriptor can be monitored using poll(2), select(2),
> > and epoll(7). When the process that it refers to terminates, the
> > file descriptor indicates as readable.

The phrase "becomes readable" is simpler than "indicates as readable"
and conveys the same meaning. I agree with Florian's comment on this
point below.

> > Note, however, that in the
> > current implementation, nothing can be read from the file descrip‐
> > tor.
>
> “is indicated as readable” or “becomes readable”? Will reading block?
>
> > The pidfd_open() system call is the preferred way of obtaining a
> > PID file descriptor. The alternative is to obtain a file descrip‐
> > tor by opening a /proc/[pid] directory. However, the latter tech‐
> > nique is possible only if the proc(5) file system is mounted; fur‐
> > thermore, the file descriptor obtained in this way is not pol‐
> > lable.

Referring to procfs directory FDs as pidfds will probably confuse
people. I'd just omit this paragraph.

> One question is whether the glibc wrapper should fall back back to the
> /proc subdirectory if it is not available. Probably not.

I'd prefer that glibc not provide this kind of fallback.
posix_fallocate-style emulation is, IMHO, too surprising.

2019-09-25 14:52:42

by Christian Brauner

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

On Mon, Sep 23, 2019 at 12:53:09PM +0200, Florian Weimer wrote:
> * Michael Kerrisk:
>
> > SYNOPSIS
> > int pidfd_open(pid_t pid, unsigned int flags);
>
> Should this mention <sys/types.h> for pid_t?
>
> > ERRORS
> > EINVAL flags is not 0.
> >
> > EINVAL pid is not valid.
> >
> > ESRCH The process specified by pid does not exist.
>
> Presumably, EMFILE and ENFILE are also possible errors, and so is
> ENOMEM.

So, error codes that could surface are:
EMFILE: too many open files
ENODEV: the anon inode filesystem is not available in this kernel (unlikely)
ENOMEM: not enough memory (to allocate the backing struct file)
ENFILE: you're over the max_files limit which can be set through proc

I think that should be it.

>
> > A PID file descriptor can be monitored using poll(2), select(2),
> > and epoll(7). When the process that it refers to terminates, the
> > file descriptor indicates as readable. Note, however, that in the
> > current implementation, nothing can be read from the file descrip‐
> > tor.
>
> “is indicated as readable” or “becomes readable”? Will reading block?
>
> > The pidfd_open() system call is the preferred way of obtaining a
> > PID file descriptor. The alternative is to obtain a file descrip‐
> > tor by opening a /proc/[pid] directory. However, the latter tech‐
> > nique is possible only if the proc(5) file system is mounted; fur‐
> > thermore, the file descriptor obtained in this way is not pol‐
> > lable.
>
> One question is whether the glibc wrapper should fall back back to the
> /proc subdirectory if it is not available. Probably not.

No, that would not be transparent to userspace. Especially because both
fds differ in what can be done with them.

>
> > static
> > int pidfd_open(pid_t pid, unsigned int flags)
> > {
> > return syscall(__NR_pidfd_open, pid, flags);
> > }
>
> Please call this function something else (not pidfd_open), so that the
> example continues to work if glibc provides the system call wrapper.

Agreed!

2019-09-25 17:43:05

by Christian Brauner

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

On Mon, Sep 23, 2019 at 11:11:53AM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Christian and all,
>
> Below, I have the rendered version of the current draft of
> the pidfd_open(2) manual page that I have written.
> The page source can be found in a Git branch at:
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd
>
> I would be pleased to receive corrections and notes on any
> details that should be added. (For example, are there error
> cases that I have missed?)
>
> Would you be able to review please?

Again, thank you Michael for doing this!

>
> Thanks,
>
> Michael
>
>
> NAME
> pidfd_open - obtain a file descriptor that refers to a process
>
> SYNOPSIS
> int pidfd_open(pid_t pid, unsigned int flags);
>
> DESCRIPTION
> The pidfd_open() system creates a file descriptor that refers to

s/system/system call/

> the process whose PID is specified in pid. The file descriptor is
> returned as the function result; the close-on-exec flag is set on
> the file descriptor.
>
> The flags argument is reserved for future use; currently, this
> argument must be specified as 0.
>
> RETURN VALUE
> On success, pidfd_open() returns a nonnegative file descriptor.
> On success, -1 is returned and errno is set to indicate the cause

s/On success/On error/g

> of the error.
>
> ERRORS
> EINVAL flags is not 0.
>
> EINVAL pid is not valid.
>
> ESRCH The process specified by pid does not exist.
>
> VERSIONS
> pidfd_open() first appeared in Linux 5.3.
>
> CONFORMING TO
> pidfd_open() is Linux specific.
>
> NOTES
> Currently, there is no glibc wrapper for this system call; call it
> using syscall(2).
>
> The pidfd_send_signal(2) system call can be used to send a signal
> to the process referred to by a PID file descriptor.
>
> A PID file descriptor can be monitored using poll(2), select(2),
> and epoll(7). When the process that it refers to terminates, the
> file descriptor indicates as readable. Note, however, that in the

Not a native English speaker but should this be "indicates it is
readable"?

> current implementation, nothing can be read from the file descrip‐
> tor.
>
> The pidfd_open() system call is the preferred way of obtaining a
> PID file descriptor. The alternative is to obtain a file descrip‐
> tor by opening a /proc/[pid] directory. However, the latter tech‐
> nique is possible only if the proc(5) file system is mounted; fur‐
> thermore, the file descriptor obtained in this way is not pol‐
> lable.

I mentioned this already in the CLONE_PIDFD manpage, we should probably
not make a big deal out of this and not mention /proc/<pid> here at all.
(Crazy idea, but we could also have a config option that allows you to
turn of proc-pid-dirfds as pidfds if we start to feel really strongly
about this or a sysctl whatever...)

>
> See also the discussion of the CLONE_PIDFD flag in clone(2).
>
> EXAMPLE
> The program below opens a PID file descriptor for the process
> whose PID is specified as its command-line argument. It then mon‐
> itors the file descriptor for readability (POLLIN) using poll(2).

Yeah, maybe say "monitors the file descriptor for process exit indicated
by an EPOLLIN event" or something. Readability might be confusing.

> When the process with the specified by PID terminates, poll(2)
> returns, and indicates that the file descriptor is readable.

See comment above "readable". (I'm on my phone and I think someone
pointed this out already.)

>
> Program source
>
> #define _GNU_SOURCE
> #include <sys/syscall.h>
> #include <unistd.h>
> #include <poll.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> #ifndef __NR_pidfd_open
> #define __NR_pidfd_open 434
> #endif

Alpha is special... (and not in a good way).
So you would need to special case Alpha since that's the only arch where
we haven't been able to unify syscall numbering. :D
But it's not super important.

I like the program example.

>
> static
> int pidfd_open(pid_t pid, unsigned int flags)
> {
> return syscall(__NR_pidfd_open, pid, flags);
> }
>
> int
> main(int argc, char *argv[])
> {
> struct pollfd pollfd;
> int pidfd, ready;
>
> if (argc != 2) {
> fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
> exit(EXIT_SUCCESS);
> }
>
> pidfd = pidfd_open(atoi(argv[1]), 0);
> if (pidfd == -1) {
> perror("pidfd_open");
> exit(EXIT_FAILURE);
> }
>
> pollfd.fd = pidfd;
> pollfd.events = POLLIN;
>
> ready = poll(&pollfd, 1, -1);
> if (ready == -1) {
> perror("poll");
> exit(EXIT_FAILURE);
> }
>
> printf("Events (0x%x): POLLIN is %sset\n", pollfd.revents,
> (pollfd.revents & POLLIN) ? "" : "not ");
>
> exit(EXIT_SUCCESS);
> }
>
> SEE ALSO
> clone(2), kill(2), pidfd_send_signal(2), poll(2), select(2),
> epoll(7)
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

2019-09-25 23:35:13

by Michael Kerrisk (man-pages)

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

Hello Florian,

Thanks for taking a look at this page.

On 9/23/19 12:53 PM, Florian Weimer wrote:
> * Michael Kerrisk:
>
>> SYNOPSIS
>> int pidfd_open(pid_t pid, unsigned int flags);
>
> Should this mention <sys/types.h> for pid_t?

Seems reasonable. I added this.

>> ERRORS
>> EINVAL flags is not 0.
>>
>> EINVAL pid is not valid.
>>
>> ESRCH The process specified by pid does not exist.
>
> Presumably, EMFILE and ENFILE are also possible errors, and so is
> ENOMEM.

Thanks. I've added those.

>> A PID file descriptor can be monitored using poll(2), select(2),
>> and epoll(7). When the process that it refers to terminates, the
>> file descriptor indicates as readable. Note, however, that in the
>> current implementation, nothing can be read from the file descrip‐
>> tor.
>
> “is indicated as readable” or “becomes readable”? Will reading block?

It won't block. Reads from a pidfd always fail with the error EINVAL
(regardless of whether the target process has terminated).

I specifically wanted to avoid "becomes readable" to avoid any
suggestion that read() does something for a pidfd. I thought
"indicates as readable" was fine, but you, Christian and Joel
all called this wording out, so I changed this to:

"When the process that it refers to terminates,
these interfaces indicate the file descriptor as readable."

>> The pidfd_open() system call is the preferred way of obtaining a
>> PID file descriptor. The alternative is to obtain a file descrip‐
>> tor by opening a /proc/[pid] directory. However, the latter tech‐
>> nique is possible only if the proc(5) file system is mounted; fur‐
>> thermore, the file descriptor obtained in this way is not pol‐
>> lable.
>
> One question is whether the glibc wrapper should fall back back to the
> /proc subdirectory if it is not available. Probably not.

No, since the FD returned by opening /proc/PID is less functional
(it is not pollable) than the one returned by pidfd_open().

>> static
>> int pidfd_open(pid_t pid, unsigned int flags)
>> {
>> return syscall(__NR_pidfd_open, pid, flags);
>> }
>
> Please call this function something else (not pidfd_open), so that the
> example continues to work if glibc provides the system call wrapper.

I figured that if the syscall does get added to glibc, then I would
modify the example. In the meantime, this does seem the most natural
way of doing things, since the example then uses the real syscall
name as it would be used if there were a wrapper function.

But, this leads to the question: what do you think the likelihood
is that this system call will land in glibc?

Thanks for your feedback, Florian. I've pushed various changes
to the Git branch at
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2019-09-25 23:35:36

by Michael Kerrisk (man-pages)

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

Hello Christian,

On 9/23/19 4:47 PM, Christian Brauner wrote:
> On Mon, Sep 23, 2019 at 12:53:09PM +0200, Florian Weimer wrote:
>> * Michael Kerrisk:
>>
>>> SYNOPSIS
>>> int pidfd_open(pid_t pid, unsigned int flags);
>>
>> Should this mention <sys/types.h> for pid_t?
>>
>>> ERRORS
>>> EINVAL flags is not 0.
>>>
>>> EINVAL pid is not valid.
>>>
>>> ESRCH The process specified by pid does not exist.
>>
>> Presumably, EMFILE and ENFILE are also possible errors, and so is
>> ENOMEM.
>
> So, error codes that could surface are:
> EMFILE: too many open files
> ENODEV: the anon inode filesystem is not available in this kernel (unlikely)
> ENOMEM: not enough memory (to allocate the backing struct file)
> ENFILE: you're over the max_files limit which can be set through proc
>
> I think that should be it.

Thanks. I've added those.
>>> A PID file descriptor can be monitored using poll(2), select(2),
>>> and epoll(7). When the process that it refers to terminates, the
>>> file descriptor indicates as readable. Note, however, that in the
>>> current implementation, nothing can be read from the file descrip‐
>>> tor.
>>
>> “is indicated as readable” or “becomes readable”? Will reading block?
>>
>>> The pidfd_open() system call is the preferred way of obtaining a
>>> PID file descriptor. The alternative is to obtain a file descrip‐
>>> tor by opening a /proc/[pid] directory. However, the latter tech‐
>>> nique is possible only if the proc(5) file system is mounted; fur‐
>>> thermore, the file descriptor obtained in this way is not pol‐
>>> lable.
>>
>> One question is whether the glibc wrapper should fall back back to the
>> /proc subdirectory if it is not available. Probably not.
>
> No, that would not be transparent to userspace. Especially because both
> fds differ in what can be done with them.
>
>>
>>> static
>>> int pidfd_open(pid_t pid, unsigned int flags)
>>> {
>>> return syscall(__NR_pidfd_open, pid, flags);
>>> }
>>
>> Please call this function something else (not pidfd_open), so that the
>> example continues to work if glibc provides the system call wrapper.
>
> Agreed!

See my reply to Florian. (So far, I didn't change anything here.)

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2019-09-25 23:36:39

by Michael Kerrisk (man-pages)

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

Hello Daniel,

Than you for reviewing the page!

On 9/23/19 1:26 PM, Daniel Colascione wrote:
> On Mon, Sep 23, 2019 at 3:53 AM Florian Weimer <[email protected]> wrote:
>>
>> * Michael Kerrisk:
>>
>>> SYNOPSIS
>>> int pidfd_open(pid_t pid, unsigned int flags);
>>
>> Should this mention <sys/types.h> for pid_t?
>>
>>> ERRORS
>>> EINVAL flags is not 0.
>>>
>>> EINVAL pid is not valid.
>>>
>>> ESRCH The process specified by pid does not exist.
>>
>> Presumably, EMFILE and ENFILE are also possible errors, and so is
>> ENOMEM.
>>
>>> A PID file descriptor can be monitored using poll(2), select(2),
>>> and epoll(7). When the process that it refers to terminates, the
>>> file descriptor indicates as readable.
>
> The phrase "becomes readable" is simpler than "indicates as readable"
> and conveys the same meaning. I agree with Florian's comment on this
> point below.

See my reply to Florian. (I did change the text here.)

>>> Note, however, that in the
>>> current implementation, nothing can be read from the file descrip‐
>>> tor.
>>
>> “is indicated as readable” or “becomes readable”? Will reading block?
>>
>>> The pidfd_open() system call is the preferred way of obtaining a
>>> PID file descriptor. The alternative is to obtain a file descrip‐
>>> tor by opening a /proc/[pid] directory. However, the latter tech‐
>>> nique is possible only if the proc(5) file system is mounted; fur‐
>>> thermore, the file descriptor obtained in this way is not pol‐
>>> lable.
>
> Referring to procfs directory FDs as pidfds will probably confuse
> people. I'd just omit this paragraph.

See my reply to Christian (and feel free to argue the point, please).
So far, I have made no change here.

>> One question is whether the glibc wrapper should fall back back to the
>> /proc subdirectory if it is not available. Probably not.
>
> I'd prefer that glibc not provide this kind of fallback.
> posix_fallocate-style emulation is, IMHO, too surprising.

Agreed.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2019-09-25 23:50:01

by Michael Kerrisk (man-pages)

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

Hello Florian,

On 9/23/19 10:41 PM, Florian Weimer wrote:
> * Michael Kerrisk:
>
>>>> static
>>>> int pidfd_open(pid_t pid, unsigned int flags)
>>>> {
>>>> return syscall(__NR_pidfd_open, pid, flags);
>>>> }
>>>
>>> Please call this function something else (not pidfd_open), so that the
>>> example continues to work if glibc provides the system call wrapper.
>>
>> I figured that if the syscall does get added to glibc, then I would
>> modify the example. In the meantime, this does seem the most natural
>> way of doing things, since the example then uses the real syscall
>> name as it would be used if there were a wrapper function.
>
> The problem is that programs do this as well, so they fail to build
> once they are built on a newer glibc version.

But isn't such a failure a good thing? I mean: it encourages
people to rid their programs of uses of syscall(2).

>> But, this leads to the question: what do you think the likelihood
>> is that this system call will land in glibc?
>
> Quite likely. It's easy enough to document, there are no P&C issues,
> and it doesn't need any new types.

Okay.

> pidfd_send_signal is slightly more difficult because we probably need
> to add rt_sigqueueinfo first, for consistency.

Okay. I see that's a little more problematic.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2019-09-26 00:47:29

by Christian Brauner

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

On Mon, Sep 23, 2019 at 10:41:19PM +0200, Florian Weimer wrote:
> * Michael Kerrisk:
>
> >>> static
> >>> int pidfd_open(pid_t pid, unsigned int flags)
> >>> {
> >>> return syscall(__NR_pidfd_open, pid, flags);
> >>> }
> >>
> >> Please call this function something else (not pidfd_open), so that the
> >> example continues to work if glibc provides the system call wrapper.
> >
> > I figured that if the syscall does get added to glibc, then I would
> > modify the example. In the meantime, this does seem the most natural
> > way of doing things, since the example then uses the real syscall
> > name as it would be used if there were a wrapper function.
>
> The problem is that programs do this as well, so they fail to build
> once they are built on a newer glibc version.
>
> > But, this leads to the question: what do you think the likelihood
> > is that this system call will land in glibc?
>
> Quite likely. It's easy enough to document, there are no P&C issues,
> and it doesn't need any new types.

My previous mail probably didn't make it so here it is again: I think
especially with the recently established glibc consensus to provide
wrappers for all new system calls (with some sensible exceptions) I'd
expect this to be the case.

>
> pidfd_send_signal is slightly more difficult because we probably need
> to add rt_sigqueueinfo first, for consistency.

Oh, huh. Somehow I thought we already provide that.

Christian

2019-09-26 02:53:47

by Michael Kerrisk (man-pages)

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

Hello Christian,

On 9/23/19 4:38 PM, Christian Brauner wrote:
> On Mon, Sep 23, 2019 at 11:11:53AM +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Christian and all,
>>
>> Below, I have the rendered version of the current draft of
>> the pidfd_open(2) manual page that I have written.
>> The page source can be found in a Git branch at:
>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd
>>
>> I would be pleased to receive corrections and notes on any
>> details that should be added. (For example, are there error
>> cases that I have missed?)
>>
>> Would you be able to review please?
>
> Again, thank you Michael for doing this!
>
>>
>> Thanks,
>>
>> Michael
>>
>>
>> NAME
>> pidfd_open - obtain a file descriptor that refers to a process
>>
>> SYNOPSIS
>> int pidfd_open(pid_t pid, unsigned int flags);
>>
>> DESCRIPTION
>> The pidfd_open() system creates a file descriptor that refers to
>
> s/system/system call/

Fixed.

>> the process whose PID is specified in pid. The file descriptor is
>> returned as the function result; the close-on-exec flag is set on
>> the file descriptor.
>>
>> The flags argument is reserved for future use; currently, this
>> argument must be specified as 0.
>>
>> RETURN VALUE
>> On success, pidfd_open() returns a nonnegative file descriptor.
>> On success, -1 is returned and errno is set to indicate the cause
>
> s/On success/On error/g

Fixed.

>> of the error.
>>
>> ERRORS
>> EINVAL flags is not 0.
>>
>> EINVAL pid is not valid.
>>
>> ESRCH The process specified by pid does not exist.
>>
>> VERSIONS
>> pidfd_open() first appeared in Linux 5.3.
>>
>> CONFORMING TO
>> pidfd_open() is Linux specific.
>>
>> NOTES
>> Currently, there is no glibc wrapper for this system call; call it
>> using syscall(2).
>>
>> The pidfd_send_signal(2) system call can be used to send a signal
>> to the process referred to by a PID file descriptor.
>>
>> A PID file descriptor can be monitored using poll(2), select(2),
>> and epoll(7). When the process that it refers to terminates, the
>> file descriptor indicates as readable. Note, however, that in the
>
> Not a native English speaker but should this be "indicates it is
> readable"?

See my reply to Florian.

>> current implementation, nothing can be read from the file descrip‐
>> tor.
>>
>> The pidfd_open() system call is the preferred way of obtaining a
>> PID file descriptor. The alternative is to obtain a file descrip‐
>> tor by opening a /proc/[pid] directory. However, the latter tech‐
>> nique is possible only if the proc(5) file system is mounted; fur‐
>> thermore, the file descriptor obtained in this way is not pol‐
>> lable.
>
> I mentioned this already in the CLONE_PIDFD manpage, we should probably
> not make a big deal out of this and not mention /proc/<pid> here at all.

The thing is, people *will* learn about these two different types
of FDs, whether we document them or not. So, I think it's better to
be up front about what's available, and make a suitably strong
recommendation about the preferred technique.

Reading between the lines, it sounds like just a couple of releases
after it was implemented, you're saying that implementing
open(/proc/PID) was a mistake?

> (Crazy idea, but we could also have a config option that allows you to
> turn of proc-pid-dirfds as pidfds if we start to feel really strongly
> about this or a sysctl whatever...)
>
>>
>> See also the discussion of the CLONE_PIDFD flag in clone(2).
>>
>> EXAMPLE
>> The program below opens a PID file descriptor for the process
>> whose PID is specified as its command-line argument. It then mon‐
>> itors the file descriptor for readability (POLLIN) using poll(2).
>
> Yeah, maybe say "monitors the file descriptor for process exit indicated
> by an EPOLLIN event" or something. Readability might be confusing.

I like that suggestion! I reworded to something close to what you suggest.

>> When the process with the specified by PID terminates, poll(2)
>> returns, and indicates that the file descriptor is readable.
>
> See comment above "readable". (I'm on my phone and I think someone
> pointed this out already.)

Actually, I think I can just remove that sentence. It doesn't really
add much.

>> Program source
>>
>> #define _GNU_SOURCE
>> #include <sys/syscall.h>
>> #include <unistd.h>
>> #include <poll.h>
>> #include <stdlib.h>
>> #include <stdio.h>
>>
>> #ifndef __NR_pidfd_open
>> #define __NR_pidfd_open 434
>> #endif
>
> Alpha is special... (and not in a good way).
> So you would need to special case Alpha since that's the only arch where
> we haven't been able to unify syscall numbering. :D
> But it's not super important.

Okay.

> I like the program example.

Good.

Thanks for reviewing! I've pushed various changes
to the Git branch at
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_pidfd

Cheers,

Michael

>>
>> static
>> int pidfd_open(pid_t pid, unsigned int flags)
>> {
>> return syscall(__NR_pidfd_open, pid, flags);
>> }
>>
>> int
>> main(int argc, char *argv[])
>> {
>> struct pollfd pollfd;
>> int pidfd, ready;
>>
>> if (argc != 2) {
>> fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
>> exit(EXIT_SUCCESS);
>> }
>>
>> pidfd = pidfd_open(atoi(argv[1]), 0);
>> if (pidfd == -1) {
>> perror("pidfd_open");
>> exit(EXIT_FAILURE);
>> }
>>
>> pollfd.fd = pidfd;
>> pollfd.events = POLLIN;
>>
>> ready = poll(&pollfd, 1, -1);
>> if (ready == -1) {
>> perror("poll");
>> exit(EXIT_FAILURE);
>> }
>>
>> printf("Events (0x%x): POLLIN is %sset\n", pollfd.revents,
>> (pollfd.revents & POLLIN) ? "" : "not ");
>>
>> exit(EXIT_SUCCESS);
>> }
>>
>> SEE ALSO
>> clone(2), kill(2), pidfd_send_signal(2), poll(2), select(2),
>> epoll(7)
>>
>>
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/
>

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2019-09-26 04:57:34

by Florian Weimer

[permalink] [raw]

Subject: Re: For review: pidfd_open(2) manual page

* Michael Kerrisk:

>>> static
>>> int pidfd_open(pid_t pid, unsigned int flags)
>>> {
>>> return syscall(__NR_pidfd_open, pid, flags);
>>> }
>>
>> Please call this function something else (not pidfd_open), so that the
>> example continues to work if glibc provides the system call wrapper.
>
> I figured that if the syscall does get added to glibc, then I would
> modify the example. In the meantime, this does seem the most natural
> way of doing things, since the example then uses the real syscall
> name as it would be used if there were a wrapper function.

The problem is that programs do this as well, so they fail to build
once they are built on a newer glibc version.

> But, this leads to the question: what do you think the likelihood
> is that this system call will land in glibc?

Quite likely. It's easy enough to document, there are no P&C issues,
and it doesn't need any new types.

pidfd_send_signal is slightly more difficult because we probably need
to add rt_sigqueueinfo first, for consistency.