2019-09-24 17:01:19

by Florian Weimer

[permalink] [raw]
Subject: Re: For review: pidfd_send_signal(2) manual page

* Michael Kerrisk:

> SYNOPSIS
> int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
> unsigned int flags);

This probably should reference a header for siginfo_t.

> ESRCH The target process does not exist.

If the descriptor is valid, does this mean the process has been waited
for? Maybe this can be made more explicit.

> The pidfd_send_signal() system call allows the avoidance of race
> conditions that occur when using traditional interfaces (such as
> kill(2)) to signal a process. The problem is that the traditional
> interfaces specify the target process via a process ID (PID), with
> the result that the sender may accidentally send a signal to the
> wrong process if the originally intended target process has termi‐
> nated and its PID has been recycled for another process. By con‐
> trast, a PID file descriptor is a stable reference to a specific
> process; if that process terminates, then the file descriptor
> ceases to be valid and the caller of pidfd_send_signal() is
> informed of this fact via an ESRCH error.

It would be nice to explain somewhere how you can avoid the race using
a PID descriptor. Is there anything else besides CLONE_PIDFD?

> static
> int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
> unsigned int flags)
> {
> return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
> }

Please use a different function name. Thanks.


2019-09-25 17:41:24

by Christian Brauner

[permalink] [raw]
Subject: Re: For review: pidfd_send_signal(2) manual page

On Mon, Sep 23, 2019 at 01:26:34PM +0200, Florian Weimer wrote:
> * Michael Kerrisk:
>
> > SYNOPSIS
> > int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
> > unsigned int flags);
>
> This probably should reference a header for siginfo_t.

Agreed.

>
> > ESRCH The target process does not exist.
>
> If the descriptor is valid, does this mean the process has been waited
> for? Maybe this can be made more explicit.

If by valid you mean "refers to a process/thread-group leader" aka is a
pidfd then yes: Getting ESRCH means that the process has exited and has
already been waited upon.
If it had only exited but not waited upon aka is a zombie, then sending
a signal will just work because that's currently how sending signals to
zombies works, i.e. if you only send a signal and don't do any
additional checks you won't notice a difference between a process being
alive and a process being a zombie. The userspace visible behavior in
terms of signaling them is identical.

>
> > The pidfd_send_signal() system call allows the avoidance of race
> > conditions that occur when using traditional interfaces (such as
> > kill(2)) to signal a process. The problem is that the traditional
> > interfaces specify the target process via a process ID (PID), with
> > the result that the sender may accidentally send a signal to the
> > wrong process if the originally intended target process has termi‐
> > nated and its PID has been recycled for another process. By con‐
> > trast, a PID file descriptor is a stable reference to a specific
> > process; if that process terminates, then the file descriptor
> > ceases to be valid and the caller of pidfd_send_signal() is
> > informed of this fact via an ESRCH error.
>
> It would be nice to explain somewhere how you can avoid the race using
> a PID descriptor. Is there anything else besides CLONE_PIDFD?

If you're the parent of the process you can do this without CLONE_PIDFD:
pid = fork();
pidfd = pidfd_open();
ret = pidfd_send_signal(pidfd, 0, NULL, 0);
if (ret < 0 && errno == ESRCH)
/* pidfd refers to another, recycled process */

>
> > static
> > int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
> > unsigned int flags)
> > {
> > return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
> > }
>
> Please use a different function name. Thanks.

Subject: Re: For review: pidfd_send_signal(2) manual page

Hello Florian,

On 9/23/19 1:26 PM, Florian Weimer wrote:
> * Michael Kerrisk:
>
>> SYNOPSIS
>> int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
>> unsigned int flags);
>
> This probably should reference a header for siginfo_t.

Thanks. I added: #include <signal.h>

>> ESRCH The target process does not exist.
>
> If the descriptor is valid, does this mean the process has been waited
> for? Maybe this can be made more explicit.

Yes. I added "(i.e., it has terminated and been waited on)".

>> The pidfd_send_signal() system call allows the avoidance of race
>> conditions that occur when using traditional interfaces (such as
>> kill(2)) to signal a process. The problem is that the traditional
>> interfaces specify the target process via a process ID (PID), with
>> the result that the sender may accidentally send a signal to the
>> wrong process if the originally intended target process has termi‐
>> nated and its PID has been recycled for another process. By con‐
>> trast, a PID file descriptor is a stable reference to a specific
>> process; if that process terminates, then the file descriptor
>> ceases to be valid and the caller of pidfd_send_signal() is
>> informed of this fact via an ESRCH error.
>
> It would be nice to explain somewhere how you can avoid the race using
> a PID descriptor. Is there anything else besides CLONE_PIDFD?

Please see my comment in reply to Christian (which will be sent just
after this).

>> static
>> int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
>> unsigned int flags)
>> {
>> return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
>> }
>
> Please use a different function name. Thanks.

Please see my open question in the thread on pidfd_open().

Thanks for the review, Florian.

Cheers,

Michael

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Subject: Re: For review: pidfd_send_signal(2) manual page

Hello Christian,

On 9/23/19 4:23 PM, Christian Brauner wrote:
> On Mon, Sep 23, 2019 at 01:26:34PM +0200, Florian Weimer wrote:
>> * Michael Kerrisk:
>>
>>> SYNOPSIS
>>> int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
>>> unsigned int flags);
>>
>> This probably should reference a header for siginfo_t.
>
> Agreed.
>
>>
>>> ESRCH The target process does not exist.
>>
>> If the descriptor is valid, does this mean the process has been waited
>> for? Maybe this can be made more explicit.
>
> If by valid you mean "refers to a process/thread-group leader" aka is a
> pidfd then yes: Getting ESRCH means that the process has exited and has
> already been waited upon.
> If it had only exited but not waited upon aka is a zombie, then sending
> a signal will just work because that's currently how sending signals to
> zombies works, i.e. if you only send a signal and don't do any
> additional checks you won't notice a difference between a process being
> alive and a process being a zombie. The userspace visible behavior in
> terms of signaling them is identical.

(Thanks for the clarification. I added the text "(i.e., it has
terminated and been waited on)" to the ESRCH error.)

>>> The pidfd_send_signal() system call allows the avoidance of race
>>> conditions that occur when using traditional interfaces (such as
>>> kill(2)) to signal a process. The problem is that the traditional
>>> interfaces specify the target process via a process ID (PID), with
>>> the result that the sender may accidentally send a signal to the
>>> wrong process if the originally intended target process has termi‐
>>> nated and its PID has been recycled for another process. By con‐
>>> trast, a PID file descriptor is a stable reference to a specific
>>> process; if that process terminates, then the file descriptor
>>> ceases to be valid and the caller of pidfd_send_signal() is
>>> informed of this fact via an ESRCH error.
>>
>> It would be nice to explain somewhere how you can avoid the race using
>> a PID descriptor. Is there anything else besides CLONE_PIDFD?
>
> If you're the parent of the process you can do this without CLONE_PIDFD:
> pid = fork();
> pidfd = pidfd_open();
> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> if (ret < 0 && errno == ESRCH)
> /* pidfd refers to another, recycled process */

Although there is still the race between the fork() and the
pidfd_open(), right?

>>> static
>>> int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
>>> unsigned int flags)
>>> {
>>> return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
>>> }
>>
>> Please use a different function name. Thanks.

Covered in another thread. I await some further feedback from Florian.

Thanks,

Michael



--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2019-09-26 08:46:09

by Christian Brauner

[permalink] [raw]
Subject: Re: For review: pidfd_send_signal(2) manual page

On Tue, Sep 24, 2019 at 09:44:49PM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Christian,
>
> On 9/23/19 4:23 PM, Christian Brauner wrote:
> > On Mon, Sep 23, 2019 at 01:26:34PM +0200, Florian Weimer wrote:
> >> * Michael Kerrisk:
> >>
> >>> SYNOPSIS
> >>> int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
> >>> unsigned int flags);
> >>
> >> This probably should reference a header for siginfo_t.
> >
> > Agreed.
> >
> >>
> >>> ESRCH The target process does not exist.
> >>
> >> If the descriptor is valid, does this mean the process has been waited
> >> for? Maybe this can be made more explicit.
> >
> > If by valid you mean "refers to a process/thread-group leader" aka is a
> > pidfd then yes: Getting ESRCH means that the process has exited and has
> > already been waited upon.
> > If it had only exited but not waited upon aka is a zombie, then sending
> > a signal will just work because that's currently how sending signals to
> > zombies works, i.e. if you only send a signal and don't do any
> > additional checks you won't notice a difference between a process being
> > alive and a process being a zombie. The userspace visible behavior in
> > terms of signaling them is identical.
>
> (Thanks for the clarification. I added the text "(i.e., it has
> terminated and been waited on)" to the ESRCH error.)
>
> >>> The pidfd_send_signal() system call allows the avoidance of race
> >>> conditions that occur when using traditional interfaces (such as
> >>> kill(2)) to signal a process. The problem is that the traditional
> >>> interfaces specify the target process via a process ID (PID), with
> >>> the result that the sender may accidentally send a signal to the
> >>> wrong process if the originally intended target process has termi‐
> >>> nated and its PID has been recycled for another process. By con‐
> >>> trast, a PID file descriptor is a stable reference to a specific
> >>> process; if that process terminates, then the file descriptor
> >>> ceases to be valid and the caller of pidfd_send_signal() is
> >>> informed of this fact via an ESRCH error.
> >>
> >> It would be nice to explain somewhere how you can avoid the race using
> >> a PID descriptor. Is there anything else besides CLONE_PIDFD?
> >
> > If you're the parent of the process you can do this without CLONE_PIDFD:
> > pid = fork();
> > pidfd = pidfd_open();
> > ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> > if (ret < 0 && errno == ESRCH)
> > /* pidfd refers to another, recycled process */
>
> Although there is still the race between the fork() and the
> pidfd_open(), right?

Actually no and my code is even too complex.
If you are the parent, and this is really a sequence that obeys the
ordering pidfd_open() before waiting:

pid = fork();
if (pid == 0)
exit(EXIT_SUCCESS);
pidfd = pidfd_open(pid, 0);
waitid(pid, ...);

Then you are guaranteed that pidfd will refer to pid. No recycling can
happen since the process has not been waited upon yet (That is,
excluding special cases such as where you have a mainloop where a
callback reacts to a SIGCHLD event and waits on the child behind your
back and your next callback in the mainloop calls pidfd_open() while the
pid has been recycled etc.).
A race could only appear in sequences where waiting happens before
pidfd_open():

pid = fork();
if (pid == 0)
exit(EXIT_SUCCESS);
waitid(pid, ...);
pidfd = pidfd_open(pid, 0);

which honestly simply doesn't make any sense. So if you're the parent
and you combine fork() + pidfd_open() correctly things should be fine
without even having to verify via pidfd_send_signal() (I missed that in
my first mail.).
(Now, it gets more hairy when one considers clone(CLONE_PARENT) but that
would be wildly esoteric because at that point you're using clone()
already and then you should simply pass clone(CLONE_PARENT | CLONE_PIDFD).)

If you're _not_ the parent then CLONE_PIDFD and sending around the pidfd
are your only option to avoiding the race imho.

>
> >>> static
> >>> int pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
> >>> unsigned int flags)
> >>> {
> >>> return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
> >>> }
> >>
> >> Please use a different function name. Thanks.
>
> Covered in another thread. I await some further feedback from Florian.

Right, that wasn't my suggestion anyway. :)

Thanks!
Christian

2019-09-26 08:46:24

by Christian Brauner

[permalink] [raw]
Subject: Re: For review: pidfd_send_signal(2) manual page

On Tue, Sep 24, 2019 at 09:57:04PM +0200, Christian Brauner wrote:
> On Tue, Sep 24, 2019 at 09:44:49PM +0200, Michael Kerrisk (man-pages) wrote:
> > Hello Christian,
> >
> > On 9/23/19 4:23 PM, Christian Brauner wrote:
> > > On Mon, Sep 23, 2019 at 01:26:34PM +0200, Florian Weimer wrote:
> > >> * Michael Kerrisk:
> > >>
> > >>> SYNOPSIS
> > >>> int pidfd_send_signal(int pidfd, int sig, siginfo_t info,
> > >>> unsigned int flags);
> > >>
> > >> This probably should reference a header for siginfo_t.
> > >
> > > Agreed.
> > >
> > >>
> > >>> ESRCH The target process does not exist.
> > >>
> > >> If the descriptor is valid, does this mean the process has been waited
> > >> for? Maybe this can be made more explicit.
> > >
> > > If by valid you mean "refers to a process/thread-group leader" aka is a
> > > pidfd then yes: Getting ESRCH means that the process has exited and has
> > > already been waited upon.
> > > If it had only exited but not waited upon aka is a zombie, then sending
> > > a signal will just work because that's currently how sending signals to
> > > zombies works, i.e. if you only send a signal and don't do any
> > > additional checks you won't notice a difference between a process being
> > > alive and a process being a zombie. The userspace visible behavior in
> > > terms of signaling them is identical.
> >
> > (Thanks for the clarification. I added the text "(i.e., it has
> > terminated and been waited on)" to the ESRCH error.)
> >
> > >>> The pidfd_send_signal() system call allows the avoidance of race
> > >>> conditions that occur when using traditional interfaces (such as
> > >>> kill(2)) to signal a process. The problem is that the traditional
> > >>> interfaces specify the target process via a process ID (PID), with
> > >>> the result that the sender may accidentally send a signal to the
> > >>> wrong process if the originally intended target process has termi‐
> > >>> nated and its PID has been recycled for another process. By con‐
> > >>> trast, a PID file descriptor is a stable reference to a specific
> > >>> process; if that process terminates, then the file descriptor
> > >>> ceases to be valid and the caller of pidfd_send_signal() is
> > >>> informed of this fact via an ESRCH error.
> > >>
> > >> It would be nice to explain somewhere how you can avoid the race using
> > >> a PID descriptor. Is there anything else besides CLONE_PIDFD?
> > >
> > > If you're the parent of the process you can do this without CLONE_PIDFD:
> > > pid = fork();
> > > pidfd = pidfd_open();
> > > ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> > > if (ret < 0 && errno == ESRCH)
> > > /* pidfd refers to another, recycled process */
> >
> > Although there is still the race between the fork() and the
> > pidfd_open(), right?
>
> Actually no and my code is even too complex.
> If you are the parent, and this is really a sequence that obeys the
> ordering pidfd_open() before waiting:
>
> pid = fork();
> if (pid == 0)
> exit(EXIT_SUCCESS);
> pidfd = pidfd_open(pid, 0);
> waitid(pid, ...);
>
> Then you are guaranteed that pidfd will refer to pid. No recycling can
> happen since the process has not been waited upon yet (That is,
> excluding special cases such as where you have a mainloop where a
> callback reacts to a SIGCHLD event and waits on the child behind your
> back and your next callback in the mainloop calls pidfd_open() while the
> pid has been recycled etc.).

If we wanted to be super nitpicky one could also get in that situation
where you do:

signal(SIGCHLD,SIG_IGN);

// or

struct sigaction sa;
sa.sa_handler = SIG_IGN;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGCHLD, &sa, 0)

pid = fork();
if (pid == 0)
exit(EXIT_SUCCESS);
pidfd = pidfd_open();

because then the process gets autoreaped and can be recycled. But again,
that's just bad form and in that scenario one should again use
clone(CLONE_PIDFD) instead of fork().

Christian

Subject: Re: For review: pidfd_send_signal(2) manual page

Hello Christian,

>>> If you're the parent of the process you can do this without CLONE_PIDFD:
>>> pid = fork();
>>> pidfd = pidfd_open();
>>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
>>> if (ret < 0 && errno == ESRCH)
>>> /* pidfd refers to another, recycled process */
>>
>> Although there is still the race between the fork() and the
>> pidfd_open(), right?
>
> Actually no and my code is even too complex.
> If you are the parent, and this is really a sequence that obeys the
> ordering pidfd_open() before waiting:
>
> pid = fork();
> if (pid == 0)
> exit(EXIT_SUCCESS);
> pidfd = pidfd_open(pid, 0);
> waitid(pid, ...);
>
> Then you are guaranteed that pidfd will refer to pid. No recycling can
> happen since the process has not been waited upon yet (That is,

D'oh! Yes, of course.

> excluding special cases such as where you have a mainloop where a
> callback reacts to a SIGCHLD event and waits on the child behind your
> back and your next callback in the mainloop calls pidfd_open() while the
> pid has been recycled etc.).
> A race could only appear in sequences where waiting happens before
> pidfd_open():
>
> pid = fork();
> if (pid == 0)
> exit(EXIT_SUCCESS);
> waitid(pid, ...);
> pidfd = pidfd_open(pid, 0);
>
> which honestly simply doesn't make any sense. So if you're the parent
> and you combine fork() + pidfd_open() correctly things should be fine
> without even having to verify via pidfd_send_signal() (I missed that in
> my first mail.).

Thanks for the additional detail.

I added the following to the pidfd_open() page, to
prevent people making the same thinko as me:

The following code sequence can be used to obtain a file descrip‐
tor for the child of fork(2):

pid = fork();
if (pid > 0) { /* If parent */
pidfd = pidfd_open(pid, 0);
...
}

Even if the child process has already terminated by the time of
the pidfd_open() call, the returned file descriptor is guaranteed
to refer to the child because the parent has not yet waited on the
child (and therefore, the child's ID has not been recycled).

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2019-09-26 08:50:43

by Daniel Colascione

[permalink] [raw]
Subject: Re: For review: pidfd_send_signal(2) manual page

On Tue, Sep 24, 2019 at 2:00 PM Michael Kerrisk (man-pages)
<[email protected]> wrote:
>
> Hello Christian,
>
> >>> If you're the parent of the process you can do this without CLONE_PIDFD:
> >>> pid = fork();
> >>> pidfd = pidfd_open();
> >>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> >>> if (ret < 0 && errno == ESRCH)
> >>> /* pidfd refers to another, recycled process */
> >>
> >> Although there is still the race between the fork() and the
> >> pidfd_open(), right?
> >
> > Actually no and my code is even too complex.
> > If you are the parent, and this is really a sequence that obeys the
> > ordering pidfd_open() before waiting:
> >
> > pid = fork();
> > if (pid == 0)
> > exit(EXIT_SUCCESS);
> > pidfd = pidfd_open(pid, 0);
> > waitid(pid, ...);
> >
> > Then you are guaranteed that pidfd will refer to pid. No recycling can
> > happen since the process has not been waited upon yet (That is,
>
> D'oh! Yes, of course.

You still have a race if you're the parent and you have SIGCHLD set to
SIG_IGN though.

> > excluding special cases such as where you have a mainloop where a
> > callback reacts to a SIGCHLD event and waits on the child behind your
> > back and your next callback in the mainloop calls pidfd_open() while the
> > pid has been recycled etc.).

That's a pretty common case though, especially if you're a library.

> > A race could only appear in sequences where waiting happens before
> > pidfd_open():
> >
> > pid = fork();
> > if (pid == 0)
> > exit(EXIT_SUCCESS);
> > waitid(pid, ...);
> > pidfd = pidfd_open(pid, 0);
> >
> > which honestly simply doesn't make any sense. So if you're the parent
> > and you combine fork() + pidfd_open() correctly things should be fine
> > without even having to verify via pidfd_send_signal() (I missed that in
> > my first mail.).
>
> Thanks for the additional detail.
>
> I added the following to the pidfd_open() page, to
> prevent people making the same thinko as me:
>
> The following code sequence can be used to obtain a file descrip‐
> tor for the child of fork(2):
>
> pid = fork();
> if (pid > 0) { /* If parent */
> pidfd = pidfd_open(pid, 0);
> ...
> }
>
> Even if the child process has already terminated by the time of
> the pidfd_open() call, the returned file descriptor is guaranteed
> to refer to the child because the parent has not yet waited on the
> child (and therefore, the child's ID has not been recycled).

I'd prefer that sample code be robust in all cases.

2019-09-26 08:54:58

by Christian Brauner

[permalink] [raw]
Subject: Re: For review: pidfd_send_signal(2) manual page

On Tue, Sep 24, 2019 at 11:00:03PM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Christian,
>
> >>> If you're the parent of the process you can do this without CLONE_PIDFD:
> >>> pid = fork();
> >>> pidfd = pidfd_open();
> >>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
> >>> if (ret < 0 && errno == ESRCH)
> >>> /* pidfd refers to another, recycled process */
> >>
> >> Although there is still the race between the fork() and the
> >> pidfd_open(), right?
> >
> > Actually no and my code is even too complex.
> > If you are the parent, and this is really a sequence that obeys the
> > ordering pidfd_open() before waiting:
> >
> > pid = fork();
> > if (pid == 0)
> > exit(EXIT_SUCCESS);
> > pidfd = pidfd_open(pid, 0);
> > waitid(pid, ...);
> >
> > Then you are guaranteed that pidfd will refer to pid. No recycling can
> > happen since the process has not been waited upon yet (That is,
>
> D'oh! Yes, of course.
>
> > excluding special cases such as where you have a mainloop where a
> > callback reacts to a SIGCHLD event and waits on the child behind your
> > back and your next callback in the mainloop calls pidfd_open() while the
> > pid has been recycled etc.).
> > A race could only appear in sequences where waiting happens before
> > pidfd_open():
> >
> > pid = fork();
> > if (pid == 0)
> > exit(EXIT_SUCCESS);
> > waitid(pid, ...);
> > pidfd = pidfd_open(pid, 0);
> >
> > which honestly simply doesn't make any sense. So if you're the parent
> > and you combine fork() + pidfd_open() correctly things should be fine
> > without even having to verify via pidfd_send_signal() (I missed that in
> > my first mail.).
>
> Thanks for the additional detail.

You're very welcome.

>
> I added the following to the pidfd_open() page, to
> prevent people making the same thinko as me:
>
> The following code sequence can be used to obtain a file descrip‐
> tor for the child of fork(2):
>
> pid = fork();
> if (pid > 0) { /* If parent */
> pidfd = pidfd_open(pid, 0);
> ...
> }
>
> Even if the child process has already terminated by the time of
> the pidfd_open() call, the returned file descriptor is guaranteed
> to refer to the child because the parent has not yet waited on the
> child (and therefore, the child's ID has not been recycled).

Thanks! I'm fine with the example. The code illustrates the basics. If
you want to go overboard, you can mention my callback example and put my
SIG_IGN code snippet from my earlier mails (cf. [1] and [2]) in there.
But imho, that'll complicate the manpage and I'm not sure it's worth it.

Thanks!
Christian

/* References */
[1]: https://lore.kernel.org/r/20190924195701.7pw2olbviieqsg5q@wittgenstein
[2]: https://lore.kernel.org/r/20190924200735.2dvqhan7ynnmfc7s@wittgenstein

2019-09-26 09:04:59

by Jann Horn

[permalink] [raw]
Subject: Re: For review: pidfd_send_signal(2) manual page

On Mon, Sep 23, 2019 at 1:26 PM Florian Weimer <[email protected]> wrote:
> * Michael Kerrisk:
> > The pidfd_send_signal() system call allows the avoidance of race
> > conditions that occur when using traditional interfaces (such as
> > kill(2)) to signal a process. The problem is that the traditional
> > interfaces specify the target process via a process ID (PID), with
> > the result that the sender may accidentally send a signal to the
> > wrong process if the originally intended target process has termi‐
> > nated and its PID has been recycled for another process. By con‐
> > trast, a PID file descriptor is a stable reference to a specific
> > process; if that process terminates, then the file descriptor
> > ceases to be valid and the caller of pidfd_send_signal() is
> > informed of this fact via an ESRCH error.
>
> It would be nice to explain somewhere how you can avoid the race using
> a PID descriptor. Is there anything else besides CLONE_PIDFD?

My favorite example here is that you could implement "killall" without
PID reuse races. With /proc/$pid file descriptors, you could do it
like this (rough pseudocode with missing error handling and resource
leaks and such):

for each pid {
procfs_pid_fd = open("/proc/"+pid);
if (procfs_pid_fd == -1) continue;
comm_fd = openat(procfs_pid_fd, "comm");
if (comm_fd == -1) continue;
char buf[1000];
int n = read(comm_fd, buf, sizeof(buf)-1);
buf[n] = 0;
if (strcmp(buf, expected_comm) == 0) {
pidfd_send_signal(procfs_pid_fd, SIGKILL, NULL, 0);
}
}

If you want to avoid using a procfs fd for this, I think you can still
do it, the dance just gets more complicated:

for each pid {
procfs_pid_fd = open("/proc/"+pid);
if (procfs_pid_fd == -1) continue;
pid_fd = pidfd_open(pid, 0);
if (pid_fd == -1) continue;
/* at this point procfs_pid_fd and pid_fd may refer to different processes */
comm_fd = openat(procfs_pid_fd, "comm");
if (comm_fd == -1) continue;
/* at this point we know that procfs_pid_fd and pid_fd refer to the
same struct pid, because otherwise the procfs_pid_fd must point to a
directory that throws -ESRCH for everything */
char buf[1000];
int n = read(comm_fd, buf, sizeof(buf)-1);
buf[n] = 0;
if (strcmp(buf, expected_comm) == 0) {
pidfd_send_signal(pid_fd, SIGKILL, NULL, 0);
}
}

But I don't think anyone is actually interested in using pidfds for
this kind of usecase right now.

Subject: Re: For review: pidfd_send_signal(2) manual page

Hello Daniel,

On 9/24/19 11:08 PM, Daniel Colascione wrote:
> On Tue, Sep 24, 2019 at 2:00 PM Michael Kerrisk (man-pages)
> <[email protected]> wrote:
>>
>> Hello Christian,
>>
>>>>> If you're the parent of the process you can do this without CLONE_PIDFD:
>>>>> pid = fork();
>>>>> pidfd = pidfd_open();
>>>>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
>>>>> if (ret < 0 && errno == ESRCH)
>>>>> /* pidfd refers to another, recycled process */
>>>>
>>>> Although there is still the race between the fork() and the
>>>> pidfd_open(), right?
>>>
>>> Actually no and my code is even too complex.
>>> If you are the parent, and this is really a sequence that obeys the
>>> ordering pidfd_open() before waiting:
>>>
>>> pid = fork();
>>> if (pid == 0)
>>> exit(EXIT_SUCCESS);
>>> pidfd = pidfd_open(pid, 0);
>>> waitid(pid, ...);
>>>
>>> Then you are guaranteed that pidfd will refer to pid. No recycling can
>>> happen since the process has not been waited upon yet (That is,
>>
>> D'oh! Yes, of course.
>
> You still have a race if you're the parent and you have SIGCHLD set to
> SIG_IGN though.

Yes, thanks for reminding me of that (as did Christian also).

>>> excluding special cases such as where you have a mainloop where a
>>> callback reacts to a SIGCHLD event and waits on the child behind your
>>> back and your next callback in the mainloop calls pidfd_open() while the
>>> pid has been recycled etc.).
>
> That's a pretty common case though, especially if you're a library.

I presume that conventionally the only real resolution of this kind of
problem is that the mainloop SIGCHLD call back has a waitpid() loop
that (in a nonblocking fashion) loops checking each of the PIDs created
by the mainloop, right?

>>> A race could only appear in sequences where waiting happens before
>>> pidfd_open():
>>>
>>> pid = fork();
>>> if (pid == 0)
>>> exit(EXIT_SUCCESS);
>>> waitid(pid, ...);
>>> pidfd = pidfd_open(pid, 0);
>>>
>>> which honestly simply doesn't make any sense. So if you're the parent
>>> and you combine fork() + pidfd_open() correctly things should be fine
>>> without even having to verify via pidfd_send_signal() (I missed that in
>>> my first mail.).
>>
>> Thanks for the additional detail.
>>
>> I added the following to the pidfd_open() page, to
>> prevent people making the same thinko as me:
>>
>> The following code sequence can be used to obtain a file descrip‐
>> tor for the child of fork(2):
>>
>> pid = fork();
>> if (pid > 0) { /* If parent */
>> pidfd = pidfd_open(pid, 0);
>> ...
>> }
>>
>> Even if the child process has already terminated by the time of
>> the pidfd_open() call, the returned file descriptor is guaranteed
>> to refer to the child because the parent has not yet waited on the
>> child (and therefore, the child's ID has not been recycled).
>
> I'd prefer that sample code be robust in all cases.

I'm not clear what you think is missing. Or do you mean that the code
can't be robust in the face of (1) waitpid(-1) in another thread or an
asynchronous SIGCHLD handler and (2) SIGCHLD disposition set to SIG_IGN?

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Subject: Re: For review: pidfd_send_signal(2) manual page

On 9/24/19 11:53 PM, Christian Brauner wrote:
> On Tue, Sep 24, 2019 at 11:00:03PM +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Christian,
>>
>>>>> If you're the parent of the process you can do this without CLONE_PIDFD:
>>>>> pid = fork();
>>>>> pidfd = pidfd_open();
>>>>> ret = pidfd_send_signal(pidfd, 0, NULL, 0);
>>>>> if (ret < 0 && errno == ESRCH)
>>>>> /* pidfd refers to another, recycled process */
>>>>
>>>> Although there is still the race between the fork() and the
>>>> pidfd_open(), right?
>>>
>>> Actually no and my code is even too complex.
>>> If you are the parent, and this is really a sequence that obeys the
>>> ordering pidfd_open() before waiting:
>>>
>>> pid = fork();
>>> if (pid == 0)
>>> exit(EXIT_SUCCESS);
>>> pidfd = pidfd_open(pid, 0);
>>> waitid(pid, ...);
>>>
>>> Then you are guaranteed that pidfd will refer to pid. No recycling can
>>> happen since the process has not been waited upon yet (That is,
>>
>> D'oh! Yes, of course.
>>
>>> excluding special cases such as where you have a mainloop where a
>>> callback reacts to a SIGCHLD event and waits on the child behind your
>>> back and your next callback in the mainloop calls pidfd_open() while the
>>> pid has been recycled etc.).
>>> A race could only appear in sequences where waiting happens before
>>> pidfd_open():
>>>
>>> pid = fork();
>>> if (pid == 0)
>>> exit(EXIT_SUCCESS);
>>> waitid(pid, ...);
>>> pidfd = pidfd_open(pid, 0);
>>>
>>> which honestly simply doesn't make any sense. So if you're the parent
>>> and you combine fork() + pidfd_open() correctly things should be fine
>>> without even having to verify via pidfd_send_signal() (I missed that in
>>> my first mail.).
>>
>> Thanks for the additional detail.
>
> You're very welcome.
>
>>
>> I added the following to the pidfd_open() page, to
>> prevent people making the same thinko as me:
>>
>> The following code sequence can be used to obtain a file descrip‐
>> tor for the child of fork(2):
>>
>> pid = fork();
>> if (pid > 0) { /* If parent */
>> pidfd = pidfd_open(pid, 0);
>> ...
>> }
>>
>> Even if the child process has already terminated by the time of
>> the pidfd_open() call, the returned file descriptor is guaranteed
>> to refer to the child because the parent has not yet waited on the
>> child (and therefore, the child's ID has not been recycled).
>
> Thanks! I'm fine with the example. The code illustrates the basics. If
> you want to go overboard, you can mention my callback example and put my
> SIG_IGN code snippet from my earlier mails (cf. [1] and [2]) in there.
> But imho, that'll complicate the manpage and I'm not sure it's worth it.

I agree that we should not complicate this discussion with more code,
but how about we refine the text as follows:

The following code sequence can be used to obtain a file descrip‐
tor for the child of fork(2):

pid = fork();
if (pid > 0) { /* If parent */
pidfd = pidfd_open(pid, 0);
...
}

Even if the child has already terminated by the time of the
pidfd_open() call, its PID will not have been recycled and the
returned file descriptor will refer to the resulting zombie
process. Note, however, that this is guaranteed only if the fol‐
lowing conditions hold true:

* the disposition of SIGCHLD has not been explicitly set to
SIG_IGN (see sigaction(2)); and

* the zombie process was not reaped elsewhere in the program
(e.g., either by an asynchronously executed signal handler or
by wait(2) or similar in another thread).

If these conditions don't hold true, then the child process should
instead be created using clone(2) with the CLONE_PID flag.

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/