2019-03-12 13:54:21

by Christian Brauner

Subject: [GIT PULL RESEND] pidfd changes for v5.1-rc1

Hi Linus,

This is a resend of the pull request for the pidfd_send_signal() syscall
which I sent last Tuesday. I'm not sure whether you simply wanted to take a
closer look first.

The following changes since commit f17b5f06cb92ef2250513a1e154c47b78df07d40:

Linux 5.0-rc4 (2019-01-27 15:18:05 -0800)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git tags/pidfd-v5.1-rc1

The patchset introduces the ability to use file descriptors from /proc/<pid>
as stable handles on struct pid. Even if the pid is recycled, the handle
keeps referring to the original process. For a start, these fds can be used
to send signals to the processes they refer to.

With the ability to use /proc/<pid> fds as stable handles on struct pid, we
can fix a long-standing issue: after a process has exited, its pid can be
reused by another process, and a caller that sends a signal to the reused
pid ends up signaling the wrong process.
This patchset enables a variety of use cases. One obvious example is that
we can now safely delegate an important part of process management, sending
signals, to processes other than the parent of a given process: pidfds can
be passed around via SCM_RIGHTS without fearing that the target process
will have been recycled in the meantime.
It also makes it easy to test whether a given process is still alive by
sending signal 0 to its pidfd, which is quite handy.
There has been some interest in this feature, e.g. from systems management
(systemd, glibc) and container managers. I have requested and received
comments from glibc to make sure that this syscall is suitable for their
needs as well. In the future I expect it to take on the functionality of
most other pid-based signal syscalls, but such features are left for later,
once they are needed.

The patchset has been sitting in linux-next for quite a while and has
not caused any issues. It comes with selftests which verify basic
functionality and also test that a recycled pid cannot be signaled via a
pidfd.

Jon has written about a prior version of this patchset on LWN. It should
still cover the basic functionality, since not much has changed since then:

https://lwn.net/Articles/773459/

The commit message for the syscall itself extensively documents the
syscall, including its functionality and extensibility.

/* Merge conflict and syscall number coordination */
Please note, there will be a merge conflict between Jens' io_uring patch
set in the block tree and this tree. To minimize its impact, Arnd worked
with Jens and me to coordinate syscall numbers in advance:
pidfd_send_signal() takes 424 and Jens' patchset takes 425 to 427.

/* Separate tree on kernel.org */
At the beginning of the last merge cycle, it was suggested to move this
patchset into a separate tree on kernel.org, as more work is coming that
will extend the use of file descriptors for processes. The tree was
announced in January:

https://lore.kernel.org/lkml/[email protected]/

The pidfd tree is located on kernel.org:

https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/

and its for-next branch has been tracked by Stephen in linux-next since
the beginning of the 5.0 development cycle. I'm prepared to deal with any
fallout from this work going forward.

The only recent change to these patches is the addition of two more
Acked-by/Reviewed-by tags from David Howells and tglx after the last round
of reviews.

Please consider pulling these changes from the signed pidfd-v5.1-rc1 tag.

Thanks!
Christian

----------------------------------------------------------------
pidfd patches for v5.1-rc1

----------------------------------------------------------------
Christian Brauner (2):
signal: add pidfd_send_signal() syscall
selftests: add tests for pidfd_send_signal()

arch/x86/entry/syscalls/syscall_32.tbl     |   1 +
arch/x86/entry/syscalls/syscall_64.tbl     |   1 +
fs/proc/base.c                             |   9 +
include/linux/proc_fs.h                    |   6 +
include/linux/syscalls.h                   |   3 +
include/uapi/asm-generic/unistd.h          |   4 +-
kernel/signal.c                            | 133 +++++++++-
kernel/sys_ni.c                            |   1 +
tools/testing/selftests/Makefile           |   1 +
tools/testing/selftests/pidfd/Makefile     |   6 +
tools/testing/selftests/pidfd/pidfd_test.c | 381 +++++++++++++++++++++++++++++
11 files changed, 539 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/pidfd/Makefile
create mode 100644 tools/testing/selftests/pidfd/pidfd_test.c


2019-03-16 01:27:20

by Joel Fernandes

Subject: Re: [GIT PULL RESEND] pidfd changes for v5.1-rc1

On Tue, Mar 12, 2019 at 6:53 AM Christian Brauner <[email protected]> wrote:
>
> Hi Linus,
>
> This is a resend of the pull request for the pidfd_send_signal() syscall
> which I sent last Tuesday. I'm not sure whether you just wanted to take a
> closer look.
>
> The following changes since commit f17b5f06cb92ef2250513a1e154c47b78df07d40:
>
> Linux 5.0-rc4 (2019-01-27 15:18:05 -0800)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git tags/pidfd-v5.1-rc1
>
> The patchset introduces the ability to use file descriptors from proc/<pid>
> as stable handles on struct pid. Even if a pid is recycled the handle will
> not change. For a start these fds can be used to send signals to the
> processes they refer to.

Joel from the Android team here. This will solve a long-standing issue we
have with Android's low memory killer daemon (lmkd), where killing a PID is
racy with the traditional signal delivery methods. With this new API, we
can kill processes correctly and without races. I hope this gets merged
soon, and I look forward to building further on top of it (for example,
support for knowing when a process was killed and waiting for it reliably;
right now we have a very suboptimal 100ms periodic polling loop to check
for process death, which slows down how fast we can kill processes to
reclaim their memory).

thanks,

- Joel
