2023-03-27 18:31:37

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 0/3] pidfd: add pidfd_prepare()

This adds the pidfd_prepare() helper which allows the caller to reserve
a pidfd number and allocates a new pidfd file that stashes the provided
struct pid.

This will allow us to remove places that either open code this
functionality e.g., during copy_process() or that currently call
pidfd_create() but then have to call close_fd() because there are still
failure points after pidfd_create() has been called.

Other functionality wants to make use of pidfd's as well and they need a
pidfd_prepare() internal api as well.

I've tested the fanotify and fork changes via LTP which provides
coverage for all the affected codepaths.

Signed-off-by: Christian Brauner <[email protected]>
---
Christian Brauner (3):
pid: add pidfd_prepare()
fork: use pidfd_prepare()
fanotify: use pidfd_prepare()

fs/notify/fanotify/fanotify_user.c | 13 ++++---
include/linux/pid.h | 1 +
kernel/fork.c | 12 +------
kernel/pid.c | 69 +++++++++++++++++++++++++++++++-------
4 files changed, 68 insertions(+), 27 deletions(-)
---
base-commit: 197b6b60ae7bc51dd0814953c562833143b292aa
change-id: 20230327-pidfd-file-api-8b28d68cf0a9


2023-03-27 18:32:14

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 2/3] fork: use pidfd_prepare()

Stop open-coding get_unused_fd_flags() and anon_inode_getfile(). That's
brittle just for keeping the flags between both calls in sync. Use the
dedicated helper.

Signed-off-by: Christian Brauner <[email protected]>
---
kernel/fork.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index c0257cbee093..1f5eb854ba3e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2291,21 +2291,11 @@ static __latent_entropy struct task_struct *copy_process(
* if the fd table isn't shared).
*/
if (clone_flags & CLONE_PIDFD) {
- retval = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
+ retval = pidfd_prepare(pid, O_RDWR | O_CLOEXEC, &pidfile);
if (retval < 0)
goto bad_fork_free_pid;
-
pidfd = retval;

- pidfile = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
- O_RDWR | O_CLOEXEC);
- if (IS_ERR(pidfile)) {
- put_unused_fd(pidfd);
- retval = PTR_ERR(pidfile);
- goto bad_fork_free_pid;
- }
- get_pid(pid); /* held by pidfile now */
-
retval = put_user(pidfd, args->pidfd);
if (retval)
goto bad_fork_put_pidfd;

--
2.34.1

2023-03-27 18:32:45

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 3/3] fanotify: use pidfd_prepare()

We generally try to avoid installing a file descriptor into the caller's
file descriptor table just to close it again via close_fd() in case an
error occurs. Instead we reserve a file descriptor but don't install it
into the caller's file descriptor table yet. If we fail for other,
unrelated reasons we can just close the reserved file descriptor and if
we make it past all meaningful error paths we just install it. Fanotify
gets this right already for one fd type but not for pidfds.

Use the new pidfd_prepare() helper to reserve a pidfd and a pidfd file
and switch to the more common fd allocation and installation pattern.

Signed-off-by: Christian Brauner <[email protected]>
---
fs/notify/fanotify/fanotify_user.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 8f430bfad487..22fb1cf7e1fc 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -663,7 +663,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
struct fanotify_info *info = fanotify_event_info(event);
unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES);
unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD;
- struct file *f = NULL;
+ struct file *f = NULL, *pidfd_file = NULL;
int ret, pidfd = FAN_NOPIDFD, fd = FAN_NOFD;

pr_debug("%s: group=%p event=%p\n", __func__, group, event);
@@ -718,7 +718,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
!pid_has_task(event->pid, PIDTYPE_TGID)) {
pidfd = FAN_NOPIDFD;
} else {
- pidfd = pidfd_create(event->pid, 0);
+ pidfd = pidfd_prepare(event->pid, 0, &pidfd_file);
if (pidfd < 0)
pidfd = FAN_EPIDFD;
}
@@ -751,6 +751,9 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
if (f)
fd_install(fd, f);

+ if (pidfd_file)
+ fd_install(pidfd, pidfd_file);
+
return metadata.event_len;

out_close_fd:
@@ -759,8 +762,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
fput(f);
}

- if (pidfd >= 0)
- close_fd(pidfd);
+ if (pidfd >= 0) {
+ put_unused_fd(pidfd);
+ fput(pidfd_file);
+ }

return ret;
}

--
2.34.1

2023-03-27 18:34:13

by Christian Brauner

[permalink] [raw]
Subject: [PATCH 1/3] pid: add pidfd_prepare()

Add a new helper that allows to reserve a pidfd and allocates a new
pidfd file that stashes the provided struct pid. This will allow us to
remove places that either open code this function or that call
pidfd_create() but then have to call close_fd() because there are still
failure points after pidfd_create() has been called.

Signed-off-by: Christian Brauner <[email protected]>
---
include/linux/pid.h | 1 +
kernel/pid.c | 69 +++++++++++++++++++++++++++++++++++++++++++----------
2 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 343abf22092e..b75de288a8c2 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -80,6 +80,7 @@ extern struct pid *pidfd_pid(const struct file *file);
struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
int pidfd_create(struct pid *pid, unsigned int flags);
+int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);

static inline struct pid *get_pid(struct pid *pid)
{
diff --git a/kernel/pid.c b/kernel/pid.c
index 3fbc5e46b721..95e7e01574c8 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -576,6 +576,56 @@ struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags)
return task;
}

+/**
+ * pidfd_prepare - allocate a new pidfd_file and reserve a pidfd
+ * @pid: the struct pid for which to create a pidfd
+ * @flags: flags of the new @pidfd
+ * @pidfd: the pidfd to return
+ *
+ * Allocate a new file that stashes @pid and reserve a new pidfd number in the
+ * caller's file descriptor table. The pidfd is reserved but not installed yet.
+ *
+ * If this function returns successfully the caller is responsible to either
+ * call fd_install() passing the returned pidfd and pidfd file as arguments in
+ * order to install the pidfd into its file descriptor table or they must use
+ * put_unused_fd() and fput() on the returned pidfd and pidfd file
+ * respectively.
+ *
+ * This function is useful when a pidfd must already be reserved but there
+ * might still be points of failure afterwards and the caller wants to ensure
+ * that no pidfd is leaked into its file descriptor table.
+ *
+ * Return: On success, a reserved pidfd is returned from the function and a new
+ * pidfd file is returned in the last argument to the function. On
+ * error, a negative error code is returned from the function and the
+ * last argument remains unchanged.
+ */
+int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
+{
+ int pidfd;
+ struct file *pidfd_file;
+
+ if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
+ return -EINVAL;
+
+ if (flags & ~(O_NONBLOCK | O_RDWR | O_CLOEXEC))
+ return -EINVAL;
+
+ pidfd = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
+ if (pidfd < 0)
+ return pidfd;
+
+ pidfd_file = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
+ flags | O_RDWR | O_CLOEXEC);
+ if (IS_ERR(pidfd_file)) {
+ put_unused_fd(pidfd);
+ return PTR_ERR(pidfd_file);
+ }
+ get_pid(pid); /* held by pidfd_file now */
+ *ret = pidfd_file;
+ return pidfd;
+}
+
/**
* pidfd_create() - Create a new pid file descriptor.
*
@@ -594,20 +644,15 @@ struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags)
*/
int pidfd_create(struct pid *pid, unsigned int flags)
{
- int fd;
+ int pidfd;
+ struct file *pidfd_file;

- if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
- return -EINVAL;
-
- if (flags & ~(O_NONBLOCK | O_RDWR | O_CLOEXEC))
- return -EINVAL;
-
- fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid),
- flags | O_RDWR | O_CLOEXEC);
- if (fd < 0)
- put_pid(pid);
+ pidfd = pidfd_prepare(pid, flags, &pidfd_file);
+ if (pidfd < 0)
+ return pidfd;

- return fd;
+ fd_install(pidfd, pidfd_file);
+ return pidfd;
}

/**

--
2.34.1

2023-03-28 08:01:26

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 3/3] fanotify: use pidfd_prepare()

On Mon 27-03-23 20:22:53, Christian Brauner wrote:
> We generally try to avoid installing a file descriptor into the caller's
> file descriptor table just to close it again via close_fd() in case an
> error occurs. Instead we reserve a file descriptor but don't install it
> into the caller's file descriptor table yet. If we fail for other,
> unrelated reasons we can just close the reserved file descriptor and if
> we make it past all meaningful error paths we just install it. Fanotify
> gets this right already for one fd type but not for pidfds.
>
> Use the new pidfd_prepare() helper to reserve a pidfd and a pidfd file
> and switch to the more common fd allocation and installation pattern.
>
> Signed-off-by: Christian Brauner <[email protected]>

Thanks for the improvement! It looks good to me. Feel free to add:

Acked-by: Jan Kara <[email protected]>

Honza

> ---
> fs/notify/fanotify/fanotify_user.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 8f430bfad487..22fb1cf7e1fc 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -663,7 +663,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> struct fanotify_info *info = fanotify_event_info(event);
> unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES);
> unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD;
> - struct file *f = NULL;
> + struct file *f = NULL, *pidfd_file = NULL;
> int ret, pidfd = FAN_NOPIDFD, fd = FAN_NOFD;
>
> pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> @@ -718,7 +718,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> !pid_has_task(event->pid, PIDTYPE_TGID)) {
> pidfd = FAN_NOPIDFD;
> } else {
> - pidfd = pidfd_create(event->pid, 0);
> + pidfd = pidfd_prepare(event->pid, 0, &pidfd_file);
> if (pidfd < 0)
> pidfd = FAN_EPIDFD;
> }
> @@ -751,6 +751,9 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> if (f)
> fd_install(fd, f);
>
> + if (pidfd_file)
> + fd_install(pidfd, pidfd_file);
> +
> return metadata.event_len;
>
> out_close_fd:
> @@ -759,8 +762,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> fput(f);
> }
>
> - if (pidfd >= 0)
> - close_fd(pidfd);
> + if (pidfd >= 0) {
> + put_unused_fd(pidfd);
> + fput(pidfd_file);
> + }
>
> return ret;
> }
>
> --
> 2.34.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-03-28 09:07:55

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/3] pid: add pidfd_prepare()

On Mon 27-03-23 20:22:51, Christian Brauner wrote:
> Add a new helper that allows to reserve a pidfd and allocates a new
> pidfd file that stashes the provided struct pid. This will allow us to
> remove places that either open code this function or that call
> pidfd_create() but then have to call close_fd() because there are still
> failure points after pidfd_create() has been called.
>
> Signed-off-by: Christian Brauner <[email protected]>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> include/linux/pid.h | 1 +
> kernel/pid.c | 69 +++++++++++++++++++++++++++++++++++++++++++----------
> 2 files changed, 58 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index 343abf22092e..b75de288a8c2 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -80,6 +80,7 @@ extern struct pid *pidfd_pid(const struct file *file);
> struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
> struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
> int pidfd_create(struct pid *pid, unsigned int flags);
> +int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
>
> static inline struct pid *get_pid(struct pid *pid)
> {
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 3fbc5e46b721..95e7e01574c8 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -576,6 +576,56 @@ struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags)
> return task;
> }
>
> +/**
> + * pidfd_prepare - allocate a new pidfd_file and reserve a pidfd
> + * @pid: the struct pid for which to create a pidfd
> + * @flags: flags of the new @pidfd
> + * @pidfd: the pidfd to return
> + *
> + * Allocate a new file that stashes @pid and reserve a new pidfd number in the
> + * caller's file descriptor table. The pidfd is reserved but not installed yet.
> + *
> + * If this function returns successfully the caller is responsible to either
> + * call fd_install() passing the returned pidfd and pidfd file as arguments in
> + * order to install the pidfd into its file descriptor table or they must use
> + * put_unused_fd() and fput() on the returned pidfd and pidfd file
> + * respectively.
> + *
> + * This function is useful when a pidfd must already be reserved but there
> + * might still be points of failure afterwards and the caller wants to ensure
> + * that no pidfd is leaked into its file descriptor table.
> + *
> + * Return: On success, a reserved pidfd is returned from the function and a new
> + * pidfd file is returned in the last argument to the function. On
> + * error, a negative error code is returned from the function and the
> + * last argument remains unchanged.
> + */
> +int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> +{
> + int pidfd;
> + struct file *pidfd_file;
> +
> + if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> + return -EINVAL;
> +
> + if (flags & ~(O_NONBLOCK | O_RDWR | O_CLOEXEC))
> + return -EINVAL;
> +
> + pidfd = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
> + if (pidfd < 0)
> + return pidfd;
> +
> + pidfd_file = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
> + flags | O_RDWR | O_CLOEXEC);
> + if (IS_ERR(pidfd_file)) {
> + put_unused_fd(pidfd);
> + return PTR_ERR(pidfd_file);
> + }
> + get_pid(pid); /* held by pidfd_file now */
> + *ret = pidfd_file;
> + return pidfd;
> +}
> +
> /**
> * pidfd_create() - Create a new pid file descriptor.
> *
> @@ -594,20 +644,15 @@ struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags)
> */
> int pidfd_create(struct pid *pid, unsigned int flags)
> {
> - int fd;
> + int pidfd;
> + struct file *pidfd_file;
>
> - if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> - return -EINVAL;
> -
> - if (flags & ~(O_NONBLOCK | O_RDWR | O_CLOEXEC))
> - return -EINVAL;
> -
> - fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid),
> - flags | O_RDWR | O_CLOEXEC);
> - if (fd < 0)
> - put_pid(pid);
> + pidfd = pidfd_prepare(pid, flags, &pidfd_file);
> + if (pidfd < 0)
> + return pidfd;
>
> - return fd;
> + fd_install(pidfd, pidfd_file);
> + return pidfd;
> }
>
> /**
>
> --
> 2.34.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-03-29 06:48:05

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH 0/3] pidfd: add pidfd_prepare()


On Mon, 27 Mar 2023 20:22:50 +0200, Christian Brauner wrote:
> This adds the pidfd_prepare() helper which allows the caller to reserve
> a pidfd number and allocates a new pidfd file that stashes the provided
> struct pid.
>
> This will allow us to remove places that either open code this
> functionality e.g., during copy_process() or that currently call
> pidfd_create() but then have to call close_fd() because there are still
> failure points after pidfd_create() has been called.
>
> [...]

Jan, thanks for the reviews.

I've picked this up now. Please note that this series is considered stable and
has thus been tagged. The reason is that the SCM_PIDFD work in the networking
depends wants to depend on this work. So they'll get a stable tag,

tree: git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git
branch: pidfd.file.api
tag: pidfd.file.api.v6.4

[1/3] pid: add pidfd_prepare()
commit: 7021c1b14f83d9151ecaf976eaa6c1d5c6bb5dc7
[2/3] fork: use pidfd_prepare()
commit: 761ce43fda7ebcdf1b1aa8e797ec83fae0e34c47
[3/3] fanotify: use pidfd_prepare()
commit: 909939fc167d82cf09cd93ae44e968be916b6e41

Thanks!
Christian