2023-08-07 11:10:58

by David Rheinsberg

[permalink] [raw]
Subject: [PATCH] pid: allow pidfds for reaped tasks

A pidfd can currently only be created for tasks that are thread-group
leaders and not reaped. This patch changes the pidfd-core to allow for
pidfds on reapead thread-group leaders as well.

A pidfd can outlive the task it refers to, and thus user-space must
already be prepared that the task underlying a pidfd is gone at the time
they get their hands on the pidfd. For instance, resolving the pidfd to
a PID via the fdinfo must be prepared to read `-1`.

Despite user-space knowing that a pidfd might be stale, several kernel
APIs currently add another layer that checks for this. In particular,
SO_PEERPIDFD returns `EINVAL` if the peer-task was already reaped,
but returns a stale pidfd if the task is reaped immediately after the
respective alive-check.

This has the unfortunate effect that user-space now has two ways to
check for the exact same scenario: A syscall might return
EINVAL/ESRCH/... *or* the pidfd might be stale, even though there is no
particular reason to distinguish both cases. This also propagates
through user-space APIs, which pass on pidfds. They must be prepared to
pass on `-1` *or* the pidfd, because there is no guaranteed way to get a
stale pidfd from the kernel.

This patch changes the core pidfd helpers to allow creation of pidfds
even if the PID is no longer linked to any task. This only affects one
of the three pidfd users that currently exist:

1) fanotify already tests for a linked TGID-task manually before
creating the PIDFD, thus it is not directly affected by this change.
However, note that the current fanotify code fails with an error if
the target process is reaped exactly between the TGID-check in
fanotify and the test in pidfd_prepare(). With this patch, this
will no longer be the case.

2) pidfd_open(2) calls find_get_pid() before creating the pidfd, thus
it is also not directly affected by this change.
Again, similar to fanotify, there is a race between the
find_get_pid() call and pidfd_prepare(), which currently causes
pidfd_open(2) to return EINVAL rather than ESRCH if the process is
reaped just between those two checks. With this patch, this will no
longer be the case.

3) SO_PEERPIDFD will be affected by this change and from now on return
stale pidfds rather than EINVAL if the respective peer task is
reaped already.

Given that users of SO_PEERPIDFD must already deal with stale pidfds,
this change hopefully simplifies the API of SO_PEERPIDFD, and all
dependent user-space APIs (e.g., GetConnectionCredentials() on D-Bus
driver APIs). Also note that SO_PEERPIDFD is still pending to be
released with linux-6.5.

Signed-off-by: David Rheinsberg <[email protected]>
---
kernel/fork.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d2e12b6d2b18..4dde19a8c264 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2161,7 +2161,7 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
* Allocate a new file that stashes @pid and reserve a new pidfd number in the
* caller's file descriptor table. The pidfd is reserved but not installed yet.
*
- * The helper verifies that @pid is used as a thread group leader.
+ * The helper verifies that @pid is/was used as a thread group leader.
*
* If this function returns successfully the caller is responsible to either
* call fd_install() passing the returned pidfd and pidfd file as arguments in
@@ -2180,7 +2180,14 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
+ if (!pid)
+ return -EINVAL;
+
+ /*
+ * Non thread-group leaders cannot have pidfds, but we allow them for
+ * reaped thread-group leaders.
+ */
+ if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
return -EINVAL;

return __pidfd_prepare(pid, flags, ret);
--
2.41.0



2023-08-11 12:16:05

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] pid: allow pidfds for reaped tasks

On Fri, Aug 11, 2023 at 01:29:11PM +0200, Oleg Nesterov wrote:
> Hi Christian,
>
> Sorry for delay, I've just returned from vacation and I am slowly

Absolutely no problem! Thanks for getting back to us.

> crawling my email backlog.
>
>
>
> On 08/07, Christian Brauner wrote:
> >
> > > int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> > > {
> > > - if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> > > + if (!pid)
> > > + return -EINVAL;
> > > +
> > > + /*
> > > + * Non thread-group leaders cannot have pidfds, but we allow them for
> > > + * reaped thread-group leaders.
> > > + */
> > > + if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
> > > return -EINVAL;
> >
> > TL;DR userspace wants to be able to get a pidfd to an already reaped
> > thread-group leader. I don't see any issues with this.
>
> I guess I need to read the whole thread carefully, but right now
> I don't understand this patch and the problem...
>
> OK, suppose we have a group leader L with pid 100 and its sub-thread
> T with pid 101.
>
> With this patch pidfd_open(101) can succeed if T exits right after
> find_get_pid(101) because pid_has_task(pid, PIDTYPE_PID) above will
> fail, right?
>
> This looks wrong, 101 was never a leader pid...

Well, let me simplify the question:

What code do we need to allow userspace to open a pidfd to a leader pid
even if it has already been exited and reaped (without also accidently
allowing to open non-lead pid pidfds)?

I hope that clarifies?

2023-08-11 13:14:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] pid: allow pidfds for reaped tasks

On 08/11, Christian Brauner wrote:
>
> > > > int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> > > > {
> > > > - if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> > > > + if (!pid)
> > > > + return -EINVAL;
> > > > +
> > > > + /*
> > > > + * Non thread-group leaders cannot have pidfds, but we allow them for
> > > > + * reaped thread-group leaders.
> > > > + */
> > > > + if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
> > > > return -EINVAL;
> > >
> > > TL;DR userspace wants to be able to get a pidfd to an already reaped
> > > thread-group leader. I don't see any issues with this.
> >
> > I guess I need to read the whole thread carefully, but right now
> > I don't understand this patch and the problem...
> >
> > OK, suppose we have a group leader L with pid 100 and its sub-thread
> > T with pid 101.
> >
> > With this patch pidfd_open(101) can succeed if T exits right after
> > find_get_pid(101) because pid_has_task(pid, PIDTYPE_PID) above will
> > fail, right?
> >
> > This looks wrong, 101 was never a leader pid...
>
> Well, let me simplify the question:

Thanks,

> What code do we need to allow userspace to open a pidfd to a leader pid
> even if it has already been exited and reaped (without also accidently
> allowing to open non-lead pid pidfds)?

I'll try to think more, but can you also explain why do we need this?

See my another email. Can't we simply shift the pid_has_task(PIDTYPE_TGID)
check from pidfd_prepare() to pidfd_create() ? (and then we can kill
pidfd_prepare and rename __pidfd_prepare to pidfd_prepare).

Oleg.


2023-08-14 06:19:21

by David Rheinsberg

[permalink] [raw]
Subject: Re: [PATCH] pid: allow pidfds for reaped tasks

Hi Oleg,

On Fri, Aug 11, 2023, at 1:57 PM, Oleg Nesterov wrote:
>> What code do we need to allow userspace to open a pidfd to a leader pid
>> even if it has already been exited and reaped (without also accidently
>> allowing to open non-lead pid pidfds)?
>
> I'll try to think more, but can you also explain why do we need this?
>
> See my another email. Can't we simply shift the pid_has_task(PIDTYPE_TGID)
> check from pidfd_prepare() to pidfd_create() ? (and then we can kill
> pidfd_prepare and rename __pidfd_prepare to pidfd_prepare).

Yes, the easiest solution would be to use `__pidfd_prepare()` and ensure that the caller only ever calls this on tg-leaders. This would work just fine, imo. And this was my initial approach.

I think Christian preferred an explicit assertion that ensures we do not accidentally hand out pidfds for non-tg-leaders. The question is thus whether there is an easy way to assert this even for reaped tasks? Or whether there is a simple way to flag a pid that was used as tg-leader? Or, ultimately, whether this has limited use and we should just use `__pidfd_prepare()`?

Thanks a lot!
David

2023-08-14 14:42:28

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] pid: allow pidfds for reaped tasks

On 08/14, David Rheinsberg wrote:
>
> Hi Oleg,
>
> On Fri, Aug 11, 2023, at 1:57 PM, Oleg Nesterov wrote:
> >> What code do we need to allow userspace to open a pidfd to a leader pid
> >> even if it has already been exited and reaped (without also accidently
> >> allowing to open non-lead pid pidfds)?
> >
> > I'll try to think more, but can you also explain why do we need this?
> >
> > See my another email. Can't we simply shift the pid_has_task(PIDTYPE_TGID)
> > check from pidfd_prepare() to pidfd_create() ? (and then we can kill
> > pidfd_prepare and rename __pidfd_prepare to pidfd_prepare).
>
> Yes, the easiest solution would be to use `__pidfd_prepare()` and ensure
> that the caller only ever calls this on tg-leaders. This would work just
> fine, imo. And this was my initial approach.

Great,

> I think Christian preferred an explicit assertion that ensures we do not
> accidentally hand out pidfds for non-tg-leaders. The question is thus whether
> there is an easy way to assert this even for reaped tasks?
> Or whether there is a simple way to flag a pid that was used as tg-leader?

I do not see how can we check if a detached pid was a leader pid, and I don't
think it makes sense to add a new member into struct pid...

> Or, ultimately, whether this has limited use and we should just use
> `__pidfd_prepare()`?

Well, if you confirm that sk->sk_peer_pid and scm->pid are always initialized with
task_tgid(current), I'd certainly prefer this approach unless Christian objects.

Oleg.


2023-08-14 15:30:27

by Alexander Mikhalitsyn

[permalink] [raw]
Subject: Re: [PATCH] pid: allow pidfds for reaped tasks

On Mon, Aug 14, 2023 at 3:21 PM Oleg Nesterov <[email protected]> wrote:
>
> On 08/14, David Rheinsberg wrote:
> >
> > Hi Oleg,
> >
> > On Fri, Aug 11, 2023, at 1:57 PM, Oleg Nesterov wrote:
> > >> What code do we need to allow userspace to open a pidfd to a leader pid
> > >> even if it has already been exited and reaped (without also accidently
> > >> allowing to open non-lead pid pidfds)?
> > >
> > > I'll try to think more, but can you also explain why do we need this?
> > >
> > > See my another email. Can't we simply shift the pid_has_task(PIDTYPE_TGID)
> > > check from pidfd_prepare() to pidfd_create() ? (and then we can kill
> > > pidfd_prepare and rename __pidfd_prepare to pidfd_prepare).
> >
> > Yes, the easiest solution would be to use `__pidfd_prepare()` and ensure
> > that the caller only ever calls this on tg-leaders. This would work just
> > fine, imo. And this was my initial approach.
>
> Great,
>
> > I think Christian preferred an explicit assertion that ensures we do not
> > accidentally hand out pidfds for non-tg-leaders. The question is thus whether
> > there is an easy way to assert this even for reaped tasks?
> > Or whether there is a simple way to flag a pid that was used as tg-leader?
>
> I do not see how can we check if a detached pid was a leader pid, and I don't
> think it makes sense to add a new member into struct pid...
>
> > Or, ultimately, whether this has limited use and we should just use
> > `__pidfd_prepare()`?
>
> Well, if you confirm that sk->sk_peer_pid and scm->pid are always initialized with
> task_tgid(current), I'd certainly prefer this approach unless Christian objects.

Dear colleagues,

I can confirm that sk->sk_peer_pid and scm->pid are always thread-group leaders.

Kind regards,
Alex

>
> Oleg.
>

2023-08-14 16:04:28

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] pid: allow pidfds for reaped tasks

On Mon, Aug 14, 2023 at 03:20:39PM +0200, Oleg Nesterov wrote:
> On 08/14, David Rheinsberg wrote:
> >
> > Hi Oleg,
> >
> > On Fri, Aug 11, 2023, at 1:57 PM, Oleg Nesterov wrote:
> > >> What code do we need to allow userspace to open a pidfd to a leader pid
> > >> even if it has already been exited and reaped (without also accidently
> > >> allowing to open non-lead pid pidfds)?
> > >
> > > I'll try to think more, but can you also explain why do we need this?
> > >
> > > See my another email. Can't we simply shift the pid_has_task(PIDTYPE_TGID)
> > > check from pidfd_prepare() to pidfd_create() ? (and then we can kill
> > > pidfd_prepare and rename __pidfd_prepare to pidfd_prepare).
> >
> > Yes, the easiest solution would be to use `__pidfd_prepare()` and ensure
> > that the caller only ever calls this on tg-leaders. This would work just
> > fine, imo. And this was my initial approach.
>
> Great,
>
> > I think Christian preferred an explicit assertion that ensures we do not
> > accidentally hand out pidfds for non-tg-leaders. The question is thus whether
> > there is an easy way to assert this even for reaped tasks?
> > Or whether there is a simple way to flag a pid that was used as tg-leader?
>
> I do not see how can we check if a detached pid was a leader pid, and I don't
> think it makes sense to add a new member into struct pid...
>
> > Or, ultimately, whether this has limited use and we should just use
> > `__pidfd_prepare()`?
>
> Well, if you confirm that sk->sk_peer_pid and scm->pid are always initialized with
> task_tgid(current), I'd certainly prefer this approach unless Christian objects.

No no, I'm absolutely not objecting. I specifically want you to take the
opinionated lead here. :) Thanks for chiming in!