2022-07-21 11:31:04

by Lee Jones

[permalink] [raw]
Subject: [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()

The documentation for find_pid() clearly states:

"Must be called with the tasklist_lock or rcu_read_lock() held."

Presently we do neither.

In an ideal world we would wrap the in-lined call to find_vpid() along
with get_pid_task() in the suggested rcu_read_lock() and have done.
However, looking at get_pid_task()'s internals, it already does that
independently, so this would lead to deadlock.

Instead, we'll use find_get_pid() which searches for the vpid, then
takes a reference to it preventing early free, all within the safety
of rcu_read_lock(). Once we have our reference we can safely make use
of it up until the point it is put.

Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: [email protected]
Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
Signed-off-by: Lee Jones <[email protected]>
---
kernel/bpf/syscall.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 83c7136c5788d..c20cff30581c4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
const struct perf_event *event;
struct task_struct *task;
struct file *file;
+ struct pid *ppid;
int err;

if (CHECK_ATTR(BPF_TASK_FD_QUERY))
@@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
if (attr->task_fd_query.flags != 0)
return -EINVAL;

- task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
+ ppid = find_get_pid(pid);
+ task = get_pid_task(ppid, PIDTYPE_PID);
+ put_pid(ppid);
if (!task)
return -ENOENT;

--
2.37.0.170.g444d1eabd0-goog


2022-07-21 12:01:13

by Lee Jones

[permalink] [raw]
Subject: Re: [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()

On Thu, 21 Jul 2022, Jiri Olsa wrote:

> On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
> > The documentation for find_pid() clearly states:
> >
> > "Must be called with the tasklist_lock or rcu_read_lock() held."
> >
> > Presently we do neither.
> >
> > In an ideal world we would wrap the in-lined call to find_vpid() along
> > with get_pid_task() in the suggested rcu_read_lock() and have done.
> > However, looking at get_pid_task()'s internals, it already does that
> > independently, so this would lead to deadlock.
>
> hm, we can have nested rcu_read_lock calls, right?

I assumed not, but that might be an oversight on my part.

Would that be your preference?

> > Instead, we'll use find_get_pid() which searches for the vpid, then
> > takes a reference to it preventing early free, all within the safety
> > of rcu_read_lock(). Once we have our reference we can safely make use
> > of it up until the point it is put.
> >
> > Cc: Alexei Starovoitov <[email protected]>
> > Cc: Daniel Borkmann <[email protected]>
> > Cc: John Fastabend <[email protected]>
> > Cc: Andrii Nakryiko <[email protected]>
> > Cc: Martin KaFai Lau <[email protected]>
> > Cc: Song Liu <[email protected]>
> > Cc: Yonghong Song <[email protected]>
> > Cc: KP Singh <[email protected]>
> > Cc: Stanislav Fomichev <[email protected]>
> > Cc: Hao Luo <[email protected]>
> > Cc: Jiri Olsa <[email protected]>
> > Cc: [email protected]
> > Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
> > Signed-off-by: Lee Jones <[email protected]>
> > ---
> > kernel/bpf/syscall.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 83c7136c5788d..c20cff30581c4 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> > const struct perf_event *event;
> > struct task_struct *task;
> > struct file *file;
> > + struct pid *ppid;
> > int err;
> >
> > if (CHECK_ATTR(BPF_TASK_FD_QUERY))
> > @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> > if (attr->task_fd_query.flags != 0)
> > return -EINVAL;
> >
> > - task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
> > + ppid = find_get_pid(pid);
> > + task = get_pid_task(ppid, PIDTYPE_PID);
> > + put_pid(ppid);
> > if (!task)
> > return -ENOENT;
> >

--
Lee Jones [李琼斯]

2022-07-21 12:51:11

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()

On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
> The documentation for find_pid() clearly states:
>
> "Must be called with the tasklist_lock or rcu_read_lock() held."
>
> Presently we do neither.
>
> In an ideal world we would wrap the in-lined call to find_vpid() along
> with get_pid_task() in the suggested rcu_read_lock() and have done.
> However, looking at get_pid_task()'s internals, it already does that
> independently, so this would lead to deadlock.

hm, we can have nested rcu_read_lock calls, right?

jirka

>
> Instead, we'll use find_get_pid() which searches for the vpid, then
> takes a reference to it preventing early free, all within the safety
> of rcu_read_lock(). Once we have our reference we can safely make use
> of it up until the point it is put.
>
> Cc: Alexei Starovoitov <[email protected]>
> Cc: Daniel Borkmann <[email protected]>
> Cc: John Fastabend <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Martin KaFai Lau <[email protected]>
> Cc: Song Liu <[email protected]>
> Cc: Yonghong Song <[email protected]>
> Cc: KP Singh <[email protected]>
> Cc: Stanislav Fomichev <[email protected]>
> Cc: Hao Luo <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: [email protected]
> Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
> Signed-off-by: Lee Jones <[email protected]>
> ---
> kernel/bpf/syscall.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 83c7136c5788d..c20cff30581c4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> const struct perf_event *event;
> struct task_struct *task;
> struct file *file;
> + struct pid *ppid;
> int err;
>
> if (CHECK_ATTR(BPF_TASK_FD_QUERY))
> @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> if (attr->task_fd_query.flags != 0)
> return -EINVAL;
>
> - task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
> + ppid = find_get_pid(pid);
> + task = get_pid_task(ppid, PIDTYPE_PID);
> + put_pid(ppid);
> if (!task)
> return -ENOENT;
>
> --
> 2.37.0.170.g444d1eabd0-goog
>

2022-07-21 12:59:57

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()

On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
> On Thu, 21 Jul 2022, Jiri Olsa wrote:
>
> > On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
> > > The documentation for find_pid() clearly states:

typo find_vpid

> > >
> > > "Must be called with the tasklist_lock or rcu_read_lock() held."
> > >
> > > Presently we do neither.

just curious, did you see crash related to this or you just spot that

> > >
> > > In an ideal world we would wrap the in-lined call to find_vpid() along
> > > with get_pid_task() in the suggested rcu_read_lock() and have done.
> > > However, looking at get_pid_task()'s internals, it already does that
> > > independently, so this would lead to deadlock.
> >
> > hm, we can have nested rcu_read_lock calls, right?
>
> I assumed not, but that might be an oversight on my part.
>
> Would that be your preference?

seems simpler than calling get/put for ppid

jirka

>
> > > Instead, we'll use find_get_pid() which searches for the vpid, then
> > > takes a reference to it preventing early free, all within the safety
> > > of rcu_read_lock(). Once we have our reference we can safely make use
> > > of it up until the point it is put.
> > >
> > > Cc: Alexei Starovoitov <[email protected]>
> > > Cc: Daniel Borkmann <[email protected]>
> > > Cc: John Fastabend <[email protected]>
> > > Cc: Andrii Nakryiko <[email protected]>
> > > Cc: Martin KaFai Lau <[email protected]>
> > > Cc: Song Liu <[email protected]>
> > > Cc: Yonghong Song <[email protected]>
> > > Cc: KP Singh <[email protected]>
> > > Cc: Stanislav Fomichev <[email protected]>
> > > Cc: Hao Luo <[email protected]>
> > > Cc: Jiri Olsa <[email protected]>
> > > Cc: [email protected]
> > > Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
> > > Signed-off-by: Lee Jones <[email protected]>
> > > ---
> > > kernel/bpf/syscall.c | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index 83c7136c5788d..c20cff30581c4 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> > > const struct perf_event *event;
> > > struct task_struct *task;
> > > struct file *file;
> > > + struct pid *ppid;
> > > int err;
> > >
> > > if (CHECK_ATTR(BPF_TASK_FD_QUERY))
> > > @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> > > if (attr->task_fd_query.flags != 0)
> > > return -EINVAL;
> > >
> > > - task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
> > > + ppid = find_get_pid(pid);
> > > + task = get_pid_task(ppid, PIDTYPE_PID);
> > > + put_pid(ppid);
> > > if (!task)
> > > return -ENOENT;
> > >
>
> --
> Lee Jones [李琼斯]

2022-07-21 16:07:59

by Yonghong Song

[permalink] [raw]
Subject: Re: [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()



On 7/21/22 5:14 AM, Jiri Olsa wrote:
> On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
>> On Thu, 21 Jul 2022, Jiri Olsa wrote:
>>
>>> On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
>>>> The documentation for find_pid() clearly states:
>
> typo find_vpid
>
>>>>
>>>> "Must be called with the tasklist_lock or rcu_read_lock() held."
>>>>
>>>> Presently we do neither.
>
> just curious, did you see crash related to this or you just spot that
>
>>>>
>>>> In an ideal world we would wrap the in-lined call to find_vpid() along
>>>> with get_pid_task() in the suggested rcu_read_lock() and have done.
>>>> However, looking at get_pid_task()'s internals, it already does that
>>>> independently, so this would lead to deadlock.
>>>
>>> hm, we can have nested rcu_read_lock calls, right?
>>
>> I assumed not, but that might be an oversight on my part.

From kernel documentation, nested rcu_read_lock is allowed.
https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html

RCU's grace-period guarantee allows updaters to wait for the completion
of all pre-existing RCU read-side critical sections. An RCU read-side
critical section begins with the marker rcu_read_lock() and ends with
the marker rcu_read_unlock(). These markers may be nested, and RCU
treats a nested set as one big RCU read-side critical section.
Production-quality implementations of rcu_read_lock() and
rcu_read_unlock() are extremely lightweight, and in fact have exactly
zero overhead in Linux kernels built for production use with
CONFIG_PREEMPT=n.

>>
>> Would that be your preference?
>
> seems simpler than calling get/put for ppid

The current implementation seems okay since we can hide
rcu_read_lock() inside find_get_pid(). We can also avoid
nested rcu_read_lock(), which is although allowed but
not pretty.

>
> jirka
>
>>
>>>> Instead, we'll use find_get_pid() which searches for the vpid, then
>>>> takes a reference to it preventing early free, all within the safety
>>>> of rcu_read_lock(). Once we have our reference we can safely make use
>>>> of it up until the point it is put.
>>>>
>>>> Cc: Alexei Starovoitov <[email protected]>
>>>> Cc: Daniel Borkmann <[email protected]>
>>>> Cc: John Fastabend <[email protected]>
>>>> Cc: Andrii Nakryiko <[email protected]>
>>>> Cc: Martin KaFai Lau <[email protected]>
>>>> Cc: Song Liu <[email protected]>
>>>> Cc: Yonghong Song <[email protected]>
>>>> Cc: KP Singh <[email protected]>
>>>> Cc: Stanislav Fomichev <[email protected]>
>>>> Cc: Hao Luo <[email protected]>
>>>> Cc: Jiri Olsa <[email protected]>
>>>> Cc: [email protected]
>>>> Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
>>>> Signed-off-by: Lee Jones <[email protected]>
>>>> ---
>>>> kernel/bpf/syscall.c | 5 ++++-
>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>>> index 83c7136c5788d..c20cff30581c4 100644
>>>> --- a/kernel/bpf/syscall.c
>>>> +++ b/kernel/bpf/syscall.c
>>>> @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>>>> const struct perf_event *event;
>>>> struct task_struct *task;
>>>> struct file *file;
>>>> + struct pid *ppid;
>>>> int err;
>>>>
>>>> if (CHECK_ATTR(BPF_TASK_FD_QUERY))
>>>> @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>>>> if (attr->task_fd_query.flags != 0)
>>>> return -EINVAL;
>>>>
>>>> - task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
>>>> + ppid = find_get_pid(pid);
>>>> + task = get_pid_task(ppid, PIDTYPE_PID);
>>>> + put_pid(ppid);
>>>> if (!task)
>>>> return -ENOENT;
>>>>
>>
>> --
>> Lee Jones [李琼斯]

2022-07-21 20:59:23

by Lee Jones

[permalink] [raw]
Subject: Re: [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()

On Thu, 21 Jul 2022, Yonghong Song wrote:

>
>
> On 7/21/22 5:14 AM, Jiri Olsa wrote:
> > On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
> > > On Thu, 21 Jul 2022, Jiri Olsa wrote:
> > >
> > > > On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
> > > > > The documentation for find_pid() clearly states:
> >
> > typo find_vpid
> >
> > > > >
> > > > > "Must be called with the tasklist_lock or rcu_read_lock() held."
> > > > >
> > > > > Presently we do neither.
> >
> > just curious, did you see crash related to this or you just spot that
> >
> > > > >
> > > > > In an ideal world we would wrap the in-lined call to find_vpid() along
> > > > > with get_pid_task() in the suggested rcu_read_lock() and have done.
> > > > > However, looking at get_pid_task()'s internals, it already does that
> > > > > independently, so this would lead to deadlock.
> > > >
> > > > hm, we can have nested rcu_read_lock calls, right?
> > >
> > > I assumed not, but that might be an oversight on my part.
>
> From kernel documentation, nested rcu_read_lock is allowed.
> https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html
>
> RCU's grace-period guarantee allows updaters to wait for the completion of
> all pre-existing RCU read-side critical sections. An RCU read-side critical
> section begins with the marker rcu_read_lock() and ends with the marker
> rcu_read_unlock(). These markers may be nested, and RCU treats a nested set
> as one big RCU read-side critical section. Production-quality
> implementations of rcu_read_lock() and rcu_read_unlock() are extremely
> lightweight, and in fact have exactly zero overhead in Linux kernels built
> for production use with CONFIG_PREEMPT=n.
>
> > >
> > > Would that be your preference?
> >
> > seems simpler than calling get/put for ppid
>
> The current implementation seems okay since we can hide
> rcu_read_lock() inside find_get_pid(). We can also avoid
> nested rcu_read_lock(), which is although allowed but
> not pretty.

Right, this was my thinking.

Happy to go with whatever you guys decide though.

Make the call and I'll rework, or not.

--
Lee Jones [李琼斯]

2022-07-22 20:36:20

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()

On Thu, Jul 21, 2022 at 09:58:00PM +0100, Lee Jones wrote:
> On Thu, 21 Jul 2022, Yonghong Song wrote:
>
> >
> >
> > On 7/21/22 5:14 AM, Jiri Olsa wrote:
> > > On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
> > > > On Thu, 21 Jul 2022, Jiri Olsa wrote:
> > > >
> > > > > On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
> > > > > > The documentation for find_pid() clearly states:
> > >
> > > typo find_vpid
> > >
> > > > > >
> > > > > > "Must be called with the tasklist_lock or rcu_read_lock() held."
> > > > > >
> > > > > > Presently we do neither.
> > >
> > > just curious, did you see crash related to this or you just spot that
> > >
> > > > > >
> > > > > > In an ideal world we would wrap the in-lined call to find_vpid() along
> > > > > > with get_pid_task() in the suggested rcu_read_lock() and have done.
> > > > > > However, looking at get_pid_task()'s internals, it already does that
> > > > > > independently, so this would lead to deadlock.
> > > > >
> > > > > hm, we can have nested rcu_read_lock calls, right?
> > > >
> > > > I assumed not, but that might be an oversight on my part.
> >
> > From kernel documentation, nested rcu_read_lock is allowed.
> > https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html
> >
> > RCU's grace-period guarantee allows updaters to wait for the completion of
> > all pre-existing RCU read-side critical sections. An RCU read-side critical
> > section begins with the marker rcu_read_lock() and ends with the marker
> > rcu_read_unlock(). These markers may be nested, and RCU treats a nested set
> > as one big RCU read-side critical section. Production-quality
> > implementations of rcu_read_lock() and rcu_read_unlock() are extremely
> > lightweight, and in fact have exactly zero overhead in Linux kernels built
> > for production use with CONFIG_PREEMPT=n.
> >
> > > >
> > > > Would that be your preference?
> > >
> > > seems simpler than calling get/put for ppid
> >
> > The current implementation seems okay since we can hide
> > rcu_read_lock() inside find_get_pid(). We can also avoid
> > nested rcu_read_lock(), which is although allowed but
> > not pretty.
>
> Right, this was my thinking.
>
> Happy to go with whatever you guys decide though.
>
> Make the call and I'll rework, or not.

ok, I can live with the current version ;-) could you please resend
with fixed changelog?

thanks,
jirka