2020-10-30 15:28:19

by Qian Cai

[permalink] [raw]
Subject: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

We will need to call putname() before do_renameat2() returning -EINVAL
to avoid memory leaks.

Fixes: 3c5499fa56f5 ("fs: make do_renameat2() take struct filename")
Signed-off-by: Qian Cai <[email protected]>
---
fs/namei.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 27f5a4e025fd..9dc5e1b139c9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4362,11 +4362,11 @@ int do_renameat2(int olddfd, struct filename *oldname, int newdfd,
int error;

if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
- return -EINVAL;
+ goto out;

if ((flags & (RENAME_NOREPLACE | RENAME_WHITEOUT)) &&
(flags & RENAME_EXCHANGE))
- return -EINVAL;
+ goto out;

if (flags & RENAME_EXCHANGE)
target_flags = 0;
@@ -4486,6 +4486,14 @@ int do_renameat2(int olddfd, struct filename *oldname, int newdfd,
}
exit:
return error;
+out:
+ if (!IS_ERR(oldname))
+ putname(oldname);
+
+ if (!IS_ERR(newname))
+ putname(newname);
+
+ return -EINVAL;
}

SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
--
2.28.0


2020-10-30 15:29:33

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On 10/30/20 9:24 AM, Qian Cai wrote:
> We will need to call putname() before do_renameat2() returning -EINVAL
> to avoid memory leaks.

Thanks, should mention that this isn't final by any stretch (which is
why it hasn't been posted yet), just pushed out for some exposure.

--
Jens Axboe

2020-10-30 15:55:49

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On Fri, 2020-10-30 at 09:27 -0600, Jens Axboe wrote:
> On 10/30/20 9:24 AM, Qian Cai wrote:
> > We will need to call putname() before do_renameat2() returning -EINVAL
> > to avoid memory leaks.
>
> Thanks, should mention that this isn't final by any stretch (which is
> why it hasn't been posted yet), just pushed out for some exposure.

I don't know what other people think about this, but I do find a bit
discouraging in testing those half-baked patches in linux-next where it does not
even ready to post for a review.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=3c5499fa56f568005648e6e38201f8ae9ab88015

2020-10-30 16:53:48

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On 10/30/20 9:52 AM, Qian Cai wrote:
> On Fri, 2020-10-30 at 09:27 -0600, Jens Axboe wrote:
>> On 10/30/20 9:24 AM, Qian Cai wrote:
>>> We will need to call putname() before do_renameat2() returning -EINVAL
>>> to avoid memory leaks.
>>
>> Thanks, should mention that this isn't final by any stretch (which is
>> why it hasn't been posted yet), just pushed out for some exposure.
>
> I don't know what other people think about this, but I do find a bit
> discouraging in testing those half-baked patches in linux-next where it does not
> even ready to post for a review.

I don't disagree with that and this doesn't normally happen. I don't
want to get into the reasonings, but things had to be shuffled which is
why they ended up in -next before being posted this week. They will go
out shortly.

--
Jens Axboe

2020-10-30 18:47:01

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On Fri, Oct 30, 2020 at 11:24:07AM -0400, Qian Cai wrote:
> We will need to call putname() before do_renameat2() returning -EINVAL
> to avoid memory leaks.
>
> Fixes: 3c5499fa56f5 ("fs: make do_renameat2() take struct filename")
> Signed-off-by: Qian Cai <[email protected]>

May I ask where has the original commit been posted for review? And why
the bleeding hell does io_uring touch rename-related codepaths in the
first place?

2020-10-30 18:48:33

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On 10/30/20 12:42 PM, Al Viro wrote:
> On Fri, Oct 30, 2020 at 11:24:07AM -0400, Qian Cai wrote:
>> We will need to call putname() before do_renameat2() returning -EINVAL
>> to avoid memory leaks.
>>
>> Fixes: 3c5499fa56f5 ("fs: make do_renameat2() take struct filename")
>> Signed-off-by: Qian Cai <[email protected]>
>
> May I ask where has the original commit been posted for review? And why
> the bleeding hell does io_uring touch rename-related codepaths in the
> first place?

See other reply, it's being posted soon, just haven't gotten there yet
and it wasn't ready.

It's a prep patch so we can call do_renameat2 and pass in a filename
instead. The intent is not to have any functional changes in that prep
patch. But once we can pass in filenames instead of user pointers, it's
usable from io_uring.

I'll post this as soon as I get around to it, it's been on the back
burner for the last month or so.

--
Jens Axboe

2020-10-30 18:52:11

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:

> See other reply, it's being posted soon, just haven't gotten there yet
> and it wasn't ready.
>
> It's a prep patch so we can call do_renameat2 and pass in a filename
> instead. The intent is not to have any functional changes in that prep
> patch. But once we can pass in filenames instead of user pointers, it's
> usable from io_uring.

You do realize that pathname resolution is *NOT* offloadable to helper
threads, I hope...

2020-10-30 20:36:33

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On 10/30/20 12:49 PM, Al Viro wrote:
> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>
>> See other reply, it's being posted soon, just haven't gotten there yet
>> and it wasn't ready.
>>
>> It's a prep patch so we can call do_renameat2 and pass in a filename
>> instead. The intent is not to have any functional changes in that prep
>> patch. But once we can pass in filenames instead of user pointers, it's
>> usable from io_uring.
>
> You do realize that pathname resolution is *NOT* offloadable to helper
> threads, I hope...

How so? If we have all the necessary context assigned, what's preventing
it from working?

--
Jens Axboe

2020-10-30 22:26:13

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
> On 10/30/20 12:49 PM, Al Viro wrote:
> > On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
> >
> >> See other reply, it's being posted soon, just haven't gotten there yet
> >> and it wasn't ready.
> >>
> >> It's a prep patch so we can call do_renameat2 and pass in a filename
> >> instead. The intent is not to have any functional changes in that prep
> >> patch. But once we can pass in filenames instead of user pointers, it's
> >> usable from io_uring.
> >
> > You do realize that pathname resolution is *NOT* offloadable to helper
> > threads, I hope...
>
> How so? If we have all the necessary context assigned, what's preventing
> it from working?

Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
*do* pass through that, /dev/stdin included)

2020-10-30 23:24:57

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On 10/30/20 4:22 PM, Al Viro wrote:
> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>> On 10/30/20 12:49 PM, Al Viro wrote:
>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>
>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>> and it wasn't ready.
>>>>
>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>> instead. The intent is not to have any functional changes in that prep
>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>> usable from io_uring.
>>>
>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>> threads, I hope...
>>
>> How so? If we have all the necessary context assigned, what's preventing
>> it from working?
>
> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
> *do* pass through that, /dev/stdin included)

Don't we just need ->thread_pid for that to work?

--
Jens Axboe

2020-11-02 18:44:57

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

Jens Axboe <[email protected]> writes:

> On 10/30/20 4:22 PM, Al Viro wrote:
>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>
>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>> and it wasn't ready.
>>>>>
>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>> instead. The intent is not to have any functional changes in that prep
>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>> usable from io_uring.
>>>>
>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>> threads, I hope...
>>>
>>> How so? If we have all the necessary context assigned, what's preventing
>>> it from working?
>>
>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>> *do* pass through that, /dev/stdin included)
>
> Don't we just need ->thread_pid for that to work?

Are you proposing changing the pid of a kernel thread to get that?

Currently it is an invariant in the kernel that pids do not change.

Eric

2020-11-02 19:30:02

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

Jens Axboe <[email protected]> writes:

> On 10/30/20 4:22 PM, Al Viro wrote:
>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>
>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>> and it wasn't ready.
>>>>>
>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>> instead. The intent is not to have any functional changes in that prep
>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>> usable from io_uring.
>>>>
>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>> threads, I hope...
>>>
>>> How so? If we have all the necessary context assigned, what's preventing
>>> it from working?
>>
>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>> *do* pass through that, /dev/stdin included)
>
> Don't we just need ->thread_pid for that to work?

No. You need ->signal.

You need ->signal->pids[PIDTYPE_TGID]. It is only for /proc/thread-self
that ->thread_pid is needed.

Even more so than ->thread_pid, it is a kernel invariant that ->signal
does not change.

Eric


2020-11-02 20:03:20

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On 11/2/20 12:27 PM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 10/30/20 4:22 PM, Al Viro wrote:
>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>
>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>> and it wasn't ready.
>>>>>>
>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>> usable from io_uring.
>>>>>
>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>> threads, I hope...
>>>>
>>>> How so? If we have all the necessary context assigned, what's preventing
>>>> it from working?
>>>
>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>> *do* pass through that, /dev/stdin included)
>>
>> Don't we just need ->thread_pid for that to work?
>
> No. You need ->signal.
>
> You need ->signal->pids[PIDTYPE_TGID]. It is only for /proc/thread-self
> that ->thread_pid is needed.
>
> Even more so than ->thread_pid, it is a kernel invariant that ->signal
> does not change.

I don't care about the pid itself, my suggestion was to assign ->thread_pid
over the lookup operation to ensure that /proc/self/ worked the way that
you'd expect.

--
Jens Axboe

2020-11-02 20:15:39

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

Jens Axboe <[email protected]> writes:

> On 11/2/20 12:27 PM, Eric W. Biederman wrote:
>> Jens Axboe <[email protected]> writes:
>>
>>> On 10/30/20 4:22 PM, Al Viro wrote:
>>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>>
>>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>>> and it wasn't ready.
>>>>>>>
>>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>>> usable from io_uring.
>>>>>>
>>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>>> threads, I hope...
>>>>>
>>>>> How so? If we have all the necessary context assigned, what's preventing
>>>>> it from working?
>>>>
>>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>>> *do* pass through that, /dev/stdin included)
>>>
>>> Don't we just need ->thread_pid for that to work?
>>
>> No. You need ->signal.
>>
>> You need ->signal->pids[PIDTYPE_TGID]. It is only for /proc/thread-self
>> that ->thread_pid is needed.
>>
>> Even more so than ->thread_pid, it is a kernel invariant that ->signal
>> does not change.
>
> I don't care about the pid itself, my suggestion was to assign ->thread_pid
> over the lookup operation to ensure that /proc/self/ worked the way that
> you'd expect.

I understand that.

However /proc/self/ refers to the current process not to the current
thread. So ->thread_pid is not what you need to assign to make that
happen. What the code looks at is: ->signal->pids[PIDTYPE_TGID].

It will definitely break invariants to assign to ->signal.

Currently only exchange_tids assigns ->thread_pid and it is nasty. It
results in code that potentially results in infinite loops in
kernel/signal.c

To my knowledge nothing assigns ->signal->pids[PIDTYPE_TGID]. At best
it might work but I expect the it would completely confuse something in
the pid to task or pid to process mappings. Which is to say even if it
does work it would be an extremely fragile solution.

Eric

2020-11-02 20:35:19

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On 11/2/20 1:12 PM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 11/2/20 12:27 PM, Eric W. Biederman wrote:
>>> Jens Axboe <[email protected]> writes:
>>>
>>>> On 10/30/20 4:22 PM, Al Viro wrote:
>>>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>>>
>>>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>>>> and it wasn't ready.
>>>>>>>>
>>>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>>>> usable from io_uring.
>>>>>>>
>>>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>>>> threads, I hope...
>>>>>>
>>>>>> How so? If we have all the necessary context assigned, what's preventing
>>>>>> it from working?
>>>>>
>>>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>>>> *do* pass through that, /dev/stdin included)
>>>>
>>>> Don't we just need ->thread_pid for that to work?
>>>
>>> No. You need ->signal.
>>>
>>> You need ->signal->pids[PIDTYPE_TGID]. It is only for /proc/thread-self
>>> that ->thread_pid is needed.
>>>
>>> Even more so than ->thread_pid, it is a kernel invariant that ->signal
>>> does not change.
>>
>> I don't care about the pid itself, my suggestion was to assign ->thread_pid
>> over the lookup operation to ensure that /proc/self/ worked the way that
>> you'd expect.
>
> I understand that.
>
> However /proc/self/ refers to the current process not to the current
> thread. So ->thread_pid is not what you need to assign to make that
> happen. What the code looks at is: ->signal->pids[PIDTYPE_TGID].
>
> It will definitely break invariants to assign to ->signal.
>
> Currently only exchange_tids assigns ->thread_pid and it is nasty. It
> results in code that potentially results in infinite loops in
> kernel/signal.c
>
> To my knowledge nothing assigns ->signal->pids[PIDTYPE_TGID]. At best
> it might work but I expect the it would completely confuse something in
> the pid to task or pid to process mappings. Which is to say even if it
> does work it would be an extremely fragile solution.

Thanks Eric, that's useful. Sounds to me like we're better off, at least
for now, to just expressly forbid async lookup of /proc/self/. Which
isn't really the end of the world as far as I'm concerned.

--
Jens Axboe

2020-11-02 21:41:22

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

On 11/2/20 1:31 PM, Jens Axboe wrote:
> On 11/2/20 1:12 PM, Eric W. Biederman wrote:
>> Jens Axboe <[email protected]> writes:
>>
>>> On 11/2/20 12:27 PM, Eric W. Biederman wrote:
>>>> Jens Axboe <[email protected]> writes:
>>>>
>>>>> On 10/30/20 4:22 PM, Al Viro wrote:
>>>>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>>>>
>>>>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>>>>> and it wasn't ready.
>>>>>>>>>
>>>>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>>>>> usable from io_uring.
>>>>>>>>
>>>>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>>>>> threads, I hope...
>>>>>>>
>>>>>>> How so? If we have all the necessary context assigned, what's preventing
>>>>>>> it from working?
>>>>>>
>>>>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>>>>> *do* pass through that, /dev/stdin included)
>>>>>
>>>>> Don't we just need ->thread_pid for that to work?
>>>>
>>>> No. You need ->signal.
>>>>
>>>> You need ->signal->pids[PIDTYPE_TGID]. It is only for /proc/thread-self
>>>> that ->thread_pid is needed.
>>>>
>>>> Even more so than ->thread_pid, it is a kernel invariant that ->signal
>>>> does not change.
>>>
>>> I don't care about the pid itself, my suggestion was to assign ->thread_pid
>>> over the lookup operation to ensure that /proc/self/ worked the way that
>>> you'd expect.
>>
>> I understand that.
>>
>> However /proc/self/ refers to the current process not to the current
>> thread. So ->thread_pid is not what you need to assign to make that
>> happen. What the code looks at is: ->signal->pids[PIDTYPE_TGID].
>>
>> It will definitely break invariants to assign to ->signal.
>>
>> Currently only exchange_tids assigns ->thread_pid and it is nasty. It
>> results in code that potentially results in infinite loops in
>> kernel/signal.c
>>
>> To my knowledge nothing assigns ->signal->pids[PIDTYPE_TGID]. At best
>> it might work but I expect the it would completely confuse something in
>> the pid to task or pid to process mappings. Which is to say even if it
>> does work it would be an extremely fragile solution.
>
> Thanks Eric, that's useful. Sounds to me like we're better off, at least
> for now, to just expressly forbid async lookup of /proc/self/. Which
> isn't really the end of the world as far as I'm concerned.

Alternatively, we just teach task_pid_ptr() where to look for an
alternate, if current->flags & PF_IO_WORKER is true. Then we don't have
to assign anything that's visible in task_struct, and in fact the async
worker can retain this stuff on the stack. As all requests are killed
before a task is allowed to exit, that should be safe.


diff --git a/kernel/pid.c b/kernel/pid.c
index 74ddbff1a6ba..5fd421a4864c 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -42,6 +42,7 @@
#include <linux/sched/signal.h>
#include <linux/sched/task.h>
#include <linux/idr.h>
+#include <linux/io_uring.h>
#include <net/sock.h>
#include <uapi/linux/pidfd.h>

@@ -320,6 +321,12 @@ EXPORT_SYMBOL_GPL(find_vpid);

static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
{
+ if ((task->flags & PF_IO_WORKER) && task->io_uring) {
+ return (type == PIDTYPE_PID) ?
+ &task->io_uring->thread_pid :
+ &task->io_uring->pids[type];
+ }
+
return (type == PIDTYPE_PID) ?
&task->thread_pid :
&task->signal->pids[type];

--
Jens Axboe

2020-11-03 14:49:12

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH -next] fs: Fix memory leaks in do_renameat2() error paths

Jens Axboe <[email protected]> writes:

> On 11/2/20 1:31 PM, Jens Axboe wrote:
>> On 11/2/20 1:12 PM, Eric W. Biederman wrote:
>>> Jens Axboe <[email protected]> writes:
>>>
>>>> On 11/2/20 12:27 PM, Eric W. Biederman wrote:
>>>>> Jens Axboe <[email protected]> writes:
>>>>>
>>>>>> On 10/30/20 4:22 PM, Al Viro wrote:
>>>>>>> On Fri, Oct 30, 2020 at 02:33:11PM -0600, Jens Axboe wrote:
>>>>>>>> On 10/30/20 12:49 PM, Al Viro wrote:
>>>>>>>>> On Fri, Oct 30, 2020 at 12:46:26PM -0600, Jens Axboe wrote:
>>>>>>>>>
>>>>>>>>>> See other reply, it's being posted soon, just haven't gotten there yet
>>>>>>>>>> and it wasn't ready.
>>>>>>>>>>
>>>>>>>>>> It's a prep patch so we can call do_renameat2 and pass in a filename
>>>>>>>>>> instead. The intent is not to have any functional changes in that prep
>>>>>>>>>> patch. But once we can pass in filenames instead of user pointers, it's
>>>>>>>>>> usable from io_uring.
>>>>>>>>>
>>>>>>>>> You do realize that pathname resolution is *NOT* offloadable to helper
>>>>>>>>> threads, I hope...
>>>>>>>>
>>>>>>>> How so? If we have all the necessary context assigned, what's preventing
>>>>>>>> it from working?
>>>>>>>
>>>>>>> Semantics of /proc/self/..., for starters (and things like /proc/mounts, etc.
>>>>>>> *do* pass through that, /dev/stdin included)
>>>>>>
>>>>>> Don't we just need ->thread_pid for that to work?
>>>>>
>>>>> No. You need ->signal.
>>>>>
>>>>> You need ->signal->pids[PIDTYPE_TGID]. It is only for /proc/thread-self
>>>>> that ->thread_pid is needed.
>>>>>
>>>>> Even more so than ->thread_pid, it is a kernel invariant that ->signal
>>>>> does not change.
>>>>
>>>> I don't care about the pid itself, my suggestion was to assign ->thread_pid
>>>> over the lookup operation to ensure that /proc/self/ worked the way that
>>>> you'd expect.
>>>
>>> I understand that.
>>>
>>> However /proc/self/ refers to the current process not to the current
>>> thread. So ->thread_pid is not what you need to assign to make that
>>> happen. What the code looks at is: ->signal->pids[PIDTYPE_TGID].
>>>
>>> It will definitely break invariants to assign to ->signal.
>>>
>>> Currently only exchange_tids assigns ->thread_pid and it is nasty. It
>>> results in code that potentially results in infinite loops in
>>> kernel/signal.c
>>>
>>> To my knowledge nothing assigns ->signal->pids[PIDTYPE_TGID]. At best
>>> it might work but I expect the it would completely confuse something in
>>> the pid to task or pid to process mappings. Which is to say even if it
>>> does work it would be an extremely fragile solution.
>>
>> Thanks Eric, that's useful. Sounds to me like we're better off, at least
>> for now, to just expressly forbid async lookup of /proc/self/. Which
>> isn't really the end of the world as far as I'm concerned.
>
> Alternatively, we just teach task_pid_ptr() where to look for an
> alternate, if current->flags & PF_IO_WORKER is true. Then we don't have
> to assign anything that's visible in task_struct, and in fact the async
> worker can retain this stuff on the stack. As all requests are killed
> before a task is allowed to exit, that should be safe.

That seems assumes task_pid_ptr is always called on current.

When you are looking at the task through the proc filesystem you want
things like /proc/<pid>/stat and /proc/<pid>/status to be able to
display the pids without problem. More than that it is desirable that
readdir does not get the view for the PF_IO_WORKER.

> diff --git a/kernel/pid.c b/kernel/pid.c
> index 74ddbff1a6ba..5fd421a4864c 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -42,6 +42,7 @@
> #include <linux/sched/signal.h>
> #include <linux/sched/task.h>
> #include <linux/idr.h>
> +#include <linux/io_uring.h>
> #include <net/sock.h>
> #include <uapi/linux/pidfd.h>
>
> @@ -320,6 +321,12 @@ EXPORT_SYMBOL_GPL(find_vpid);
>
> static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
> {
> + if ((task->flags & PF_IO_WORKER) && task->io_uring) {
> + return (type == PIDTYPE_PID) ?
> + &task->io_uring->thread_pid :
> + &task->io_uring->pids[type];
> + }
> +
> return (type == PIDTYPE_PID) ?
> &task->thread_pid :
> &task->signal->pids[type];

The only thing I can think of that might work convincingly is to split
get_current() into two functions get_context() and get_task(). Maybe
accessed as current_context and current_task.

With get_context() returning just a pointer to the fields that are safe
to use in io_uring, and get_task returning the other fields.

With exit and exec invaliding the pending work on the contexts it should
be safe to just return a pointer to the context that invoked io_uring.
Data in the context would either need to be read-only or be modified
and read in a multi-thread safe way.

The rest of the data in the task_struct by default could be assume it is
only modified by the task.

That would give type-safety and something avoids playing whack-a-mole
with every new piece of context that userspace accesses.

Eric