2021-03-25 16:46:39

by Jens Axboe

[permalink] [raw]
Subject: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

Hi,

Stefan reports that attaching to a task with io_uring will leave gdb
very confused and just repeatedly attempting to attach to the IO threads,
even though it receives an -EPERM every time. This patchset proposes to
skip PF_IO_WORKER threads as same_thread_group(), except for accounting
purposes which we still desire.

We also skip listing the IO threads in /proc/<pid>/task/ so that gdb
doesn't think it should stop and attach to them. This makes us consistent
with earlier kernels, where these async threads were not related to the
ring owning task, and hence gdb (and others) ignored them anyway.

Seems to me that this is the right approach, but open to comments on if
others agree with this. Oleg, I did see your messages as well on SIGSTOP,
and as was discussed with Eric as well, this is something we most
certainly can revisit. I do think that the visibility of these threads
is a separate issue. Even with SIGSTOP implemented (which I did try as
well), we're never going to allow ptrace attach and hence gdb would still
be broken. Hence I'd rather treat them as separate issues to attack.

--
Jens Axboe



2021-03-25 19:38:07

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

Jens Axboe <[email protected]> writes:

> Hi,
>
> Stefan reports that attaching to a task with io_uring will leave gdb
> very confused and just repeatedly attempting to attach to the IO threads,
> even though it receives an -EPERM every time. This patchset proposes to
> skip PF_IO_WORKER threads as same_thread_group(), except for accounting
> purposes which we still desire.
>
> We also skip listing the IO threads in /proc/<pid>/task/ so that gdb
> doesn't think it should stop and attach to them. This makes us consistent
> with earlier kernels, where these async threads were not related to the
> ring owning task, and hence gdb (and others) ignored them anyway.
>
> Seems to me that this is the right approach, but open to comments on if
> others agree with this. Oleg, I did see your messages as well on SIGSTOP,
> and as was discussed with Eric as well, this is something we most
> certainly can revisit. I do think that the visibility of these threads
> is a separate issue. Even with SIGSTOP implemented (which I did try as
> well), we're never going to allow ptrace attach and hence gdb would still
> be broken. Hence I'd rather treat them as separate issues to attack.

A quick skim shows that these threads are not showing up anywhere in
proc which appears to be a problem, as it hides them from top.

Sysadmins need the ability to dig into a system and find out where all
their cpu usage or io's have gone when there is a problem. I general I
think this argues that these threads should show up as threads of the
process so I am not even certain this is the right fix to deal with gdb.

Eric

2021-03-25 19:41:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On Thu, Mar 25, 2021 at 12:34 PM Eric W. Biederman
<[email protected]> wrote:
>
> A quick skim shows that these threads are not showing up anywhere in
> proc which appears to be a problem, as it hides them from top.
>
> Sysadmins need the ability to dig into a system and find out where all
> their cpu usage or io's have gone when there is a problem. I general I
> think this argues that these threads should show up as threads of the
> process so I am not even certain this is the right fix to deal with gdb.

Yeah, I do think that hiding them is the wrong model, because it also
hides them from "ps" etc, which is very wrong.

I don't know what the gdb logic is, but maybe there's some other
option that makes gdb not react to them?

Linus

2021-03-25 19:42:59

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 3/25/21 1:38 PM, Linus Torvalds wrote:
> On Thu, Mar 25, 2021 at 12:34 PM Eric W. Biederman
> <[email protected]> wrote:
>>
>> A quick skim shows that these threads are not showing up anywhere in
>> proc which appears to be a problem, as it hides them from top.
>>
>> Sysadmins need the ability to dig into a system and find out where all
>> their cpu usage or io's have gone when there is a problem. I general I
>> think this argues that these threads should show up as threads of the
>> process so I am not even certain this is the right fix to deal with gdb.
>
> Yeah, I do think that hiding them is the wrong model, because it also
> hides them from "ps" etc, which is very wrong.

Totally agree.

> I don't know what the gdb logic is, but maybe there's some other
> option that makes gdb not react to them?

Guess it's time to dig out the gdb source... I'll take a look.

--
Jens Axboe

2021-03-25 19:44:13

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 3/25/21 1:33 PM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
>
>> Hi,
>>
>> Stefan reports that attaching to a task with io_uring will leave gdb
>> very confused and just repeatedly attempting to attach to the IO threads,
>> even though it receives an -EPERM every time. This patchset proposes to
>> skip PF_IO_WORKER threads as same_thread_group(), except for accounting
>> purposes which we still desire.
>>
>> We also skip listing the IO threads in /proc/<pid>/task/ so that gdb
>> doesn't think it should stop and attach to them. This makes us consistent
>> with earlier kernels, where these async threads were not related to the
>> ring owning task, and hence gdb (and others) ignored them anyway.
>>
>> Seems to me that this is the right approach, but open to comments on if
>> others agree with this. Oleg, I did see your messages as well on SIGSTOP,
>> and as was discussed with Eric as well, this is something we most
>> certainly can revisit. I do think that the visibility of these threads
>> is a separate issue. Even with SIGSTOP implemented (which I did try as
>> well), we're never going to allow ptrace attach and hence gdb would still
>> be broken. Hence I'd rather treat them as separate issues to attack.
>
> A quick skim shows that these threads are not showing up anywhere in
> proc which appears to be a problem, as it hides them from top.
>
> Sysadmins need the ability to dig into a system and find out where all
> their cpu usage or io's have gone when there is a problem. I general I
> think this argues that these threads should show up as threads of the
> process so I am not even certain this is the right fix to deal with gdb.

That's a good point, overall hiding was not really what I desired, just
getting them out of gdb's hands. And arguably it _is_ a gdb bug, but...

--
Jens Axboe

2021-03-25 19:45:11

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
<[email protected]> wrote:
>
> I don't know what the gdb logic is, but maybe there's some other
> option that makes gdb not react to them?

.. maybe we could have a different name for them under the task/
subdirectory, for example (not just the pid)? Although that probably
messes up 'ps' too..

Linus

2021-03-25 19:48:35

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 3/25/21 1:42 PM, Linus Torvalds wrote:
> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> I don't know what the gdb logic is, but maybe there's some other
>> option that makes gdb not react to them?
>
> .. maybe we could have a different name for them under the task/
> subdirectory, for example (not just the pid)? Although that probably
> messes up 'ps' too..

Heh, I can try, but my guess is that it would mess up _something_, if
not ps/top.

--
Jens Axboe

2021-03-25 20:14:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
<[email protected]> wrote:
>
> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > I don't know what the gdb logic is, but maybe there's some other
> > option that makes gdb not react to them?
>
> .. maybe we could have a different name for them under the task/
> subdirectory, for example (not just the pid)? Although that probably
> messes up 'ps' too..

Actually, maybe the right model is to simply make all the io threads
take signals, and get rid of all the special cases.

Sure, the signals will never be delivered to user space, but if we

- just made the thread loop do "get_signal()" when there are pending signals

- allowed ptrace_attach on them

they'd look pretty much like regular threads that just never do the
user-space part of signal handling.

The whole "signals are very special for IO threads" thing has caused
so many problems, that maybe the solution is simply to _not_ make them
special?

Linus

2021-03-25 20:24:32

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

Jens Axboe <[email protected]> writes:

> On 3/25/21 1:42 PM, Linus Torvalds wrote:
>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> I don't know what the gdb logic is, but maybe there's some other
>>> option that makes gdb not react to them?
>>
>> .. maybe we could have a different name for them under the task/
>> subdirectory, for example (not just the pid)? Although that probably
>> messes up 'ps' too..
>
> Heh, I can try, but my guess is that it would mess up _something_, if
> not ps/top.

Hmm.

So looking quickly the flip side of the coin is gdb (and other
debuggers) needs a way to know these threads are special, so it can know
not to attach.

I suspect getting -EPERM (or possibly a different error code) when
attempting attach is the right was to know that a thread is not
available to be debugged.

Eric

2021-03-25 20:34:57

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

I didn't even try to read this series yet, will try tomorrow.

But sorry, I can't resist...

On 03/25, Jens Axboe wrote:
>
> On 3/25/21 1:33 PM, Eric W. Biederman wrote:
> > Jens Axboe <[email protected]> writes:
> >
> >> Hi,
> >>
> >> Stefan reports that attaching to a task with io_uring will leave gdb
> >> very confused and just repeatedly attempting to attach to the IO threads,
> >> even though it receives an -EPERM every time.

Heh. As expected :/

> And arguably it _is_ a gdb bug, but...

Why do you think so?

Oleg.

2021-03-25 20:41:52

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 3/25/21 2:12 PM, Linus Torvalds wrote:
> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> I don't know what the gdb logic is, but maybe there's some other
>>> option that makes gdb not react to them?
>>
>> .. maybe we could have a different name for them under the task/
>> subdirectory, for example (not just the pid)? Although that probably
>> messes up 'ps' too..
>
> Actually, maybe the right model is to simply make all the io threads
> take signals, and get rid of all the special cases.
>
> Sure, the signals will never be delivered to user space, but if we
>
> - just made the thread loop do "get_signal()" when there are pending signals
>
> - allowed ptrace_attach on them
>
> they'd look pretty much like regular threads that just never do the
> user-space part of signal handling.
>
> The whole "signals are very special for IO threads" thing has caused
> so many problems, that maybe the solution is simply to _not_ make them
> special?

Just to wrap up the previous one, yes it broke all sorts of things to
make the 'tid' directory different. They just end up being hidden anyway
through that, for both ps and top.

Yes, I do think that maybe it's better to just embrace maybe just
embrace the signals, and have everything just work by default. It's
better than continually trying to make the threads special. I'll see
if there are some demons lurking down that path.

--
Jens Axboe

2021-03-25 20:44:06

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 03/25, Eric W. Biederman wrote:
>
> So looking quickly the flip side of the coin is gdb (and other
> debuggers) needs a way to know these threads are special, so it can know
> not to attach.

may be,

> I suspect getting -EPERM (or possibly a different error code) when
> attempting attach is the right was to know that a thread is not
> available to be debugged.

may be.

But I don't think we can blame gdb. The kernel changed the rules, and this
broke gdb. IOW, I don't agree this is gdb bug.

Oleg.

2021-03-25 20:46:11

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 3/25/21 2:21 PM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 3/25/21 1:42 PM, Linus Torvalds wrote:
>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>> I don't know what the gdb logic is, but maybe there's some other
>>>> option that makes gdb not react to them?
>>>
>>> .. maybe we could have a different name for them under the task/
>>> subdirectory, for example (not just the pid)? Although that probably
>>> messes up 'ps' too..
>>
>> Heh, I can try, but my guess is that it would mess up _something_, if
>> not ps/top.
>
> Hmm.
>
> So looking quickly the flip side of the coin is gdb (and other
> debuggers) needs a way to know these threads are special, so it can know
> not to attach.
>
> I suspect getting -EPERM (or possibly a different error code) when
> attempting attach is the right was to know that a thread is not
> available to be debugged.

But that's what's being returned right now, and gdb seemingly doesn't
really handle that. And even if it was just a gdb issue that could be
fixed it (it definitely is), I'd still greatly prefer not having to do
that. It just takes too long for packages to get updated in distros, and
it'd be years until it got fixed widely. Secondly, I'm even more worried
about cases that we haven't seen yet. I doubt that gdb is the only thing
that'd fall over, not expecting threads in there that it cannot attach
to.

--
Jens Axboe

2021-03-25 20:46:12

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

Linus Torvalds <[email protected]> writes:

> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>> <[email protected]> wrote:
>> >
>> > I don't know what the gdb logic is, but maybe there's some other
>> > option that makes gdb not react to them?
>>
>> .. maybe we could have a different name for them under the task/
>> subdirectory, for example (not just the pid)? Although that probably
>> messes up 'ps' too..
>
> Actually, maybe the right model is to simply make all the io threads
> take signals, and get rid of all the special cases.
>
> Sure, the signals will never be delivered to user space, but if we
>
> - just made the thread loop do "get_signal()" when there are pending signals
>
> - allowed ptrace_attach on them
>
> they'd look pretty much like regular threads that just never do the
> user-space part of signal handling.
>
> The whole "signals are very special for IO threads" thing has caused
> so many problems, that maybe the solution is simply to _not_ make them
> special?

The special case in check_kill_permission is certainly unnecessary.
Having the signal blocked is enough to prevent signal_pending() from
being true.


The most straight forward thing I can see is to allow ptrace_attach and
to modify ptrace_check_attach to always return -ESRCH for io workers
unless ignore_state is set causing none of the other ptrace operations
to work.

That is what a long running in-kernel thread would do today so
user-space aka gdb may actually cope with it.


We might be able to support if io workers start supporting SIGSTOP but I
am not at all certain.

Eric

2021-03-25 20:46:37

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 03/25, Linus Torvalds wrote:
>
> The whole "signals are very special for IO threads" thing has caused
> so many problems, that maybe the solution is simply to _not_ make them
> special?

Or may be IO threads should not abuse CLONE_THREAD?

Why does create_io_thread() abuse CLONE_THREAD ?

One reason (I think) is that this implies SIGKILL when the process exits/execs,
anything else?

Oleg.

2021-03-25 20:47:48

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 3/25/21 2:40 PM, Oleg Nesterov wrote:
> On 03/25, Eric W. Biederman wrote:
>>
>> So looking quickly the flip side of the coin is gdb (and other
>> debuggers) needs a way to know these threads are special, so it can know
>> not to attach.
>
> may be,
>
>> I suspect getting -EPERM (or possibly a different error code) when
>> attempting attach is the right was to know that a thread is not
>> available to be debugged.
>
> may be.
>
> But I don't think we can blame gdb. The kernel changed the rules, and this
> broke gdb. IOW, I don't agree this is gdb bug.

Right, that's what I was getting at too - and it's likely not just gdb.
We have to ensure that we don't break this use case, which seems to
imply that we:

1) Just make it work, or
2) Make them hidden in such a way that gdb doesn't see them, but
regular tooling does

#2 seems fraught with peril, and maybe not even possible.

--
Jens Axboe

2021-03-25 20:52:12

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

Oleg Nesterov <[email protected]> writes:

> On 03/25, Eric W. Biederman wrote:
>>
>> So looking quickly the flip side of the coin is gdb (and other
>> debuggers) needs a way to know these threads are special, so it can know
>> not to attach.
>
> may be,
>
>> I suspect getting -EPERM (or possibly a different error code) when
>> attempting attach is the right was to know that a thread is not
>> available to be debugged.
>
> may be.
>
> But I don't think we can blame gdb. The kernel changed the rules, and this
> broke gdb. IOW, I don't agree this is gdb bug.

My point would be it is not strictly a regression either. It is gdb not
handling new functionality.

If we can be backwards compatible and make ptrace_attach work that is
preferable. If we can't saying the handful of ptrace using applications
need an upgrade to support processes that use io_uring may be
acceptable.

I don't see any easy to implement path that is guaranteed to work.

Eric



2021-03-25 20:58:14

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

Oleg Nesterov <[email protected]> writes:

> On 03/25, Linus Torvalds wrote:
>>
>> The whole "signals are very special for IO threads" thing has caused
>> so many problems, that maybe the solution is simply to _not_ make them
>> special?
>
> Or may be IO threads should not abuse CLONE_THREAD?
>
> Why does create_io_thread() abuse CLONE_THREAD ?
>
> One reason (I think) is that this implies SIGKILL when the process exits/execs,
> anything else?

A lot.

The io workers perform work on behave of the ordinary userspace threads.
Some of that work is opening files. For things like rlimits to work
properly you need to share the signal_struct. But odds are if you find
anything in signal_struct (not counting signals) there will be an
io_uring code path that can exercise it as io_uring can traverse the
filesystem, open files and read/write files. So io_uring can exercise
all of proc.

Using create_io_thread with CLONE_THREAD is the least problematic way
(including all of the signal and ptrace problems we are looking at right
now) to implement the io worker threads.

They _really_ are threads of the process that just never execute any
code in userspace.

Eric

2021-03-25 21:22:29

by Stefan Metzmacher

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/


Am 25.03.21 um 21:55 schrieb Eric W. Biederman:
> Oleg Nesterov <[email protected]> writes:
>
>> On 03/25, Linus Torvalds wrote:
>>>
>>> The whole "signals are very special for IO threads" thing has caused
>>> so many problems, that maybe the solution is simply to _not_ make them
>>> special?
>>
>> Or may be IO threads should not abuse CLONE_THREAD?
>>
>> Why does create_io_thread() abuse CLONE_THREAD ?
>>
>> One reason (I think) is that this implies SIGKILL when the process exits/execs,
>> anything else?
>
> A lot.
>
> The io workers perform work on behave of the ordinary userspace threads.
> Some of that work is opening files. For things like rlimits to work
> properly you need to share the signal_struct. But odds are if you find
> anything in signal_struct (not counting signals) there will be an
> io_uring code path that can exercise it as io_uring can traverse the
> filesystem, open files and read/write files. So io_uring can exercise
> all of proc.
>
> Using create_io_thread with CLONE_THREAD is the least problematic way
> (including all of the signal and ptrace problems we are looking at right
> now) to implement the io worker threads.
>
> They _really_ are threads of the process that just never execute any
> code in userspace.

So they should look like a userspace thread sitting in something like
epoll_pwait() with all signals blocked, which will never return to userspace again?

I think that would be useful, but I also think that userspace should see:
- /proc/$tidofiothread/cmdline as empty (in order to let ps and top use [iou-wrk-$tidofuserspacethread])
- /proc/$tidofiothread/exe as symlink to that not exists
- all of /proc/$tidofiothread/ shows root.root as owner and group
and things which still allow write access to /proc/$tidofiothread/comm similar things
with rw permissions should still disallow modifications:

For the other kernel threads e.g. "[cryptd]" I see the following:

LANG=C ls -l /proc/653 | grep rw
ls: cannot read symbolic link '/proc/653/exe': No such file or directory
-rw-r--r-- 1 root root 0 Mar 25 22:09 autogroup
-rw-r--r-- 1 root root 0 Mar 25 22:09 comm
-rw-r--r-- 1 root root 0 Mar 25 22:09 coredump_filter
lrwxrwxrwx 1 root root 0 Mar 25 22:09 cwd -> /
lrwxrwxrwx 1 root root 0 Mar 25 22:09 exe
-rw-r--r-- 1 root root 0 Mar 25 22:09 gid_map
-rw-r--r-- 1 root root 0 Mar 25 22:09 loginuid
-rw------- 1 root root 0 Mar 25 22:09 mem
-rw-r--r-- 1 root root 0 Mar 25 22:09 oom_adj
-rw-r--r-- 1 root root 0 Mar 25 22:09 oom_score_adj
-rw-r--r-- 1 root root 0 Mar 25 22:09 projid_map
lrwxrwxrwx 1 root root 0 Mar 25 22:09 root -> /
-rw-r--r-- 1 root root 0 Mar 25 22:09 sched
-rw-r--r-- 1 root root 0 Mar 25 22:09 setgroups
-rw-r--r-- 1 root root 0 Mar 25 22:09 timens_offsets
-rw-rw-rw- 1 root root 0 Mar 25 22:09 timerslack_ns
-rw-r--r-- 1 root root 0 Mar 25 22:09 uid_map

And this:

LANG=C echo "bla" > /proc/653/comm
-bash: echo: write error: Invalid argument

LANG=C echo "bla" > /proc/653/gid_map
-bash: echo: write error: Operation not permitted

Can't we do the same for iothreads regarding /proc?
Just make things read only there and empty "cmdline"/"exe"?

Maybe I'm too naive, but that what I'd assume as a userspace developer/admin.

Does at least parts of it make any sense?

metze

2021-03-25 21:48:22

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 3/25/21 2:40 PM, Jens Axboe wrote:
> On 3/25/21 2:12 PM, Linus Torvalds wrote:
>> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>> I don't know what the gdb logic is, but maybe there's some other
>>>> option that makes gdb not react to them?
>>>
>>> .. maybe we could have a different name for them under the task/
>>> subdirectory, for example (not just the pid)? Although that probably
>>> messes up 'ps' too..
>>
>> Actually, maybe the right model is to simply make all the io threads
>> take signals, and get rid of all the special cases.
>>
>> Sure, the signals will never be delivered to user space, but if we
>>
>> - just made the thread loop do "get_signal()" when there are pending signals
>>
>> - allowed ptrace_attach on them
>>
>> they'd look pretty much like regular threads that just never do the
>> user-space part of signal handling.
>>
>> The whole "signals are very special for IO threads" thing has caused
>> so many problems, that maybe the solution is simply to _not_ make them
>> special?
>
> Just to wrap up the previous one, yes it broke all sorts of things to
> make the 'tid' directory different. They just end up being hidden anyway
> through that, for both ps and top.
>
> Yes, I do think that maybe it's better to just embrace maybe just
> embrace the signals, and have everything just work by default. It's
> better than continually trying to make the threads special. I'll see
> if there are some demons lurking down that path.

In the spirit of "let's just try it", I ran with the below patch. With
that, I can gdb attach just fine to a test case that creates an io_uring
and a regular thread with pthread_create(). The regular thread uses
the ring, so you end up with two iou-mgr threads. Attach:

[root@archlinux ~]# gdb -p 360
[snip gdb noise]
Attaching to process 360
[New LWP 361]
[New LWP 362]
[New LWP 363]

warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386

warning: Architecture rejected target-supplied description
Error while reading shared library symbols for /usr/lib/libpthread.so.0:
Cannot find user-level thread for LWP 363: generic error
0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
(gdb) info threads
Id Target Id Frame
* 1 LWP 360 "io_uring" 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 ()
from /usr/lib/libc.so.6
2 LWP 361 "iou-mgr-360" 0x0000000000000000 in ?? ()
3 LWP 362 "io_uring" 0x00007f7aa52a0a9d in syscall () from /usr/lib/libc.so.6
4 LWP 363 "iou-mgr-362" 0x0000000000000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (LWP 361)]
#0 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0
(gdb) cont
Continuing.
^C
Thread 1 "io_uring" received signal SIGINT, Interrupt.
[Switching to LWP 360]
0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
(gdb) q
A debugging session is active.

Inferior 1 [process 360] will be detached.

Quit anyway? (y or n) y
Detaching from program: /root/git/fio/t/io_uring, process 360
[Inferior 1 (process 360) detached]

The iou-mgr-x threads are stopped just fine, gdb obviously can't get any
real info out of them. But it works... Regular test cases work fine too,
just a sanity check. Didn't expect them not to.

Only thing that I dislike a bit, but I guess that's just a Linuxism, is
that if can now kill an io_uring owning task by sending a signal to one
of its IO thread workers.


diff --git a/fs/io-wq.c b/fs/io-wq.c
index b7c1fa932cb3..2dbdc552f3ba 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -505,8 +505,14 @@ static int io_wqe_worker(void *data)
ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
if (try_to_freeze() || ret)
continue;
- if (fatal_signal_pending(current))
- break;
+ if (signal_pending(current)) {
+ struct ksignal ksig;
+
+ if (fatal_signal_pending(current))
+ break;
+ get_signal(&ksig);
+ continue;
+ }
/* timed out, exit unless we're the fixed worker */
if (test_bit(IO_WQ_BIT_EXIT, &wq->state) ||
!(worker->flags & IO_WORKER_F_FIXED))
@@ -715,8 +721,15 @@ static int io_wq_manager(void *data)
io_wq_check_workers(wq);
schedule_timeout(HZ);
try_to_freeze();
- if (fatal_signal_pending(current))
- set_bit(IO_WQ_BIT_EXIT, &wq->state);
+ if (signal_pending(current)) {
+ struct ksignal ksig;
+
+ if (fatal_signal_pending(current))
+ set_bit(IO_WQ_BIT_EXIT, &wq->state);
+ else
+ get_signal(&ksig);
+ continue;
+ }
} while (!test_bit(IO_WQ_BIT_EXIT, &wq->state));

io_wq_check_workers(wq);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 54ea561db4a5..3a9d021db328 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6765,8 +6765,14 @@ static int io_sq_thread(void *data)
timeout = jiffies + sqd->sq_thread_idle;
continue;
}
- if (fatal_signal_pending(current))
- break;
+ if (signal_pending(current)) {
+ struct ksignal ksig;
+
+ if (fatal_signal_pending(current))
+ break;
+ get_signal(&ksig);
+ continue;
+ }
sqt_spin = false;
cap_entries = !list_is_singular(&sqd->ctx_list);
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
diff --git a/kernel/fork.c b/kernel/fork.c
index d3171e8e88e5..3b45d0f04044 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2436,6 +2436,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
if (!IS_ERR(tsk)) {
sigfillset(&tsk->blocked);
sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
+ sigdelsetmask(&tsk->blocked, sigmask(SIGSTOP));
}
return tsk;
}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 821cf1723814..61db50f7ca86 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -375,7 +375,7 @@ static int ptrace_attach(struct task_struct *task, long request,
audit_ptrace(task);

retval = -EPERM;
- if (unlikely(task->flags & (PF_KTHREAD | PF_IO_WORKER)))
+ if (unlikely(task->flags & PF_KTHREAD))
goto out;
if (same_thread_group(task, current))
goto out;
diff --git a/kernel/signal.c b/kernel/signal.c
index f2a1b898da29..a5700557eb50 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -91,7 +91,7 @@ static bool sig_task_ignored(struct task_struct *t, int sig, bool force)
return true;

/* Only allow kernel generated signals to this kthread */
- if (unlikely((t->flags & (PF_KTHREAD | PF_IO_WORKER)) &&
+ if (unlikely((t->flags & PF_KTHREAD) &&
(handler == SIG_KTHREAD_KERNEL) && !force))
return true;

@@ -288,8 +288,7 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));

- if (unlikely(fatal_signal_pending(task) ||
- (task->flags & (PF_EXITING | PF_IO_WORKER))))
+ if (unlikely(fatal_signal_pending(task) || task->flags & PF_EXITING))
return false;

if (mask & JOBCTL_STOP_SIGMASK)
@@ -834,9 +833,6 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,

if (!valid_signal(sig))
return -EINVAL;
- /* PF_IO_WORKER threads don't take any signals */
- if (t->flags & PF_IO_WORKER)
- return -ESRCH;

if (!si_fromuser(info))
return 0;

--
Jens Axboe

2021-03-25 21:50:27

by Stefan Metzmacher

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/


Am 25.03.21 um 22:20 schrieb Stefan Metzmacher:
>
> Am 25.03.21 um 21:55 schrieb Eric W. Biederman:
>> Oleg Nesterov <[email protected]> writes:
>>
>>> On 03/25, Linus Torvalds wrote:
>>>>
>>>> The whole "signals are very special for IO threads" thing has caused
>>>> so many problems, that maybe the solution is simply to _not_ make them
>>>> special?
>>>
>>> Or may be IO threads should not abuse CLONE_THREAD?
>>>
>>> Why does create_io_thread() abuse CLONE_THREAD ?
>>>
>>> One reason (I think) is that this implies SIGKILL when the process exits/execs,
>>> anything else?
>>
>> A lot.
>>
>> The io workers perform work on behave of the ordinary userspace threads.
>> Some of that work is opening files. For things like rlimits to work
>> properly you need to share the signal_struct. But odds are if you find
>> anything in signal_struct (not counting signals) there will be an
>> io_uring code path that can exercise it as io_uring can traverse the
>> filesystem, open files and read/write files. So io_uring can exercise
>> all of proc.
>>
>> Using create_io_thread with CLONE_THREAD is the least problematic way
>> (including all of the signal and ptrace problems we are looking at right
>> now) to implement the io worker threads.
>>
>> They _really_ are threads of the process that just never execute any
>> code in userspace.
>
> So they should look like a userspace thread sitting in something like
> epoll_pwait() with all signals blocked, which will never return to userspace again?

Would gdb work with that?
The question is what backtrace gdb would show for that thread.

Is it possible to block SIGSTOP/SIGCONT?

I also think that all signals to an iothread should not be delivered to
other threads and it may only react on a direct SIGSTOP/SIGCONT.
I guess even SIGKILL should be ignored as the shutdown should happen
via the exit path of the iothread parent only.

> I think that would be useful, but I also think that userspace should see:
> - /proc/$tidofiothread/cmdline as empty (in order to let ps and top use [iou-wrk-$tidofuserspacethread])
> - /proc/$tidofiothread/exe as symlink to that not exists
> - all of /proc/$tidofiothread/ shows root.root as owner and group
> and things which still allow write access to /proc/$tidofiothread/comm similar things
> with rw permissions should still disallow modifications:
>
> For the other kernel threads e.g. "[cryptd]" I see the following:
>
> LANG=C ls -l /proc/653 | grep rw
> ls: cannot read symbolic link '/proc/653/exe': No such file or directory
> -rw-r--r-- 1 root root 0 Mar 25 22:09 autogroup
> -rw-r--r-- 1 root root 0 Mar 25 22:09 comm
> -rw-r--r-- 1 root root 0 Mar 25 22:09 coredump_filter
> lrwxrwxrwx 1 root root 0 Mar 25 22:09 cwd -> /
> lrwxrwxrwx 1 root root 0 Mar 25 22:09 exe
> -rw-r--r-- 1 root root 0 Mar 25 22:09 gid_map
> -rw-r--r-- 1 root root 0 Mar 25 22:09 loginuid
> -rw------- 1 root root 0 Mar 25 22:09 mem
> -rw-r--r-- 1 root root 0 Mar 25 22:09 oom_adj
> -rw-r--r-- 1 root root 0 Mar 25 22:09 oom_score_adj
> -rw-r--r-- 1 root root 0 Mar 25 22:09 projid_map
> lrwxrwxrwx 1 root root 0 Mar 25 22:09 root -> /
> -rw-r--r-- 1 root root 0 Mar 25 22:09 sched
> -rw-r--r-- 1 root root 0 Mar 25 22:09 setgroups
> -rw-r--r-- 1 root root 0 Mar 25 22:09 timens_offsets
> -rw-rw-rw- 1 root root 0 Mar 25 22:09 timerslack_ns
> -rw-r--r-- 1 root root 0 Mar 25 22:09 uid_map
>
> And this:
>
> LANG=C echo "bla" > /proc/653/comm
> -bash: echo: write error: Invalid argument
>
> LANG=C echo "bla" > /proc/653/gid_map
> -bash: echo: write error: Operation not permitted
>
> Can't we do the same for iothreads regarding /proc?
> Just make things read only there and empty "cmdline"/"exe"?
>
> Maybe I'm too naive, but that what I'd assume as a userspace developer/admin.
>
> Does at least parts of it make any sense?

I think the strange glibc setuid() behavior should also be tests here,
I guess we don't want that to reset the credentials of an iothread!

Another idea would be to have the iothreads as a child process with it's threads,
but again I'm only looking as an admin to what I'd except to see under /proc
via ps and top.

metze

2021-03-25 21:52:17

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 3/25/21 2:43 PM, Eric W. Biederman wrote:
> Linus Torvalds <[email protected]> writes:
>
>> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>> I don't know what the gdb logic is, but maybe there's some other
>>>> option that makes gdb not react to them?
>>>
>>> .. maybe we could have a different name for them under the task/
>>> subdirectory, for example (not just the pid)? Although that probably
>>> messes up 'ps' too..
>>
>> Actually, maybe the right model is to simply make all the io threads
>> take signals, and get rid of all the special cases.
>>
>> Sure, the signals will never be delivered to user space, but if we
>>
>> - just made the thread loop do "get_signal()" when there are pending signals
>>
>> - allowed ptrace_attach on them
>>
>> they'd look pretty much like regular threads that just never do the
>> user-space part of signal handling.
>>
>> The whole "signals are very special for IO threads" thing has caused
>> so many problems, that maybe the solution is simply to _not_ make them
>> special?
>
> The special case in check_kill_permission is certainly unnecessary.
> Having the signal blocked is enough to prevent signal_pending() from
> being true.
>
>
> The most straight forward thing I can see is to allow ptrace_attach and
> to modify ptrace_check_attach to always return -ESRCH for io workers
> unless ignore_state is set causing none of the other ptrace operations
> to work.
>
> That is what a long running in-kernel thread would do today so
> user-space aka gdb may actually cope with it.
>
>
> We might be able to support if io workers start supporting SIGSTOP but I
> am not at all certain.

See patch just send out as a POC, mostly, not fully sanitized yet. But
I did try to return -ESRCH from ptrace_check_attach() if it's an IO
thread and ignore_state isn't set:

if (!ignore_state && child->flags & PF_IO_WORKER)
return -ESRCH;

and that causes gdb to abort at that thread. For the same test case
as in the previous email, you get:

Attaching to process 358
[New LWP 359]
[New LWP 360]
[New LWP 361]
Couldn't get CS register: No such process.
(gdb) 0x00007ffa58537125 in ?? ()

(gdb) bt
#0 0x00007ffa58537125 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) info threads
Id Target Id Frame
* 1 LWP 358 "io_uring" 0x00007ffa58537125 in ?? ()
2 LWP 359 "iou-mgr-358" Couldn't get registers: No such process.
(gdb) q
A debugging session is active.

Inferior 1 [process 358] will be detached.

Quit anyway? (y or n) y
Couldn't write debug register: No such process.

where 360 here is a regular pthread created thread, and 361 is another
iou-mgr-x task. While gdb behaves better in this case, it does still
prevent you from inspecting thread 3 which would be totally valid.

--
Jens Axboe

2021-03-25 21:59:17

by Stefan Metzmacher

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/


Am 25.03.21 um 22:44 schrieb Jens Axboe:
> On 3/25/21 2:40 PM, Jens Axboe wrote:
>> On 3/25/21 2:12 PM, Linus Torvalds wrote:
>>> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>>> <[email protected]> wrote:
>>>>>
>>>>> I don't know what the gdb logic is, but maybe there's some other
>>>>> option that makes gdb not react to them?
>>>>
>>>> .. maybe we could have a different name for them under the task/
>>>> subdirectory, for example (not just the pid)? Although that probably
>>>> messes up 'ps' too..
>>>
>>> Actually, maybe the right model is to simply make all the io threads
>>> take signals, and get rid of all the special cases.
>>>
>>> Sure, the signals will never be delivered to user space, but if we
>>>
>>> - just made the thread loop do "get_signal()" when there are pending signals
>>>
>>> - allowed ptrace_attach on them
>>>
>>> they'd look pretty much like regular threads that just never do the
>>> user-space part of signal handling.
>>>
>>> The whole "signals are very special for IO threads" thing has caused
>>> so many problems, that maybe the solution is simply to _not_ make them
>>> special?
>>
>> Just to wrap up the previous one, yes it broke all sorts of things to
>> make the 'tid' directory different. They just end up being hidden anyway
>> through that, for both ps and top.
>>
>> Yes, I do think that maybe it's better to just embrace maybe just
>> embrace the signals, and have everything just work by default. It's
>> better than continually trying to make the threads special. I'll see
>> if there are some demons lurking down that path.
>
> In the spirit of "let's just try it", I ran with the below patch. With
> that, I can gdb attach just fine to a test case that creates an io_uring
> and a regular thread with pthread_create(). The regular thread uses
> the ring, so you end up with two iou-mgr threads. Attach:
>
> [root@archlinux ~]# gdb -p 360
> [snip gdb noise]
> Attaching to process 360
> [New LWP 361]
> [New LWP 362]
> [New LWP 363]
>
> warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
>
> warning: Architecture rejected target-supplied description
> Error while reading shared library symbols for /usr/lib/libpthread.so.0:
> Cannot find user-level thread for LWP 363: generic error
> 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
> (gdb) info threads
> Id Target Id Frame
> * 1 LWP 360 "io_uring" 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 ()
> from /usr/lib/libc.so.6
> 2 LWP 361 "iou-mgr-360" 0x0000000000000000 in ?? ()
> 3 LWP 362 "io_uring" 0x00007f7aa52a0a9d in syscall () from /usr/lib/libc.so.6
> 4 LWP 363 "iou-mgr-362" 0x0000000000000000 in ?? ()
> (gdb) thread 2
> [Switching to thread 2 (LWP 361)]
> #0 0x0000000000000000 in ?? ()
> (gdb) bt
> #0 0x0000000000000000 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x0
> (gdb) cont
> Continuing.
> ^C
> Thread 1 "io_uring" received signal SIGINT, Interrupt.
> [Switching to LWP 360]
> 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
> (gdb) q
> A debugging session is active.
>
> Inferior 1 [process 360] will be detached.
>
> Quit anyway? (y or n) y
> Detaching from program: /root/git/fio/t/io_uring, process 360
> [Inferior 1 (process 360) detached]
>
> The iou-mgr-x threads are stopped just fine, gdb obviously can't get any
> real info out of them. But it works... Regular test cases work fine too,
> just a sanity check. Didn't expect them not to.

I guess that's basically what I tried to describe when I said they should
look like a userspace process that is blocked in a syscall forever.

> Only thing that I dislike a bit, but I guess that's just a Linuxism, is
> that if can now kill an io_uring owning task by sending a signal to one
> of its IO thread workers.

Can't we just only allow SIGSTOP, which will be only delivered to
the iothread itself? And also SIGKILL should not be allowed from userspace.

And /proc/$iothread/ should be read only and owned by root with
"cmdline" and "exe" being empty.

Thanks!
metze

2021-03-25 22:39:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On Thu, Mar 25, 2021 at 2:44 PM Jens Axboe <[email protected]> wrote:
>
> In the spirit of "let's just try it", I ran with the below patch. With
> that, I can gdb attach just fine to a test case that creates an io_uring
> and a regular thread with pthread_create(). The regular thread uses
> the ring, so you end up with two iou-mgr threads. Attach:
>
> [root@archlinux ~]# gdb -p 360
> [snip gdb noise]
> Attaching to process 360
> [New LWP 361]
> [New LWP 362]
> [New LWP 363]
[..]

Looks fairly sane to me.

I think this ends up being the right approach - just the final part
(famous last words) of "io_uring threads act like normal threads".

Doing it for VM and FS got rid of all the special cases there, and now
doing it for signal handling gets rid of all these ptrace etc issues.

And the fact that a noticeable part of the patch was removing the
PF_IO_WORKER tests again looks like a very good sign to me.

In fact, I think you could now remove all the freezer hacks too -
because get_signal() will now do the proper try_to_freeze(), so all
those freezer things are stale as well.

Yeah, it's still going to be different in that there's no real user
space return, and so it will never look _entirely_ like a normal
thread, but on the whole I really like how this does seem to get rid
of another batch of special cases.

Linus

2021-03-26 00:10:06

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 3/25/21 4:37 PM, Linus Torvalds wrote:
> On Thu, Mar 25, 2021 at 2:44 PM Jens Axboe <[email protected]> wrote:
>>
>> In the spirit of "let's just try it", I ran with the below patch. With
>> that, I can gdb attach just fine to a test case that creates an io_uring
>> and a regular thread with pthread_create(). The regular thread uses
>> the ring, so you end up with two iou-mgr threads. Attach:
>>
>> [root@archlinux ~]# gdb -p 360
>> [snip gdb noise]
>> Attaching to process 360
>> [New LWP 361]
>> [New LWP 362]
>> [New LWP 363]
> [..]
>
> Looks fairly sane to me.
>
> I think this ends up being the right approach - just the final part
> (famous last words) of "io_uring threads act like normal threads".
>
> Doing it for VM and FS got rid of all the special cases there, and now
> doing it for signal handling gets rid of all these ptrace etc issues.
>
> And the fact that a noticeable part of the patch was removing the
> PF_IO_WORKER tests again looks like a very good sign to me.

I agree, and in fact there are more PF_IO_WORKER checks that can go
too. The patch is just the bare minimum.

> In fact, I think you could now remove all the freezer hacks too -
> because get_signal() will now do the proper try_to_freeze(), so all
> those freezer things are stale as well.

Yep

> Yeah, it's still going to be different in that there's no real user
> space return, and so it will never look _entirely_ like a normal
> thread, but on the whole I really like how this does seem to get rid
> of another batch of special cases.

That's what makes me feel better too. I think was so hung up on the
"never take signals" that it just didn't occur to me to go this
route instead.

I'll send out a clean series.

--
Jens Axboe

2021-03-26 00:13:19

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

On 3/25/21 3:57 PM, Stefan Metzmacher wrote:
>
> Am 25.03.21 um 22:44 schrieb Jens Axboe:
>> On 3/25/21 2:40 PM, Jens Axboe wrote:
>>> On 3/25/21 2:12 PM, Linus Torvalds wrote:
>>>> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
>>>> <[email protected]> wrote:
>>>>>
>>>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> I don't know what the gdb logic is, but maybe there's some other
>>>>>> option that makes gdb not react to them?
>>>>>
>>>>> .. maybe we could have a different name for them under the task/
>>>>> subdirectory, for example (not just the pid)? Although that probably
>>>>> messes up 'ps' too..
>>>>
>>>> Actually, maybe the right model is to simply make all the io threads
>>>> take signals, and get rid of all the special cases.
>>>>
>>>> Sure, the signals will never be delivered to user space, but if we
>>>>
>>>> - just made the thread loop do "get_signal()" when there are pending signals
>>>>
>>>> - allowed ptrace_attach on them
>>>>
>>>> they'd look pretty much like regular threads that just never do the
>>>> user-space part of signal handling.
>>>>
>>>> The whole "signals are very special for IO threads" thing has caused
>>>> so many problems, that maybe the solution is simply to _not_ make them
>>>> special?
>>>
>>> Just to wrap up the previous one, yes it broke all sorts of things to
>>> make the 'tid' directory different. They just end up being hidden anyway
>>> through that, for both ps and top.
>>>
>>> Yes, I do think that maybe it's better to just embrace maybe just
>>> embrace the signals, and have everything just work by default. It's
>>> better than continually trying to make the threads special. I'll see
>>> if there are some demons lurking down that path.
>>
>> In the spirit of "let's just try it", I ran with the below patch. With
>> that, I can gdb attach just fine to a test case that creates an io_uring
>> and a regular thread with pthread_create(). The regular thread uses
>> the ring, so you end up with two iou-mgr threads. Attach:
>>
>> [root@archlinux ~]# gdb -p 360
>> [snip gdb noise]
>> Attaching to process 360
>> [New LWP 361]
>> [New LWP 362]
>> [New LWP 363]
>>
>> warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
>>
>> warning: Architecture rejected target-supplied description
>> Error while reading shared library symbols for /usr/lib/libpthread.so.0:
>> Cannot find user-level thread for LWP 363: generic error
>> 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
>> (gdb) info threads
>> Id Target Id Frame
>> * 1 LWP 360 "io_uring" 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 ()
>> from /usr/lib/libc.so.6
>> 2 LWP 361 "iou-mgr-360" 0x0000000000000000 in ?? ()
>> 3 LWP 362 "io_uring" 0x00007f7aa52a0a9d in syscall () from /usr/lib/libc.so.6
>> 4 LWP 363 "iou-mgr-362" 0x0000000000000000 in ?? ()
>> (gdb) thread 2
>> [Switching to thread 2 (LWP 361)]
>> #0 0x0000000000000000 in ?? ()
>> (gdb) bt
>> #0 0x0000000000000000 in ?? ()
>> Backtrace stopped: Cannot access memory at address 0x0
>> (gdb) cont
>> Continuing.
>> ^C
>> Thread 1 "io_uring" received signal SIGINT, Interrupt.
>> [Switching to LWP 360]
>> 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
>> (gdb) q
>> A debugging session is active.
>>
>> Inferior 1 [process 360] will be detached.
>>
>> Quit anyway? (y or n) y
>> Detaching from program: /root/git/fio/t/io_uring, process 360
>> [Inferior 1 (process 360) detached]
>>
>> The iou-mgr-x threads are stopped just fine, gdb obviously can't get any
>> real info out of them. But it works... Regular test cases work fine too,
>> just a sanity check. Didn't expect them not to.
>
> I guess that's basically what I tried to describe when I said they
> should look like a userspace process that is blocked in a syscall
> forever.

Right, that's almost what they look like, in practice that is what they
look like.

>> Only thing that I dislike a bit, but I guess that's just a Linuxism, is
>> that if can now kill an io_uring owning task by sending a signal to one
>> of its IO thread workers.
>
> Can't we just only allow SIGSTOP, which will be only delivered to
> the iothread itself? And also SIGKILL should not be allowed from userspace.

I don't think we can sanely block them, and we to cleanup and teardown
normally regardless of who gets the signal (owner or one of the
threads). So I'm not _too_ hung up on the "io thread gets signal goes to
owner" as that is what happens with normal threads too, though I would
prefer if that wasn't the case. But overall I feel better just embracing
the thread model, rather than having something that kinda sorta looks
like a thread, but differs in odd ways.

> And /proc/$iothread/ should be read only and owned by root with
> "cmdline" and "exe" being empty.

I know you brought this one up as part of your series, not sure I get
why you want it owned by root and read-only? cmdline and exe, yeah those
could be hidden, but is there really any point?

Maybe I'm missing something here, if so, do clue me in!

--
Jens Axboe

2021-03-26 12:01:08

by Stefan Metzmacher

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/


Hi Jens,

>> And /proc/$iothread/ should be read only and owned by root with
>> "cmdline" and "exe" being empty.
>
> I know you brought this one up as part of your series, not sure I get
> why you want it owned by root and read-only? cmdline and exe, yeah those
> could be hidden, but is there really any point?
>
> Maybe I'm missing something here, if so, do clue me in!

I looked through /proc and I think it's mostly similar to
the unshare() case, if userspace wants to do stupid things
like changing "comm" of iothreads, it gets what was asked for.

But the "cmdline" hiding would be very useful.

While most tools use "comm", by default.

ps -eLf or 'iotop' use "cmdline".

Some processes use setproctitle to change "cmdline" in order
to identify the process better, without the 15 chars comm restriction,
that's why I very often press 'c' in 'top' to see the cmdline,
in that case it would be very helpful to see '[iou-wrk-1234]'
instead of the seeing the cmdline.

So I'd very much prefer if this could be applied:
https://lore.kernel.org/io-uring/d4487f959c778d0b1d4c5738b75bcff17d21df5b.1616197787.git.metze@samba.org/T/#u

If you want I can add a comment and a more verbose commit message...

metze

2021-04-01 18:34:19

by Stefan Metzmacher

[permalink] [raw]
Subject: Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

Hi Jens,

>> I know you brought this one up as part of your series, not sure I get
>> why you want it owned by root and read-only? cmdline and exe, yeah those
>> could be hidden, but is there really any point?
>>
>> Maybe I'm missing something here, if so, do clue me in!
>
> I looked through /proc and I think it's mostly similar to
> the unshare() case, if userspace wants to do stupid things
> like changing "comm" of iothreads, it gets what was asked for.
>
> But the "cmdline" hiding would be very useful.
>
> While most tools use "comm", by default.
>
> ps -eLf or 'iotop' use "cmdline".
>
> Some processes use setproctitle to change "cmdline" in order
> to identify the process better, without the 15 chars comm restriction,
> that's why I very often press 'c' in 'top' to see the cmdline,
> in that case it would be very helpful to see '[iou-wrk-1234]'
> instead of the seeing the cmdline.
>
> So I'd very much prefer if this could be applied:
> https://lore.kernel.org/io-uring/d4487f959c778d0b1d4c5738b75bcff17d21df5b.1616197787.git.metze@samba.org/T/#u
>
> If you want I can add a comment and a more verbose commit message...

I noticed that 'iotop' actually appends ' [iou-wrk-1234]' to the cmdline value,
so that leaves us with 'ps -eLf' and 'top' (with 'c').

pstree -a -t -p is also fine:
│ └─io_uring-cp,1315 /root/kernel/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file
│ ├─{iou-mgr-1315},1316
│ ├─{iou-wrk-1315},1317
│ ├─{iou-wrk-1315},1318
│ ├─{iou-wrk-1315},1319
│ ├─{iou-wrk-1315},1320


In the spirit of "avoid special PF_IO_WORKER checks" I guess it's ok
to leave of as is...

metze