Hi,
As discussed in a previous thread today, the seemingly much saner approach
is just to allow signals (including SIGSTOP) for the PF_IO_WORKER IO
threads. If we just have the threads call get_signal() for
signal_pending(), then everything just falls out naturally with how
we receive and handle signals.
Patch 1 adds support for checking and calling get_signal() from the
regular IO workers, the manager, and the SQPOLL thread. Patch 2 unblocks
SIGSTOP from the default IO thread blocked mask, and the rest just revert
special cases that were put in place for PF_IO_WORKER threads.
With this done, only two special cases remain for PF_IO_WORKER, and they
aren't related to signals so not part of this patchset. But both of them
can go away as well now that we have "real" threads as IO workers, and
then we'll have zero special cases for PF_IO_WORKER.
This passes the usual regression testing; my usual 24h run has been kicked
off as well, but I wanted to send this out early.
Thanks to Linus for the suggestion. As with most other good ideas, it's
obvious once you hear it. The fact that we end up with _zero_ special
cases with this is a clear sign that this is the right way to do it
indeed. The fact that this series is 2/3rds revert further drives that
point home. Also thanks to Eric for diligent review on the signal side
of things for the past changes (and hopefully ditto on this series :-))
--
Jens Axboe
We go through various hoops to disallow signals for the IO threads, but
there's really no reason why we cannot just allow them. The IO threads
never return to userspace like a normal thread, and hence don't go through
normal signal processing. Instead, just check for a pending signal as part
of the work loop, and call get_signal() to handle it for us if anything
is pending.
With that, we can support receiving signals, including special ones like
SIGSTOP.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io-wq.c | 21 +++++++++++++++++----
fs/io_uring.c | 10 ++++++++--
2 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c
index b7c1fa932cb3..2dbdc552f3ba 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -505,8 +505,14 @@ static int io_wqe_worker(void *data)
ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
if (try_to_freeze() || ret)
continue;
- if (fatal_signal_pending(current))
- break;
+ if (signal_pending(current)) {
+ struct ksignal ksig;
+
+ if (fatal_signal_pending(current))
+ break;
+ get_signal(&ksig);
+ continue;
+ }
/* timed out, exit unless we're the fixed worker */
if (test_bit(IO_WQ_BIT_EXIT, &wq->state) ||
!(worker->flags & IO_WORKER_F_FIXED))
@@ -715,8 +721,15 @@ static int io_wq_manager(void *data)
io_wq_check_workers(wq);
schedule_timeout(HZ);
try_to_freeze();
- if (fatal_signal_pending(current))
- set_bit(IO_WQ_BIT_EXIT, &wq->state);
+ if (signal_pending(current)) {
+ struct ksignal ksig;
+
+ if (fatal_signal_pending(current))
+ set_bit(IO_WQ_BIT_EXIT, &wq->state);
+ else
+ get_signal(&ksig);
+ continue;
+ }
} while (!test_bit(IO_WQ_BIT_EXIT, &wq->state));
io_wq_check_workers(wq);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 54ea561db4a5..3a9d021db328 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6765,8 +6765,14 @@ static int io_sq_thread(void *data)
timeout = jiffies + sqd->sq_thread_idle;
continue;
}
- if (fatal_signal_pending(current))
- break;
+ if (signal_pending(current)) {
+ struct ksignal ksig;
+
+ if (fatal_signal_pending(current))
+ break;
+ get_signal(&ksig);
+ continue;
+ }
sqt_spin = false;
cap_entries = !list_is_singular(&sqd->ctx_list);
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
--
2.31.0
This reverts commit 5be28c8f85ce99ed2d329d2ad8bdd18ea19473a5.
IO threads now take signals just fine, so there's no reason to limit them
specifically. Revert the change that prevented that from happening.
Signed-off-by: Jens Axboe <[email protected]>
---
kernel/signal.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index f2a1b898da29..cb9acdfb32fa 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -834,9 +834,6 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
if (!valid_signal(sig))
return -EINVAL;
- /* PF_IO_WORKER threads don't take any signals */
- if (t->flags & PF_IO_WORKER)
- return -ESRCH;
if (!si_fromuser(info))
return 0;
--
2.31.0
This reverts commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597.
The IO threads allow and handle SIGSTOP now, so don't special case them
anymore in task_set_jobctl_pending().
Signed-off-by: Jens Axboe <[email protected]>
---
kernel/signal.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 8ce96078cb76..5ad8566534e7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -288,8 +288,7 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
- if (unlikely(fatal_signal_pending(task) ||
- (task->flags & (PF_EXITING | PF_IO_WORKER))))
+ if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
return false;
if (mask & JOBCTL_STOP_SIGMASK)
--
2.31.0
This reverts commit 6fb8f43cede0e4bd3ead847de78d531424a96be9.
The IO threads do allow signals now, including SIGSTOP, and we can allow
ptrace attach. Attaching won't reveal anything interesting for the IO
threads, but it will allow e.g. gdb to attach to a task with io_urings
and IO threads without complaining. And once attached, it will allow
the usual introspection into regular threads.
Signed-off-by: Jens Axboe <[email protected]>
---
kernel/ptrace.c | 2 +-
kernel/signal.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 821cf1723814..61db50f7ca86 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -375,7 +375,7 @@ static int ptrace_attach(struct task_struct *task, long request,
audit_ptrace(task);
retval = -EPERM;
- if (unlikely(task->flags & (PF_KTHREAD | PF_IO_WORKER)))
+ if (unlikely(task->flags & PF_KTHREAD))
goto out;
if (same_thread_group(task, current))
goto out;
diff --git a/kernel/signal.c b/kernel/signal.c
index cb9acdfb32fa..8ce96078cb76 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -91,7 +91,7 @@ static bool sig_task_ignored(struct task_struct *t, int sig, bool force)
return true;
/* Only allow kernel generated signals to this kthread */
- if (unlikely((t->flags & (PF_KTHREAD | PF_IO_WORKER)) &&
+ if (unlikely((t->flags & PF_KTHREAD) &&
(handler == SIG_KTHREAD_KERNEL) && !force))
return true;
@@ -1097,7 +1097,7 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
/*
* Skip useless siginfo allocation for SIGKILL and kernel threads.
*/
- if ((sig == SIGKILL) || (t->flags & (PF_KTHREAD | PF_IO_WORKER)))
+ if ((sig == SIGKILL) || (t->flags & PF_KTHREAD))
goto out_set;
/*
--
2.31.0
With IO threads accepting signals, including SIGSTOP, unmask the
SIGSTOP signal from the default blocked mask.
Signed-off-by: Jens Axboe <[email protected]>
---
kernel/fork.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index d3171e8e88e5..d5a40552910f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
tsk = copy_process(NULL, 0, node, &args);
if (!IS_ERR(tsk)) {
sigfillset(&tsk->blocked);
- sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
+ sigdelsetmask(&tsk->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
}
return tsk;
}
--
2.31.0
This reverts commit 15b2219facadec583c24523eed40fa45865f859f.
Before IO threads accepted signals, the freezer using fake signals to wake
up an IO thread would cause it to loop without any way to clear the
pending signal. That is no longer the case, so stop special casing
PF_IO_WORKER in the freezer.
Signed-off-by: Jens Axboe <[email protected]>
---
kernel/freezer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 1a2d57d1327c..dc520f01f99d 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -134,7 +134,7 @@ bool freeze_task(struct task_struct *p)
return false;
}
- if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))
+ if (!(p->flags & PF_KTHREAD))
fake_signal_wake_up(p);
else
wake_up_state(p, TASK_INTERRUPTIBLE);
--
2.31.0
Am 26.03.21 um 01:39 schrieb Jens Axboe:
> Hi,
>
> As discussed in a previous thread today, the seemingly much saner approach
> is just to allow signals (including SIGSTOP) for the PF_IO_WORKER IO
> threads. If we just have the threads call get_signal() for
> signal_pending(), then everything just falls out naturally with how
> we receive and handle signals.
>
> Patch 1 adds support for checking and calling get_signal() from the
> regular IO workers, the manager, and the SQPOLL thread. Patch 2 unblocks
> SIGSTOP from the default IO thread blocked mask, and the rest just revert
> special cases that were put in place for PF_IO_WORKER threads.
>
> With this done, only two special cases remain for PF_IO_WORKER, and they
> aren't related to signals so not part of this patchset. But both of them
> can go away as well now that we have "real" threads as IO workers, and
> then we'll have zero special cases for PF_IO_WORKER.
>
> This passes the usual regression testing, my other usual 24h run has been
> kicked off. But I wanted to send this out early.
>
> Thanks to Linus for the suggestion. As with most other good ideas, it's
> obvious once you hear it. The fact that we end up with _zero_ special
> cases with this is a clear sign that this is the right way to do it
> indeed. The fact that this series is 2/3rds revert further drives that
> point home. Also thanks to Eric for diligent review on the signal side
> of things for the past changes (and hopefully ditto on this series :-))
Ok, I'm testing a8ff6a3b20bd16d071ef66824ae4428529d114f9 from
your io_uring-5.12 branch.
And using this patch:
diff --git a/examples/io_uring-cp.c b/examples/io_uring-cp.c
index cc7a227a5ec7..6e26a4214015 100644
--- a/examples/io_uring-cp.c
+++ b/examples/io_uring-cp.c
@@ -116,13 +116,16 @@ static void queue_write(struct io_uring *ring, struct io_data *data)
io_uring_submit(ring);
}
-static int copy_file(struct io_uring *ring, off_t insize)
+static int copy_file(struct io_uring *ring, off_t _insize)
{
+ off_t insize = _insize;
unsigned long reads, writes;
struct io_uring_cqe *cqe;
off_t write_left, offset;
int ret;
+again:
+ insize = _insize;
write_left = insize;
writes = reads = offset = 0;
@@ -221,6 +224,12 @@ static int copy_file(struct io_uring *ring, off_t insize)
}
}
+ {
+ struct timespec ts = { .tv_nsec = 999999, };
+ nanosleep(&ts, NULL);
+ goto again;
+ }
+
return 0;
}
Running ./io_uring-cp ~/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file
What I see is this:
kill -SIGSTOP to any thread (I used a worker with pid 2061 here) results in
root@ub1704-166:~# head /proc/2061/status
Name: iou-wrk-2041
Umask: 0022
State: R (running)
Tgid: 2041
Ngid: 0
Pid: 2061
PPid: 1857
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
root@ub1704-166:~# head /proc/2041/status
Name: io_uring-cp
Umask: 0022
State: T (stopped)
Tgid: 2041
Ngid: 0
Pid: 2041
PPid: 1857
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
root@ub1704-166:~# head /proc/2042/status
Name: iou-mgr-2041
Umask: 0022
State: T (stopped)
Tgid: 2041
Ngid: 0
Pid: 2042
PPid: 1857
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
So userspace and iou-mgr-2041 stop, but the workers don't.
49 workers burn cpu as much as possible.
kill -KILL 2061
results in this:
- all workers are gone
- iou-mgr-2041 is gone
- io_uring-cp waits in status D forever
root@ub1704-166:~# head /proc/2041/status
Name: io_uring-cp
Umask: 0022
State: D (disk sleep)
Tgid: 2041
Ngid: 0
Pid: 2041
PPid: 1857
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
root@ub1704-166:~# cat /proc/2041/stack
[<0>] io_wq_destroy_manager+0x36/0xa0
[<0>] io_wq_put_and_exit+0x2b/0x40
[<0>] io_uring_clean_tctx+0xc5/0x110
[<0>] __io_uring_files_cancel+0x336/0x4e0
[<0>] do_exit+0x16b/0x13b0
[<0>] do_group_exit+0x8b/0x140
[<0>] get_signal+0x219/0xc90
[<0>] arch_do_signal_or_restart+0x1eb/0xeb0
[<0>] exit_to_user_mode_prepare+0x115/0x1a0
[<0>] syscall_exit_to_user_mode+0x27/0x50
[<0>] do_syscall_64+0x45/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
The 3rd problem is that gdb in a ubuntu 20.04 userspace vm hangs forever:
root@ub1704-166:~/samba.git# LANG=C strace -o /dev/shm/strace.txt -f -ttT gdb --pid 2417
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 2417
[New LWP 2418]
[New LWP 2419]
<here it hangs forever>
The related parts of 'pstree -a -t -p':
├─bash,2048
│ └─io_uring-cp,2417 /root/kernel/sn-devel-184-builds/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file
│ ├─{iou-mgr-2417},2418
│ └─{iou-wrk-2417},2419
├─bash,2167
│ └─strace,2489 -o /dev/shm/strace.txt -f -ttT gdb --pid 2417
│ └─gdb,2492 --pid 2417
│ └─gdb,2494 --pid 2417
root@ub1704-166:~# cat /proc/sys/kernel/yama/ptrace_scope
0
root@ub1704-166:~# head /proc/2417/status
Name: io_uring-cp
Umask: 0022
State: t (tracing stop)
Tgid: 2417
Ngid: 0
Pid: 2417
PPid: 2048
TracerPid: 2492
Uid: 0 0 0 0
Gid: 0 0 0 0
root@ub1704-166:~# head /proc/2418/status
Name: iou-mgr-2417
Umask: 0022
State: t (tracing stop)
Tgid: 2417
Ngid: 0
Pid: 2418
PPid: 2048
TracerPid: 2492
Uid: 0 0 0 0
Gid: 0 0 0 0
root@ub1704-166:~# head /proc/2419/status
Name: iou-wrk-2417
Umask: 0022
State: R (running)
Tgid: 2417
Ngid: 0
Pid: 2419
PPid: 2048
TracerPid: 2492
Uid: 0 0 0 0
Gid: 0 0 0 0
root@ub1704-166:~# head /proc/2492/status
Name: gdb
Umask: 0022
State: S (sleeping)
Tgid: 2492
Ngid: 0
Pid: 2492
PPid: 2489
TracerPid: 2489
Uid: 0 0 0 0
Gid: 0 0 0 0
root@ub1704-166:~# head /proc/2494/status
Name: gdb
Umask: 0022
State: t (tracing stop)
Tgid: 2494
Ngid: 0
Pid: 2494
PPid: 2492
TracerPid: 2489
Uid: 0 0 0 0
Gid: 0 0 0 0
Maybe these are related and 2494 gets the SIGSTOP that was supposed to
be handled by 2419.
strace.txt is attached.
Just a wild guess (I don't have time to test this), but maybe this
will fix it:
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 07e7d61524c7..ee5a402450db 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -503,8 +503,7 @@ static int io_wqe_worker(void *data)
if (io_flush_signals())
continue;
ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
- if (try_to_freeze() || ret)
- continue;
+ try_to_freeze();
if (signal_pending(current)) {
struct ksignal ksig;
@@ -514,8 +513,7 @@ static int io_wqe_worker(void *data)
continue;
}
/* timed out, exit unless we're the fixed worker */
- if (test_bit(IO_WQ_BIT_EXIT, &wq->state) ||
- !(worker->flags & IO_WORKER_F_FIXED))
+ if (ret == 0 && !(worker->flags & IO_WORKER_F_FIXED))
break;
}
When the worker got a signal I guess ret is not 0 and we'll
never hit the if (signal_pending()) statement...
metze
On 3/26/21 5:48 AM, Stefan Metzmacher wrote:
>
> Am 26.03.21 um 01:39 schrieb Jens Axboe:
>> [...]
>
> Ok, I'm testing a8ff6a3b20bd16d071ef66824ae4428529d114f9 from
> your io_uring-5.12 branch.
>
> And using this patch:
> diff --git a/examples/io_uring-cp.c b/examples/io_uring-cp.c
> index cc7a227a5ec7..6e26a4214015 100644
> --- a/examples/io_uring-cp.c
> +++ b/examples/io_uring-cp.c
> @@ -116,13 +116,16 @@ static void queue_write(struct io_uring *ring, struct io_data *data)
> io_uring_submit(ring);
> }
>
> -static int copy_file(struct io_uring *ring, off_t insize)
> +static int copy_file(struct io_uring *ring, off_t _insize)
> {
> + off_t insize = _insize;
> unsigned long reads, writes;
> struct io_uring_cqe *cqe;
> off_t write_left, offset;
> int ret;
>
> +again:
> + insize = _insize;
> write_left = insize;
> writes = reads = offset = 0;
>
> @@ -221,6 +224,12 @@ static int copy_file(struct io_uring *ring, off_t insize)
> }
> }
>
> + {
> + struct timespec ts = { .tv_nsec = 999999, };
> + nanosleep(&ts, NULL);
> + goto again;
> + }
> +
> return 0;
> }
>
> Running ./io_uring-cp ~/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file
> What I see is this:
>
> kill -SIGSTOP to any thread I used a worker with pid 2061 here, results in
>
> root@ub1704-166:~# head /proc/2061/status
> Name: iou-wrk-2041
> Umask: 0022
> State: R (running)
> Tgid: 2041
> Ngid: 0
> Pid: 2061
> PPid: 1857
> TracerPid: 0
> Uid: 0 0 0 0
> Gid: 0 0 0 0
> root@ub1704-166:~# head /proc/2041/status
> Name: io_uring-cp
> Umask: 0022
> State: T (stopped)
> Tgid: 2041
> Ngid: 0
> Pid: 2041
> PPid: 1857
> TracerPid: 0
> Uid: 0 0 0 0
> Gid: 0 0 0 0
> root@ub1704-166:~# head /proc/2042/status
> Name: iou-mgr-2041
> Umask: 0022
> State: T (stopped)
> Tgid: 2041
> Ngid: 0
> Pid: 2042
> PPid: 1857
> TracerPid: 0
> Uid: 0 0 0 0
> Gid: 0 0 0 0
>
> So userspace and iou-mgr-2041 stop, but the workers don't.
> 49 workers burn cpu as much as possible.
>
> kill -KILL 2061
> results in this:
> - all workers are gone
> - iou-mgr-2041 is gone
> - io_uring-cp waits in status D forever
>
> root@ub1704-166:~# head /proc/2041/status
> Name: io_uring-cp
> Umask: 0022
> State: D (disk sleep)
> Tgid: 2041
> Ngid: 0
> Pid: 2041
> PPid: 1857
> TracerPid: 0
> Uid: 0 0 0 0
> Gid: 0 0 0 0
> root@ub1704-166:~# cat /proc/2041/stack
> [<0>] io_wq_destroy_manager+0x36/0xa0
> [<0>] io_wq_put_and_exit+0x2b/0x40
> [<0>] io_uring_clean_tctx+0xc5/0x110
> [<0>] __io_uring_files_cancel+0x336/0x4e0
> [<0>] do_exit+0x16b/0x13b0
> [<0>] do_group_exit+0x8b/0x140
> [<0>] get_signal+0x219/0xc90
> [<0>] arch_do_signal_or_restart+0x1eb/0xeb0
> [<0>] exit_to_user_mode_prepare+0x115/0x1a0
> [<0>] syscall_exit_to_user_mode+0x27/0x50
> [<0>] do_syscall_64+0x45/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The 3rd problem is that gdb in a ubuntu 20.04 userspace vm hangs forever:
>
> root@ub1704-166:~/samba.git# LANG=C strace -o /dev/shm/strace.txt -f -ttT gdb --pid 2417
> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word".
> Attaching to process 2417
> [New LWP 2418]
> [New LWP 2419]
>
> <here it hangs forever>
>
> The related parts of 'pstree -a -t -p':
>
> ├─bash,2048
> │ └─io_uring-cp,2417 /root/kernel/sn-devel-184-builds/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file
> │ ├─{iou-mgr-2417},2418
> │ └─{iou-wrk-2417},2419
> ├─bash,2167
> │ └─strace,2489 -o /dev/shm/strace.txt -f -ttT gdb --pid 2417
> │ └─gdb,2492 --pid 2417
> │ └─gdb,2494 --pid 2417
>
> root@ub1704-166:~# cat /proc/sys/kernel/yama/ptrace_scope
> 0
>
> root@ub1704-166:~# head /proc/2417/status
> Name: io_uring-cp
> Umask: 0022
> State: t (tracing stop)
> Tgid: 2417
> Ngid: 0
> Pid: 2417
> PPid: 2048
> TracerPid: 2492
> Uid: 0 0 0 0
> Gid: 0 0 0 0
> root@ub1704-166:~# head /proc/2418/status
> Name: iou-mgr-2417
> Umask: 0022
> State: t (tracing stop)
> Tgid: 2417
> Ngid: 0
> Pid: 2418
> PPid: 2048
> TracerPid: 2492
> Uid: 0 0 0 0
> Gid: 0 0 0 0
> root@ub1704-166:~# head /proc/2419/status
> Name: iou-wrk-2417
> Umask: 0022
> State: R (running)
> Tgid: 2417
> Ngid: 0
> Pid: 2419
> PPid: 2048
> TracerPid: 2492
> Uid: 0 0 0 0
> Gid: 0 0 0 0
> root@ub1704-166:~# head /proc/2492/status
> Name: gdb
> Umask: 0022
> State: S (sleeping)
> Tgid: 2492
> Ngid: 0
> Pid: 2492
> PPid: 2489
> TracerPid: 2489
> Uid: 0 0 0 0
> Gid: 0 0 0 0
> root@ub1704-166:~# head /proc/2494/status
> Name: gdb
> Umask: 0022
> State: t (tracing stop)
> Tgid: 2494
> Ngid: 0
> Pid: 2494
> PPid: 2492
> TracerPid: 2489
> Uid: 0 0 0 0
> Gid: 0 0 0 0
>
>
> Maybe these are related and 2494 gets the SIGSTOP that was supposed to
> be handled by 2419.
>
> strace.txt is attached.
>
> Just a wild guess (I don't have time to test this), but maybe this
> will fix it:
>
> diff --git a/fs/io-wq.c b/fs/io-wq.c
> index 07e7d61524c7..ee5a402450db 100644
> --- a/fs/io-wq.c
> +++ b/fs/io-wq.c
> @@ -503,8 +503,7 @@ static int io_wqe_worker(void *data)
> if (io_flush_signals())
> continue;
> ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
> - if (try_to_freeze() || ret)
> - continue;
> + try_to_freeze();
> if (signal_pending(current)) {
> struct ksignal ksig;
>
> @@ -514,8 +513,7 @@ static int io_wqe_worker(void *data)
> continue;
> }
> /* timed out, exit unless we're the fixed worker */
> - if (test_bit(IO_WQ_BIT_EXIT, &wq->state) ||
> - !(worker->flags & IO_WORKER_F_FIXED))
> + if (ret == 0 && !(worker->flags & IO_WORKER_F_FIXED))
> break;
> }
>
> When the worker got a signal I guess ret is not 0 and we'll
> never hit the if (signal_pending()) statement...
Right, the logic was a bit wrong there, and we can also just drop
try_to_freeze() from all of them now; we don't have to special
case that anymore.
Can you try the current branch? I folded in fixes for that.
That will definitely fix cases 1 and 3; #2 with kill -KILL is kind
of puzzling. I'll try and reproduce that with the current tree and see
what happens. But that feels like it's either not a new thing, or it's
the same core issue as 1 and 3 (though I don't quite see how, unless the
failure to catch the signal eludes get_signal() forever in the
worker, which I guess is possible).
--
Jens Axboe
Am 26.03.21 um 13:56 schrieb Jens Axboe:
> On 3/26/21 5:48 AM, Stefan Metzmacher wrote:
>>
>> Am 26.03.21 um 01:39 schrieb Jens Axboe:
>>> [...]
>>
>> [...]
>> Just a wild guess (I don't have time to test this), but maybe this
>> will fix it:
>>
>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>> index 07e7d61524c7..ee5a402450db 100644
>> --- a/fs/io-wq.c
>> +++ b/fs/io-wq.c
>> @@ -503,8 +503,7 @@ static int io_wqe_worker(void *data)
>> if (io_flush_signals())
>> continue;
>> ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
>> - if (try_to_freeze() || ret)
>> - continue;
>> + try_to_freeze();
>> if (signal_pending(current)) {
>> struct ksignal ksig;
>>
>> @@ -514,8 +513,7 @@ static int io_wqe_worker(void *data)
>> continue;
>> }
>> /* timed out, exit unless we're the fixed worker */
>> - if (test_bit(IO_WQ_BIT_EXIT, &wq->state) ||
>> - !(worker->flags & IO_WORKER_F_FIXED))
>> + if (ret == 0 && !(worker->flags & IO_WORKER_F_FIXED))
>> break;
>> }
>>
>> When the worker got a signal I guess ret is not 0 and we'll
>> never hit the if (signal_pending()) statement...
>
> Right, the logic was a bit wrong there, and we can also just drop
> try_to_freeze() from all of them now, we don't have to special
> case that anymore.
I tested my version and it fixes the SIGSTOP problem; I guess your
branch will also fix it.
So the looping workers are gone and nothing hangs anymore.
But gdb still shows very strange things. With a 5.10 kernel I see this:
root@ub1704-166:~# LANG=C gdb --pid 1274
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 1274
Reading symbols from /root/liburing/examples/io_uring-cp...
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so...
syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
(gdb)
And now:
root@ub1704-166:~# LANG=C gdb --pid 1320
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 1320
[New LWP 1321]
[New LWP 1322]
warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
warning: Architecture rejected target-supplied description
syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
(gdb)
pstree -a -t -p looks like this:
│ └─io_uring-cp,1320 /root/kernel/sn-devel-184-builds/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file
│ ├─{iou-mgr-1320},1321
│ └─{iou-wrk-1320},1322
I'm wondering why there's the architecture related stuff...
> Can you try the current branch? I folded in fixes for that.
> That will definitely fix case 1+3, the #2 with kill -KILL is kind
> of puzzling. I'll try and reproduce that with the current tree and see
> what happens. But that feels like it's either not a new thing, or it's
> the same core issue as 1+3 (though I don't quite see how, unless the
> failure to catch the signal will elude get_signal() forever in the
> worker, I guess that's possible).
The KILL after STOP deadlock still exists.
Does io_wq_manager() exit without cleaning up on SIGKILL?
metze
Jens, sorry, I got lost :/
On 03/25, Jens Axboe wrote:
>
> With IO threads accepting signals, including SIGSTOP,
where can I find this change? Looks like I wasn't cc'ed...
> unmask the
> SIGSTOP signal from the default blocked mask.
>
> Signed-off-by: Jens Axboe <[email protected]>
> ---
> kernel/fork.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d3171e8e88e5..d5a40552910f 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
> tsk = copy_process(NULL, 0, node, &args);
> if (!IS_ERR(tsk)) {
> sigfillset(&tsk->blocked);
> - sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
> + sigdelsetmask(&tsk->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
siginitsetinv(blocked, sigmask(SIGKILL)|sigmask(SIGSTOP)) but this is minor.
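Something like this sketch in create_io_thread(), i.e. the same end state
as the hunk above but initialized in one call (untested):

	/* sketch only: initialize the blocked mask directly instead of
	 * sigfillset() + sigdelsetmask()
	 */
	tsk = copy_process(NULL, 0, node, &args);
	if (!IS_ERR(tsk))
		siginitsetinv(&tsk->blocked, sigmask(SIGKILL) | sigmask(SIGSTOP));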
To remind, either way this is racy and can't really help.
And if "IO threads accepting signals" then I don't understand why. Sorry,
I must have missed something.
Oleg.
On 3/26/21 7:31 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 13:56 schrieb Jens Axboe:
>> On 3/26/21 5:48 AM, Stefan Metzmacher wrote:
>>>
>>> Am 26.03.21 um 01:39 schrieb Jens Axboe:
>>>> [...]
>>>
>>> [...]
>>
>> Right, the logic was a bit wrong there, and we can also just drop
>> try_to_freeze() from all of them now, we don't have to special
>> case that anymore.
>
> I tested my version and it fixes the SIGSTOP problem, I guess your
> branch will also fix it.
>
> So the looping workers are gone and doesn't hang anymore.
>
> But gdb still shows very strange things, with a 5.10 kernel I see this:
>
> root@ub1704-166:~# LANG=C gdb --pid 1274
> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word".
> Attaching to process 1274
> Reading symbols from /root/liburing/examples/io_uring-cp...
> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
> Reading symbols from /lib64/ld-linux-x86-64.so.2...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so...
> syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
> 38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
> (gdb)
>
>
> And now:
>
> root@ub1704-166:~# LANG=C gdb --pid 1320
> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word".
> Attaching to process 1320
> [New LWP 1321]
> [New LWP 1322]
>
> warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
>
> warning: Architecture rejected target-supplied description
> syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
> 38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
> (gdb)
>
> pstree -a -t -p looks like this:
>
> │ └─io_uring-cp,1320 /root/kernel/sn-devel-184-builds/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file
> │ ├─{iou-mgr-1320},1321
> │ └─{iou-wrk-1320},1322
>
> I'm wondering why there's the architecture related stuff...
I'm not sure about that either. I'm guessing it's because gdb isn't
receiving any valid information on those IO threads, so the decoding
ends up being a bit nonsensical, and hence the warning too.
>> Can you try the current branch? I folded in fixes for that.
>> That will definitely fix case 1+3, the #2 with kill -KILL is kind
>> of puzzling. I'll try and reproduce that with the current tree and see
>> what happens. But that feels like it's either not a new thing, or it's
>> the same core issue as 1+3 (though I don't quite see how, unless the
>> failure to catch the signal will elude get_signal() forever in the
>> worker, I guess that's possible).
>
> The KILL after STOP deadlock still exists.
In which tree? Sounds like you're still on the old one with that
incremental you sent, which wasn't complete.
> Does io_wq_manager() exits without cleaning up on SIGKILL?
No, it should clean up in all cases. I'll try your stop + kill; I just
tested both of them separately and didn't observe anything. I also ran
your io_uring-cp example (and found a bug in the example, fixed and
pushed), fwiw.
--
Jens Axboe
On 3/26/21 7:54 AM, Jens Axboe wrote:
>> The KILL after STOP deadlock still exists.
>
> In which tree? Sounds like you're still on the old one with that
> incremental you sent, which wasn't complete.
>
>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>
> No, it should kill up in all cases. I'll try your stop + kill, I just
> tested both of them separately and didn't observe anything. I also ran
> your io_uring-cp example (and found a bug in the example, fixed and
> pushed), fwiw.
I can reproduce this one! I'll take a closer look.
--
Jens Axboe
On 3/26/21 7:59 AM, Jens Axboe wrote:
> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>> The KILL after STOP deadlock still exists.
>>
>> In which tree? Sounds like you're still on the old one with that
>> incremental you sent, which wasn't complete.
>>
>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>
>> No, it should kill up in all cases. I'll try your stop + kill, I just
>> tested both of them separately and didn't observe anything. I also ran
>> your io_uring-cp example (and found a bug in the example, fixed and
>> pushed), fwiw.
>
> I can reproduce this one! I'll take a closer look.
OK, that one is actually pretty straightforward - we rely on cleaning
up on exit, but for fatal cases, get_signal() will call do_exit() for us
and never return. So we might need a special case in there to deal with
that, or some other way of ensuring that the fatal signal gets processed
correctly for IO threads.
--
Jens Axboe
Am 26.03.21 um 15:38 schrieb Jens Axboe:
> On 3/26/21 7:59 AM, Jens Axboe wrote:
>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>> The KILL after STOP deadlock still exists.
>>>
>>> In which tree? Sounds like you're still on the old one with that
>>> incremental you sent, which wasn't complete.
>>>
>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>
>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>> tested both of them separately and didn't observe anything. I also ran
>>> your io_uring-cp example (and found a bug in the example, fixed and
>>> pushed), fwiw.
>>
>> I can reproduce this one! I'll take a closer look.
>
> OK, that one is actually pretty straight forward - we rely on cleaning
> up on exit, but for fatal cases, get_signal() will call do_exit() for us
> and never return. So we might need a special case in there to deal with
> that, or some other way of ensuring that fatal signal gets processed
> correctly for IO threads.
And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
metze
Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>> The KILL after STOP deadlock still exists.
>>>>
>>>> In which tree? Sounds like you're still on the old one with that
>>>> incremental you sent, which wasn't complete.
>>>>
>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>
>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>> tested both of them separately and didn't observe anything. I also ran
>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>> pushed), fwiw.
>>>
>>> I can reproduce this one! I'll take a closer look.
>>
>> OK, that one is actually pretty straight forward - we rely on cleaning
>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>> and never return. So we might need a special case in there to deal with
>> that, or some other way of ensuring that fatal signal gets processed
>> correctly for IO threads.
>
> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
Ah, we're still in the first get_signal() from SIGSTOP, correct?
metze
On 3/26/21 8:43 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>> The KILL after STOP deadlock still exists.
>>>>
>>>> In which tree? Sounds like you're still on the old one with that
>>>> incremental you sent, which wasn't complete.
>>>>
>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>
>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>> tested both of them separately and didn't observe anything. I also ran
>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>> pushed), fwiw.
>>>
>>> I can reproduce this one! I'll take a closer look.
>>
>> OK, that one is actually pretty straight forward - we rely on cleaning
>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>> and never return. So we might need a special case in there to deal with
>> that, or some other way of ensuring that fatal signal gets processed
>> correctly for IO threads.
>
> And if (fatal_signal_pending(current)) doesn't prevent get_signal()
> from being called?
Usually yes, but this case is first doing SIGSTOP, so we're waiting in
get_signal() -> do_signal_stop() when the SIGKILL arrives. Hence there's
no way to catch it in the worker itself.
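Roughly, against the worker loop from patch 1 (annotated sketch, not new
code):

	if (signal_pending(current)) {
		struct ksignal ksig;

		if (fatal_signal_pending(current))
			break;		/* not reached: SIGSTOP alone isn't fatal */
		get_signal(&ksig);	/* parks in do_signal_stop() while stopped;
					 * when SIGKILL arrives we wake up inside
					 * get_signal(), which takes the fatal path
					 * and calls do_exit() - so we never return
					 * here to run the io-wq cleanup
					 */
		continue;
	}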
--
Jens Axboe
On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>> The KILL after STOP deadlock still exists.
>>>>>
>>>>> In which tree? Sounds like you're still on the old one with that
>>>>> incremental you sent, which wasn't complete.
>>>>>
>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>
>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>> pushed), fwiw.
>>>>
>>>> I can reproduce this one! I'll take a closer look.
>>>
>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>> and never return. So we might need a special case in there to deal with
>>> that, or some other way of ensuring that fatal signal gets processed
>>> correctly for IO threads.
>>
>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
>
> Ah, we're still in the first get_signal() from SIGSTOP, correct?
Yes exactly, we're waiting in there, being stopped. So we either need to
add a check like this:
relock:
+ if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
+ return false;
to catch it upfront and from the relock case, or add:
fatal:
+ if (current->flags & PF_IO_WORKER)
+ return false;
to catch it in the fatal section.
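For reference, the first variant in context would look something like this
(sketch only, untested; relock: is immediately followed by taking the
siglock):

relock:
	/* let PF_IO_WORKER threads return from get_signal() on a fatal
	 * signal so they can run their own io-wq cleanup, instead of
	 * having get_signal() call do_exit() on their behalf
	 */
	if ((current->flags & PF_IO_WORKER) && fatal_signal_pending(current))
		return false;
	spin_lock_irq(&sighand->siglock);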
--
Jens Axboe
On 3/26/21 8:53 AM, Jens Axboe wrote:
> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>
>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>> incremental you sent, which wasn't complete.
>>>>>>
>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>
>>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>> pushed), fwiw.
>>>>>
>>>>> I can reproduce this one! I'll take a closer look.
>>>>
>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>> and never return. So we might need a special case in there to deal with
>>>> that, or some other way of ensuring that fatal signal gets processed
>>>> correctly for IO threads.
>>>
>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
>>
>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>
> Yes exactly, we're waiting in there being stopped. So we either need to
> check to something ala:
>
> relock:
> + if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
> + return false;
>
> to catch it upfront and from the relock case, or add:
>
> fatal:
> + if (current->flags & PF_IO_WORKER)
> + return false;
>
> to catch it in the fatal section.
Can you try this? Not crazy about adding a special case, but I don't
think there's any way around this one. And should be pretty cheap, as
we're already pulling in ->flags right above anyway.
diff --git a/kernel/signal.c b/kernel/signal.c
index 5ad8566534e7..5b75fbe3d2d6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2752,6 +2752,15 @@ bool get_signal(struct ksignal *ksig)
*/
current->flags |= PF_SIGNALED;
+ /*
+ * PF_IO_WORKER threads will catch and exit on fatal signals
+ * themselves. They have cleanup that must be performed, so
+ * we cannot call do_exit() on their behalf. coredumps also
+ * do not apply to them.
+ */
+ if (current->flags & PF_IO_WORKER)
+ return false;
+
if (sig_kernel_coredump(signr)) {
if (print_fatal_signals)
print_fatal_signal(ksig->info.si_signo);
--
Jens Axboe
On 3/26/21 7:48 AM, Oleg Nesterov wrote:
> Jens, sorry, I got lost :/
Let's bring you back in :-)
> On 03/25, Jens Axboe wrote:
>>
>> With IO threads accepting signals, including SIGSTOP,
>
> where can I find this change? Looks like I wasn't cc'ed...
It's this very series.
>> unmask the
>> SIGSTOP signal from the default blocked mask.
>>
>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>> kernel/fork.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index d3171e8e88e5..d5a40552910f 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
>> tsk = copy_process(NULL, 0, node, &args);
>> if (!IS_ERR(tsk)) {
>> sigfillset(&tsk->blocked);
>> - sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
>> + sigdelsetmask(&tsk->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
>
> siginitsetinv(blocked, sigmask(SIGKILL)|sigmask(SIGSTOP)) but this is minor.
Ah thanks.
> To remind, either way this is racy and can't really help.
>
> And if "IO threads accepting signals" then I don't understand why. Sorry,
> I must have missed something.
I do think the above is a no-op at this point, and we can probably just
kill it. Let me double check, hopefully we can just remove this blocked
part.
--
Jens Axboe
Am 26.03.21 um 15:53 schrieb Jens Axboe:
> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>
>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>> incremental you sent, which wasn't complete.
>>>>>>
>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>
>>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>> pushed), fwiw.
>>>>>
>>>>> I can reproduce this one! I'll take a closer look.
>>>>
>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>> and never return. So we might need a special case in there to deal with
>>>> that, or some other way of ensuring that fatal signal gets processed
>>>> correctly for IO threads.
>>>
>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
>>
>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>
> Yes exactly, we're waiting in there being stopped. So we either need to
> check to something ala:
>
> relock:
> + if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
> + return false;
>
> to catch it upfront and from the relock case, or add:
>
> fatal:
> + if (current->flags & PF_IO_WORKER)
> + return false;
>
> to catch it in the fatal section.
>
Or something like io_uring_files_cancel().
Maybe replace current->pf_io_worker with a generic current->io_thread
structure which has exit hooks, as well as
io_wq_worker_sleeping() and io_wq_worker_running().
Maybe create_io_thread would take such a structure
as an argument instead of a single function pointer.
struct io_thread_description {
	const char *name;
	int (*thread_fn)(struct io_thread_description *);
	void (*sleeping_fn)(struct io_thread_description *);
	void (*running_fn)(struct io_thread_description *);
	void (*exit_fn)(struct io_thread_description *);
};
And then:
struct io_wq_manager {
	struct io_thread_description description;
	/* ... manager specific stuff ... */
};
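Roughly what I have in mind (purely a hypothetical sketch, the names are
invented and this is not existing kernel API): create_io_thread() would
take the description and store it on the task, and the exit path would run
the exit hook the same way it runs io_uring_files_cancel() today:

struct task_struct *create_io_thread(struct io_thread_description *desc,
				     int node);

	/* hypothetical: in do_exit(), next to the io_uring_files_cancel() call */
	if (unlikely(tsk->io_thread && tsk->io_thread->exit_fn))
		tsk->io_thread->exit_fn(tsk->io_thread);

Then the worker cleanup wouldn't depend on get_signal() ever returning to
the worker loop.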
metze
Am 26.03.21 um 15:55 schrieb Jens Axboe:
> On 3/26/21 8:53 AM, Jens Axboe wrote:
>> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>>
>>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>>> incremental you sent, which wasn't complete.
>>>>>>>
>>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>>
>>>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>>> pushed), fwiw.
>>>>>>
>>>>>> I can reproduce this one! I'll take a closer look.
>>>>>
>>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>>> and never return. So we might need a special case in there to deal with
>>>>> that, or some other way of ensuring that fatal signal gets processed
>>>>> correctly for IO threads.
>>>>
>>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
>>>
>>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>>
>> Yes exactly, we're waiting in there being stopped. So we either need to
>> check to something ala:
>>
>> relock:
>> + if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
>> + return false;
>>
>> to catch it upfront and from the relock case, or add:
>>
>> fatal:
>> + if (current->flags & PF_IO_WORKER)
>> + return false;
>>
>> to catch it in the fatal section.
>
> Can you try this? Not crazy about adding a special case, but I don't
> think there's any way around this one. And should be pretty cheap, as
> we're already pulling in ->flags right above anyway.
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 5ad8566534e7..5b75fbe3d2d6 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2752,6 +2752,15 @@ bool get_signal(struct ksignal *ksig)
> */
> current->flags |= PF_SIGNALED;
>
> + /*
> + * PF_IO_WORKER threads will catch and exit on fatal signals
> + * themselves. They have cleanup that must be performed, so
> + * we cannot call do_exit() on their behalf. coredumps also
> + * do not apply to them.
> + */
> + if (current->flags & PF_IO_WORKER)
> + return false;
> +
> if (sig_kernel_coredump(signr)) {
> if (print_fatal_signals)
> print_fatal_signal(ksig->info.si_signo);
>
I guess not before next week, but if it resolves the problem for you,
I guess it would be good to get this into rc5.
metze
On 3/26/21 9:04 AM, Stefan Metzmacher wrote:
>
> Am 26.03.21 um 15:53 schrieb Jens Axboe:
>> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>>
>>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>>> incremental you sent, which wasn't complete.
>>>>>>>
>>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>>
>>>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>>> pushed), fwiw.
>>>>>>
>>>>>> I can reproduce this one! I'll take a closer look.
>>>>>
>>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>>> and never return. So we might need a special case in there to deal with
>>>>> that, or some other way of ensuring that fatal signal gets processed
>>>>> correctly for IO threads.
>>>>
>>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
>>>
>>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>>
>> Yes exactly, we're waiting in there being stopped. So we either need to
>> check to something ala:
>>
>> relock:
>> + if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
>> + return false;
>>
>> to catch it upfront and from the relock case, or add:
>>
>> fatal:
>> + if (current->flags & PF_IO_WORKER)
>> + return false;
>>
>> to catch it in the fatal section.
>>
>
> Or something like io_uring_files_cancel()
>
> Maybe change current->pf_io_worker with a generic current->io_thread
> structure which, has exit hooks, as well as
> io_wq_worker_sleeping() and io_wq_worker_running().
>
> Maybe create_io_thread would take such an structure
> as argument instead of a single function pointer.
>
> struct io_thread_description {
> const char *name;
> int (*thread_fn)(struct io_thread_description *);
> void (*sleeping_fn)((struct io_thread_description *);
> void (*running_fn)((struct io_thread_description *);
> void (*exit_fn)((struct io_thread_description *);
> };
>
> And then
> struct io_wq_manager {
> struct io_thread_description description;
> ... manager specific stuff...
> };
I did consider something like that, but it seems a bit over-engineered
just for catching this case. And any kind of logic for PF_EXITING
ends up being a bit tricky for cancelations.
We can look into doing that for 5.13 potentially.
--
Jens Axboe
On 3/26/21 9:08 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 15:55 schrieb Jens Axboe:
>> On 3/26/21 8:53 AM, Jens Axboe wrote:
>>> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>>>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>>>
>>>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>>>> incremental you sent, which wasn't complete.
>>>>>>>>
>>>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>>>
>>>>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>>>> pushed), fwiw.
>>>>>>>
>>>>>>> I can reproduce this one! I'll take a closer look.
>>>>>>
>>>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>>>> and never return. So we might need a special case in there to deal with
>>>>>> that, or some other way of ensuring that fatal signal gets processed
>>>>>> correctly for IO threads.
>>>>>
>>>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
>>>>
>>>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>>>
>>> Yes exactly, we're waiting in there being stopped. So we either need to
>>> check to something ala:
>>>
>>> relock:
>>> + if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
>>> + return false;
>>>
>>> to catch it upfront and from the relock case, or add:
>>>
>>> fatal:
>>> + if (current->flags & PF_IO_WORKER)
>>> + return false;
>>>
>>> to catch it in the fatal section.
>>
>> Can you try this? Not crazy about adding a special case, but I don't
>> think there's any way around this one. And should be pretty cheap, as
>> we're already pulling in ->flags right above anyway.
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 5ad8566534e7..5b75fbe3d2d6 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2752,6 +2752,15 @@ bool get_signal(struct ksignal *ksig)
>> */
>> current->flags |= PF_SIGNALED;
>>
>> + /*
>> + * PF_IO_WORKER threads will catch and exit on fatal signals
>> + * themselves. They have cleanup that must be performed, so
>> + * we cannot call do_exit() on their behalf. coredumps also
>> + * do not apply to them.
>> + */
>> + if (current->flags & PF_IO_WORKER)
>> + return false;
>> +
>> if (sig_kernel_coredump(signr)) {
>> if (print_fatal_signals)
>> print_fatal_signal(ksig->info.si_signo);
>>
>
> I guess not before next week, but if it resolves the problem for you,
> I guess it would be good to get this into rc5.
It does, I pushed out a new branch. I'll send out a v2 series in a bit.
--
Jens Axboe
On 3/26/21 9:11 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 16:10 schrieb Jens Axboe:
>> On 3/26/21 9:08 AM, Stefan Metzmacher wrote:
>>> Am 26.03.21 um 15:55 schrieb Jens Axboe:
>>>> On 3/26/21 8:53 AM, Jens Axboe wrote:
>>>>> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>>>>>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>>>>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>>>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>>>>>
>>>>>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>>>>>> incremental you sent, which wasn't complete.
>>>>>>>>>>
>>>>>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>>>>>
>>>>>>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>>>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>>>>>> pushed), fwiw.
>>>>>>>>>
>>>>>>>>> I can reproduce this one! I'll take a closer look.
>>>>>>>>
>>>>>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>>>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>>>>>> and never return. So we might need a special case in there to deal with
>>>>>>>> that, or some other way of ensuring that fatal signal gets processed
>>>>>>>> correctly for IO threads.
>>>>>>>
>>>>>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
>>>>>>
>>>>>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>>>>>
>>>>> Yes exactly, we're waiting in there being stopped. So we either need to
>>>>> check to something ala:
>>>>>
>>>>> relock:
>>>>> + if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
>>>>> + return false;
>>>>>
>>>>> to catch it upfront and from the relock case, or add:
>>>>>
>>>>> fatal:
>>>>> + if (current->flags & PF_IO_WORKER)
>>>>> + return false;
>>>>>
>>>>> to catch it in the fatal section.
>>>>
>>>> Can you try this? Not crazy about adding a special case, but I don't
>>>> think there's any way around this one. And should be pretty cheap, as
>>>> we're already pulling in ->flags right above anyway.
>>>>
>>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>>> index 5ad8566534e7..5b75fbe3d2d6 100644
>>>> --- a/kernel/signal.c
>>>> +++ b/kernel/signal.c
>>>> @@ -2752,6 +2752,15 @@ bool get_signal(struct ksignal *ksig)
>>>> */
>>>> current->flags |= PF_SIGNALED;
>>>>
>>>> + /*
>>>> + * PF_IO_WORKER threads will catch and exit on fatal signals
>>>> + * themselves. They have cleanup that must be performed, so
>>>> + * we cannot call do_exit() on their behalf. coredumps also
>>>> + * do not apply to them.
>>>> + */
>>>> + if (current->flags & PF_IO_WORKER)
>>>> + return false;
>>>> +
>>>> if (sig_kernel_coredump(signr)) {
>>>> if (print_fatal_signals)
>>>> print_fatal_signal(ksig->info.si_signo);
>>>>
>>>
>>> I guess not before next week, but if it resolves the problem for you,
>>> I guess it would be good to get this into rc5.
>>
>> It does, I pushed out a new branch. I'll send out a v2 series in a bit.
>
> Great, thanks!
>
> Any chance to get the "cmdline" hiding included?
I'll take a look at your response there, haven't yet. Wanted to get this
one sorted first.
--
Jens Axboe
Am 26.03.21 um 16:10 schrieb Jens Axboe:
> On 3/26/21 9:08 AM, Stefan Metzmacher wrote:
>> Am 26.03.21 um 15:55 schrieb Jens Axboe:
>>> On 3/26/21 8:53 AM, Jens Axboe wrote:
>>>> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>>>>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>>>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>>>>
>>>>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>>>>> incremental you sent, which wasn't complete.
>>>>>>>>>
>>>>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>>>>
>>>>>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>>>>> pushed), fwiw.
>>>>>>>>
>>>>>>>> I can reproduce this one! I'll take a closer look.
>>>>>>>
>>>>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>>>>> and never return. So we might need a special case in there to deal with
>>>>>>> that, or some other way of ensuring that fatal signal gets processed
>>>>>>> correctly for IO threads.
>>>>>>
>>>>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
>>>>>
>>>>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>>>>
>>>> Yes exactly, we're waiting in there being stopped. So we either need to
>>>> check to something ala:
>>>>
>>>> relock:
>>>> + if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
>>>> + return false;
>>>>
>>>> to catch it upfront and from the relock case, or add:
>>>>
>>>> fatal:
>>>> + if (current->flags & PF_IO_WORKER)
>>>> + return false;
>>>>
>>>> to catch it in the fatal section.
>>>
>>> Can you try this? Not crazy about adding a special case, but I don't
>>> think there's any way around this one. And should be pretty cheap, as
>>> we're already pulling in ->flags right above anyway.
>>>
>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>> index 5ad8566534e7..5b75fbe3d2d6 100644
>>> --- a/kernel/signal.c
>>> +++ b/kernel/signal.c
>>> @@ -2752,6 +2752,15 @@ bool get_signal(struct ksignal *ksig)
>>> */
>>> current->flags |= PF_SIGNALED;
>>>
>>> + /*
>>> + * PF_IO_WORKER threads will catch and exit on fatal signals
>>> + * themselves. They have cleanup that must be performed, so
>>> + * we cannot call do_exit() on their behalf. coredumps also
>>> + * do not apply to them.
>>> + */
>>> + if (current->flags & PF_IO_WORKER)
>>> + return false;
>>> +
>>> if (sig_kernel_coredump(signr)) {
>>> if (print_fatal_signals)
>>> print_fatal_signal(ksig->info.si_signo);
>>>
>>
>> I guess not before next week, but if it resolves the problem for you,
>> I guess it would be good to get this into rc5.
>
> It does, I pushed out a new branch. I'll send out a v2 series in a bit.
Great, thanks!
Any chance to get the "cmdline" hiding included?
metze
Am 26.03.21 um 16:01 schrieb Jens Axboe:
> On 3/26/21 7:48 AM, Oleg Nesterov wrote:
>> Jens, sorry, I got lost :/
>
> Let's bring you back in :-)
>
>> On 03/25, Jens Axboe wrote:
>>>
>>> With IO threads accepting signals, including SIGSTOP,
>>
>> where can I find this change? Looks like I wasn't cc'ed...
>
> It's this very series.
>
>>> unmask the
>>> SIGSTOP signal from the default blocked mask.
>>>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>> ---
>>> kernel/fork.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>> index d3171e8e88e5..d5a40552910f 100644
>>> --- a/kernel/fork.c
>>> +++ b/kernel/fork.c
>>> @@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
>>> tsk = copy_process(NULL, 0, node, &args);
>>> if (!IS_ERR(tsk)) {
>>> sigfillset(&tsk->blocked);
>>> - sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
>>> + sigdelsetmask(&tsk->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
>>
>> siginitsetinv(blocked, sigmask(SIGKILL)|sigmask(SIGSTOP)) but this is minor.
>
> Ah thanks.
>
>> To remind, either way this is racy and can't really help.
>>
>> And if "IO threads accepting signals" then I don't understand why. Sorry,
>> I must have missed something.
>
> I do think the above is a no-op at this point, and we can probably just
> kill it. Let me double check, hopefully we can just remove this blocked
> part.
Is it really correct to drop this in your "kernel: stop masking signals in create_io_thread()"
commit?
I don't think signals wanted by userspace should potentially be handled in an io_thread...
e.g. things set up with fcntl(fd, F_SETSIG, ...) used together with F_SETLEASE?
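The pattern I mean is roughly this (illustrative userspace sketch only,
file name made up, error handling trimmed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static void lease_break(int sig, siginfo_t *si, void *ctx)
{
	/* si->si_fd tells us which leased file has to be released */
}

int main(void)
{
	struct sigaction sa = {
		.sa_sigaction	= lease_break,
		.sa_flags	= SA_SIGINFO,
	};
	int fd = open("somefile", O_RDONLY);

	sigaction(SIGRTMIN, &sa, NULL);
	fcntl(fd, F_SETSIG, SIGRTMIN);	/* deliver SIGRTMIN instead of SIGIO */
	fcntl(fd, F_SETLEASE, F_RDLCK);	/* take a read lease on the file */
	pause();
	return 0;
}

If such a signal were delivered to an io_thread, which never returns to
userspace, the application's handler could never run for it.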
metze
On 3/26/21 9:23 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 16:01 schrieb Jens Axboe:
>> On 3/26/21 7:48 AM, Oleg Nesterov wrote:
>>> Jens, sorry, I got lost :/
>>
>> Let's bring you back in :-)
>>
>>> On 03/25, Jens Axboe wrote:
>>>>
>>>> With IO threads accepting signals, including SIGSTOP,
>>>
>>> where can I find this change? Looks like I wasn't cc'ed...
>>
>> It's this very series.
>>
>>>> unmask the
>>>> SIGSTOP signal from the default blocked mask.
>>>>
>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>> ---
>>>> kernel/fork.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>> index d3171e8e88e5..d5a40552910f 100644
>>>> --- a/kernel/fork.c
>>>> +++ b/kernel/fork.c
>>>> @@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
>>>> tsk = copy_process(NULL, 0, node, &args);
>>>> if (!IS_ERR(tsk)) {
>>>> sigfillset(&tsk->blocked);
>>>> - sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
>>>> + sigdelsetmask(&tsk->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
>>>
>>> siginitsetinv(blocked, sigmask(SIGKILL)|sigmask(SIGSTOP)) but this is minor.
>>
>> Ah thanks.
>>
>>> To remind, either way this is racy and can't really help.
>>>
>>> And if "IO threads accepting signals" then I don't understand why. Sorry,
>>> I must have missed something.
>>
>> I do think the above is a no-op at this point, and we can probably just
>> kill it. Let me double check, hopefully we can just remove this blocked
>> part.
>
> Is this really correct to drop in your "kernel: stop masking signals in create_io_thread()"
> commit?
>
> I don't assume signals wanted by userspace should potentially handled in an io_thread...
> e.g. things set with fcntl(fd, F_SETSIG,) used together with F_SETLEASE?
I guess we do actually need it, if we're not fiddling with
wants_signal() for them. To quell Oleg's concerns, we can just move it
to after dup_task_struct(), which should eliminate any race concerns
there.
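Roughly something like this (untested sketch of the copy_process() side;
it assumes copy_process() can tell it's creating an IO thread, e.g. via a
flag in kernel_clone_args):

	p = dup_task_struct(current, node);
	if (!p)
		goto fork_out;

	if (args->io_thread) {
		/*
		 * Mark us an IO worker, and block any signal that isn't
		 * fatal or STOP.
		 */
		p->flags |= PF_IO_WORKER;
		siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
	}

At that point nothing else can see the new task yet, so there's no window
where it could dequeue a signal with the wrong mask.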
--
Jens Axboe
Am 26.03.21 um 16:29 schrieb Jens Axboe:
> On 3/26/21 9:23 AM, Stefan Metzmacher wrote:
>> Am 26.03.21 um 16:01 schrieb Jens Axboe:
>>> On 3/26/21 7:48 AM, Oleg Nesterov wrote:
>>>> Jens, sorry, I got lost :/
>>>
>>> Let's bring you back in :-)
>>>
>>>> On 03/25, Jens Axboe wrote:
>>>>>
>>>>> With IO threads accepting signals, including SIGSTOP,
>>>>
>>>> where can I find this change? Looks like I wasn't cc'ed...
>>>
>>> It's this very series.
>>>
>>>>> unmask the
>>>>> SIGSTOP signal from the default blocked mask.
>>>>>
>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>> ---
>>>>> kernel/fork.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>> index d3171e8e88e5..d5a40552910f 100644
>>>>> --- a/kernel/fork.c
>>>>> +++ b/kernel/fork.c
>>>>> @@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
>>>>> tsk = copy_process(NULL, 0, node, &args);
>>>>> if (!IS_ERR(tsk)) {
>>>>> sigfillset(&tsk->blocked);
>>>>> - sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
>>>>> + sigdelsetmask(&tsk->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
>>>>
>>>> siginitsetinv(blocked, sigmask(SIGKILL)|sigmask(SIGSTOP)) but this is minor.
>>>
>>> Ah thanks.
>>>
>>>> To remind, either way this is racy and can't really help.
>>>>
>>>> And if "IO threads accepting signals" then I don't understand why. Sorry,
>>>> I must have missed something.
>>>
>>> I do think the above is a no-op at this point, and we can probably just
>>> kill it. Let me double check, hopefully we can just remove this blocked
>>> part.
>>
>> Is this really correct to drop in your "kernel: stop masking signals in create_io_thread()"
>> commit?
>>
>> I don't assume signals wanted by userspace should potentially handled in an io_thread...
>> e.g. things set with fcntl(fd, F_SETSIG,) used together with F_SETLEASE?
>
> I guess we do actually need it, if we're not fiddling with
> wants_signal() for them. To quell Oleg's concerns, we can just move it
> to post dup_task_struct(), that should eliminate any race concerns
> there.
If that one is racy, don't we also want this one?
https://lore.kernel.org/io-uring/438b738c1e4827a7fdfe43087da88bbe17eedc72.1616197787.git.metze@samba.org/T/#u
And clear tsk->pf_io_worker ?
metze
On 3/26/21 12:01 PM, Stefan Metzmacher wrote:
> Am 26.03.21 um 16:29 schrieb Jens Axboe:
>> On 3/26/21 9:23 AM, Stefan Metzmacher wrote:
>>> Am 26.03.21 um 16:01 schrieb Jens Axboe:
>>>> On 3/26/21 7:48 AM, Oleg Nesterov wrote:
>>>>> Jens, sorry, I got lost :/
>>>>
>>>> Let's bring you back in :-)
>>>>
>>>>> On 03/25, Jens Axboe wrote:
>>>>>>
>>>>>> With IO threads accepting signals, including SIGSTOP,
>>>>>
>>>>> where can I find this change? Looks like I wasn't cc'ed...
>>>>
>>>> It's this very series.
>>>>
>>>>>> unmask the
>>>>>> SIGSTOP signal from the default blocked mask.
>>>>>>
>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>> ---
>>>>>> kernel/fork.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>>> index d3171e8e88e5..d5a40552910f 100644
>>>>>> --- a/kernel/fork.c
>>>>>> +++ b/kernel/fork.c
>>>>>> @@ -2435,7 +2435,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
>>>>>> tsk = copy_process(NULL, 0, node, &args);
>>>>>> if (!IS_ERR(tsk)) {
>>>>>> sigfillset(&tsk->blocked);
>>>>>> - sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
>>>>>> + sigdelsetmask(&tsk->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
>>>>>
>>>>> siginitsetinv(blocked, sigmask(SIGKILL)|sigmask(SIGSTOP)) but this is minor.
>>>>
>>>> Ah thanks.
>>>>
>>>>> To remind, either way this is racy and can't really help.
>>>>>
>>>>> And if "IO threads accepting signals" then I don't understand why. Sorry,
>>>>> I must have missed something.
>>>>
>>>> I do think the above is a no-op at this point, and we can probably just
>>>> kill it. Let me double check, hopefully we can just remove this blocked
>>>> part.
>>>
>>> Is this really correct to drop in your "kernel: stop masking signals in create_io_thread()"
>>> commit?
>>>
>>> I don't assume signals wanted by userspace should potentially handled in an io_thread...
>>> e.g. things set with fcntl(fd, F_SETSIG,) used together with F_SETLEASE?
>>
>> I guess we do actually need it, if we're not fiddling with
>> wants_signal() for them. To quell Oleg's concerns, we can just move it
>> to post dup_task_struct(), that should eliminate any race concerns
>> there.
>
> If that one is racy, don' we better also want this one?
> https://lore.kernel.org/io-uring/438b738c1e4827a7fdfe43087da88bbe17eedc72.1616197787.git.metze@samba.org/T/#u
>
> And clear tsk->pf_io_worker ?
Definitely prudent. I'll get round 2 queued up shortly.
--
Jens Axboe
Hi Jens,
> root@ub1704-166:~# LANG=C gdb --pid 1320
> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word".
> Attaching to process 1320
> [New LWP 1321]
> [New LWP 1322]
>
> warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
>
> warning: Architecture rejected target-supplied description
> syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
> 38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
> (gdb)
Ok, the following makes gdb happy again:
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -163,6 +163,8 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
/* Kernel thread ? */
if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
memset(childregs, 0, sizeof(struct pt_regs));
+ if (p->flags & PF_IO_WORKER)
+ childregs->cs = current_pt_regs()->cs;
kthread_frame_init(frame, sp, arg);
return 0;
}
I'm wondering if we should decouple the PF_KTHREAD and PF_IO_WORKER cases even more
and keep as much of a userspace-like copy_thread as possible.
metze
On 3/26/21 7:46 PM, Stefan Metzmacher wrote:
>
> Hi Jens,
>
>> root@ub1704-166:~# LANG=C gdb --pid 1320
>> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
>> Copyright (C) 2020 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.
>> Type "show copying" and "show warranty" for details.
>> This GDB was configured as "x86_64-linux-gnu".
>> Type "show configuration" for configuration details.
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>.
>> Find the GDB manual and other documentation resources online at:
>> <http://www.gnu.org/software/gdb/documentation/>.
>>
>> For help, type "help".
>> Type "apropos word" to search for commands related to "word".
>> Attaching to process 1320
>> [New LWP 1321]
>> [New LWP 1322]
>>
>> warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
>>
>> warning: Architecture rejected target-supplied description
>> syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
>> 38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
>> (gdb)
>
> Ok, the following makes gdb happy again:
>
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -163,6 +163,8 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
> /* Kernel thread ? */
> if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> memset(childregs, 0, sizeof(struct pt_regs));
> + if (p->flags & PF_IO_WORKER)
> + childregs->cs = current_pt_regs()->cs;
> kthread_frame_init(frame, sp, arg);
> return 0;
> }
Confirmed, it stops complaining about the arch at that point.
> I'm wondering if we should decouple the PF_KTHREAD and PF_IO_WORKER
> cases even more and keep as much of a userspace-like copy_thread as
> possible.
Probably makes sense, the only thing they really share is the func+arg
setup. Hence PF_IO_WORKER threads can likely just use the rest of the
normal init, where it doesn't conflict with the frame setup.
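Something along these lines, maybe (untested sketch against the current
x86-64 copy_thread(), not an actual patch):

	/* Kernel thread? */
	if (unlikely(p->flags & PF_KTHREAD)) {
		memset(childregs, 0, sizeof(struct pt_regs));
		kthread_frame_init(frame, sp, arg);
		return 0;
	}

	/*
	 * IO worker: looks like a user thread as far as pt_regs (and hence
	 * ptrace/gdb) are concerned, but it still enters through the kthread
	 * frame since it never returns to userspace.
	 */
	if (unlikely(p->flags & PF_IO_WORKER)) {
		*childregs = *current_pt_regs();
		childregs->ax = 0;
		kthread_frame_init(frame, sp, arg);
		return 0;
	}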
--
Jens Axboe
Hi Jens,
>> For help, type "help".
>> Type "apropos word" to search for commands related to "word".
>> Attaching to process 1320
>> [New LWP 1321]
>> [New LWP 1322]
>>
>> warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
>>
>> warning: Architecture rejected target-supplied description
>> syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
>> 38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
>> (gdb)
>
> Ok, the following makes gdb happy again:
>
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -163,6 +163,8 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
> /* Kernel thread ? */
> if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> memset(childregs, 0, sizeof(struct pt_regs));
> + if (p->flags & PF_IO_WORKER)
> + childregs->cs = current_pt_regs()->cs;
> kthread_frame_init(frame, sp, arg);
> return 0;
> }
>
> I'm wondering if we should decouple the PF_KTHREAD and PF_IO_WORKER cases even more
> and keep as much of a userspace-like copy_thread as possible.
Would it be possible to fix this remaining problem before 5.12 final?
(I don't think my change is actually the correct fix,
and other architectures may have similar problems.)
Thanks!
metze
Am 01.04.21 um 17:39 schrieb Linus Torvalds:
> On Thu, Apr 1, 2021 at 7:58 AM Stefan Metzmacher <[email protected]> wrote:
>>
>>>
>>> Ok, the following makes gdb happy again:
>>>
>>> --- a/arch/x86/kernel/process.c
>>> +++ b/arch/x86/kernel/process.c
>>> @@ -163,6 +163,8 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>>> /* Kernel thread ? */
>>> if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
>>> memset(childregs, 0, sizeof(struct pt_regs));
>>> + if (p->flags & PF_IO_WORKER)
>>> + childregs->cs = current_pt_regs()->cs;
>>> kthread_frame_init(frame, sp, arg);
>>> return 0;
>>> }
>>
>> Would it be possible to fix this remaining problem before 5.12 final?
>
> Please not that way.
>
> But doing something like
>
> childregs->cs = __USER_CS;
> childregs->ss = __USER_DS;
> childregs->ds = __USER_DS;
> childregs->es = __USER_DS;
>
> might make sense (just do it unconditionally, rather than making it
> special to PF_IO_WORKER).
>
> Does that make gdb happy too?
I haven't tried it, but it seems gdb tries to use PTRACE_PEEKUSR
against the last thread tid listed under /proc/<pid>/task/ in order to
get the architecture of the userspace application. So my naive assumption
would be that it wouldn't allow detecting a 32-bit application
running on a 64-bit kernel.
metze
On Thu, Apr 1, 2021 at 7:58 AM Stefan Metzmacher <[email protected]> wrote:
>
> >
> > Ok, the following makes gdb happy again:
> >
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -163,6 +163,8 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
> > /* Kernel thread ? */
> > if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> > memset(childregs, 0, sizeof(struct pt_regs));
> > + if (p->flags & PF_IO_WORKER)
> > + childregs->cs = current_pt_regs()->cs;
> > kthread_frame_init(frame, sp, arg);
> > return 0;
> > }
>
> Would it be possible to fix this remaining problem before 5.12 final?
Please not that way.
But doing something like
childregs->cs = __USER_CS;
childregs->ss = __USER_DS;
childregs->ds = __USER_DS;
childregs->es = __USER_DS;
might make sense (just do it unconditionally, rather than making it
special to PF_IO_WORKER).
Does that make gdb happy too?
Linus
Am 01.04.21 um 18:24 schrieb Linus Torvalds:
> On Thu, Apr 1, 2021 at 9:00 AM Stefan Metzmacher <[email protected]> wrote:
>>
>> I haven't tried it, but it seems gdb tries to use PTRACE_PEEKUSR
>> against the last thread tid listed under /proc/<pid>/tasks/ in order to
>> get the architecture for the userspace application
>
> Christ, what an odd hack. Why wouldn't it just do it on the initial
> thread you actually attached to?
>
> Are you sure it's not simply because your test-case was to attach to
> the io_uring thread? Because the io_uring thread might as well be
> considered to be 64-bit.
│ └─io_uring-cp,1396 Makefile file
│ ├─{iou-mgr-1396},1397
│ ├─{iou-wrk-1396},1398
│ └─{iou-wrk-1396},1399
strace -ttT -o strace-uring-fail.txt gdb --pid 1396
(note strace -f would deadlock gdb with SIGSTOP)
I've attached strace-uring-fail.txt.
(I guess there was a race and workers 1398 and 1399 exited in between,
that's why it's using 1397):
18:46:35.429498 ptrace(PTRACE_PEEKUSER, 1397, 8*CS, [NULL]) = 0 <0.000022>
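A standalone version of that probe looks roughly like this (illustrative
only, minimal error handling):

#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
	pid_t tid;
	long cs;

	if (argc < 2)
		return 1;
	tid = atoi(argv[1]);

	if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) == -1)
		return 1;
	waitpid(tid, NULL, __WALL);

	errno = 0;
	cs = ptrace(PTRACE_PEEKUSER, tid,
		    offsetof(struct user_regs_struct, cs), NULL);
	if (cs == -1 && errno)
		return 1;

	/* an iou-* thread created with zeroed pt_regs reports cs == 0 here */
	printf("cs = %#lx\n", cs);
	ptrace(PTRACE_DETACH, tid, NULL, NULL);
	return 0;
}

gdb then can't map cs == 0 to any architecture it knows about, hence the
warnings.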
>> so my naive assumption
>> would be that it wouldn't allow the detection of a 32-bit application
>> using a 64-bit kernel.
>
> I'm not entirely convinced we want to care about a confused gdb
> implementation and somebody debugging a case that I don't believe
> happens in practice.
>
> 32-bit user space is legacy. And legacy isn't io_uring. If somebody
> insists on doing odd things, they can live with the odd results.
Ok, any ideas regarding similar problems for other architectures?
metze
Hi Jens,
>>> I don't assume signals wanted by userspace should potentially handled in an io_thread...
>>> e.g. things set with fcntl(fd, F_SETSIG,) used together with F_SETLEASE?
>>
>> I guess we do actually need it, if we're not fiddling with
>> wants_signal() for them. To quell Oleg's concerns, we can just move it
>> to post dup_task_struct(), that should eliminate any race concerns
>> there.
>
> If that one is racy, don' we better also want this one?
> https://lore.kernel.org/io-uring/438b738c1e4827a7fdfe43087da88bbe17eedc72.1616197787.git.metze@samba.org/T/#u
>
> And clear tsk->pf_io_worker ?
As the workers don't clone other workers, I guess it's fine to defer this to 5.13.
metze
On Thu, Apr 1, 2021 at 9:00 AM Stefan Metzmacher <[email protected]> wrote:
>
> I haven't tried it, but it seems gdb tries to use PTRACE_PEEKUSR
> against the last thread tid listed under /proc/<pid>/tasks/ in order to
> get the architecture for the userspace application
Christ, what an odd hack. Why wouldn't it just do it on the initial
thread you actually attached to?
Are you sure it's not simply because your test-case was to attach to
the io_uring thread? Because the io_uring thread might as well be
considered to be 64-bit.
> so my naive assumption
> would be that it wouldn't allow the detection of a 32-bit application
> using a 64-bit kernel.
I'm not entirely convinced we want to care about a confused gdb
implementation and somebody debugging a case that I don't believe
happens in practice.
32-bit user space is legacy. And legacy isn't io_uring. If somebody
insists on doing odd things, they can live with the odd results.
Linus
Am 01.04.21 um 18:24 schrieb Linus Torvalds:
> On Thu, Apr 1, 2021 at 9:00 AM Stefan Metzmacher <[email protected]> wrote:
>>
>> I haven't tried it, but it seems gdb tries to use PTRACE_PEEKUSR
>> against the last thread tid listed under /proc/<pid>/tasks/ in order to
>> get the architecture for the userspace application
>
> Christ, what an odd hack. Why wouldn't it just do it on the initial
> thread you actually attached to?
>
> Are you sure it's not simply because your test-case was to attach to
> the io_uring thread? Because the io_uring thread might as well be
> considered to be 64-bit.
│ └─io_uring-cp,1396 Makefile file
│ ├─{iou-mgr-1396},1397
│ ├─{iou-wrk-1396},1398
│ └─{iou-wrk-1396},1399
strace -ttT -o strace-uring-fail.txt gdb --pid 1396
(note strace -f would deadlock gdb with SIGSTOP)
The full file can be found here:
https://www.samba.org/~metze/strace-uring-fail.txt
(I guess there was a race and the workers 1398 and 1399 exited in between,
that's why it's using 1397):
18:46:35.429498 ptrace(PTRACE_PEEKUSER, 1397, 8*CS, [NULL]) = 0 <0.000022>
>> so my naive assumption
>> would be that it wouldn't allow the detection of a 32-bit application
>> using a 64-bit kernel.
>
> I'm not entirely convinced we want to care about a confused gdb
> implementation and somebody debugging a case that I don't believe
> happens in practice.
>
> 32-bit user space is legacy. And legacy isn't io_uring. If somebody
> insists on doing odd things, they can live with the odd results.
Ok, I'd agree for 32-bit applications, but what about libraries?
E.g. distributions ship libraries like libsmbclient or nss modules
as 64-bit and 32-bit versions, in order to allow legacy applications
to run. Why shouldn't the 32-bit library builds support io_uring?
(Note libsmbclient doesn't use io_uring yet, but I guess it will in future).
Any ideas regarding similar problems for other architectures?
metze