LinuxLists.cc - Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY

2021-10-22 14:14:44

Subject: Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL

On 6/9/21 21:17, Eric W. Biederman wrote:
>
> Folks,
>
> Olivier Langlois has been struggling with coredumps getting truncated in
> tasks using io_uring. He has also apparently been struggling with
> the some of his email messages not making it to the lists.

Looks syzbot hit something relevant, see
https://lore.kernel.org/io-uring/[email protected]/

In short, a task creates an io_uring worker thread, then the worker
submits a task_work item to the creator task and won't die until
the item is executed/cancelled. And I found that the creator task is
sleeping in do_coredump() -> wait_for_completion()

0xffffffff81343ccb is in do_coredump (fs/coredump.c:469).
464
465 if (core_waiters > 0) {
466 struct core_thread *ptr;
467
468 freezer_do_not_count();
469 wait_for_completion(&core_state->startup);
470 freezer_count();

A hack executing tws there helps (see diff below).
Any chance anyone knows what this is and how to fix it?

diff --git a/fs/coredump.c b/fs/coredump.c
index 3224dee44d30..f6f9dfb02296 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -466,7 +466,8 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
struct core_thread *ptr;

freezer_do_not_count();
- wait_for_completion(&core_state->startup);
+ while (wait_for_completion_interruptible(&core_state->startup))
+ tracehook_notify_signal();
freezer_count();
/*
* Wait for all the threads to become inactive, so that

>
> We were talking about some of his struggles and questions in this area
> and he pointed me to this patch he thought he had posted but I could not
> find in the list archives.
>
> In short the coredump code deliberately supports being interrupted by
> SIGKILL, and depends upon prepare_signal to filter out all other
> signals. With the io_uring code comes an extra test in signal_pending
> for TIF_NOTIFY_SIGNAL (which is something about asking a task to run
> task_work_run).
>
> I am baffled why the dumper thread would be getting interrupted by
> TIF_NOTIFY_SIGNAL but apparently it is. Perhaps it is an io_uring
> thread that is causing the dump.
>
> Now that we know the problem the question becomes how to fix this issue.
>
> Is there any chance all of this TWA_SIGNAL logic could simply be removed
> now that io_uring threads are normal process threads?
>
> There are only the two call sites so I perhaps the could test
> signal->flags & SIGNAL_FLAG_COREDUMP before scheduling a work on
> a process that is dumping core?
>
> Perhaps the coredump code needs to call task_work_run before it does
> anything?
>
> -----
>
> From: Olivier Langlois <[email protected]>
> Subject: [PATCH] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
> Date: Mon, 07 Jun 2021 16:25:06 -0400
>
> io_uring is a big user of task_work and any event that io_uring made a
> task waiting for that occurs during the core dump generation will
> generate a TIF_NOTIFY_SIGNAL.
>
> Here are the detailed steps of the problem:
> 1. io_uring calls vfs_poll() to install a task to a file wait queue
> with io_async_wake() as the wakeup function cb from io_arm_poll_handler()
> 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL
> 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling
> set_notify_signal()
>
> Signed-off-by: Olivier Langlois <[email protected]>
> ---
> fs/coredump.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 2868e3e171ae..79c6e3f114db 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -519,7 +519,7 @@ static bool dump_interrupted(void)
> * but then we need to teach dump_write() to restart and clear
> * TIF_SIGPENDING.
> */
> - return signal_pending(current);
> + return task_sigpending(current);
> }
>
> static void wait_for_dump_helpers(struct file *file)
>

--
Pavel Begunkov

2021-12-24 02:33:27

by Olivier Langlois

[permalink] [raw]

Subject: Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL

On Fri, 2021-10-22 at 15:13 +0100, Pavel Begunkov wrote:
> On 6/9/21 21:17, Eric W. Biederman wrote:
> >
> > Folks,
> >
> > Olivier Langlois has been struggling with coredumps getting
> > truncated in
> > tasks using io_uring.? He has also apparently been struggling with
> > the some of his email messages not making it to the lists.
>
> Looks syzbot hit something relevant, see
> https://lore.kernel.org/io-
> uring/[email protected]/
>
> In short, a task creates an io_uring worker thread, then the worker
> submits a task_work item to the creator task and won't die until
> the item is executed/cancelled. And I found that the creator task is
> sleeping in do_coredump() -> wait_for_completion()
>
> 0xffffffff81343ccb is in do_coredump (fs/coredump.c:469).
> 464
> 465???????????? if (core_waiters > 0) {
> 466???????????????????? struct core_thread *ptr;
> 467
> 468???????????????????? freezer_do_not_count();
> 469???????????????????? wait_for_completion(&core_state->startup);
> 470???????????????????? freezer_count();
>
>
> A hack executing tws there helps (see diff below).
> Any chance anyone knows what this is and how to fix it?
>
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 3224dee44d30..f6f9dfb02296 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -466,7 +466,8 @@ static int coredump_wait(int exit_code, struct
> core_state *core_state)
> ????????? struct core_thread *ptr;
> ?
> ????????? freezer_do_not_count();
> -??????? wait_for_completion(&core_state->startup);
> +??????? while (wait_for_completion_interruptible(&core_state-
> >startup))
> +??????????? tracehook_notify_signal();
> ????????? freezer_count();
> ????????? /*
> ?????????? * Wait for all the threads to become inactive, so that
>
>
>
>
Pavel,

I cannot comment on the merit of the proposed hack but my proposed
patch to fix the coredump truncation issue when a process using
io_uring core dumps that I submitted back in August is still
unreviewed!

https://lore.kernel.org/lkml/1625bc89782bf83d9d8c7c63e8ffcb651ccb15fa.1629655338.git.olivier@trillion01.com/

I have been using it since then I must have generated many dozens of
perfect core dump files with it and I have not seen a single truncated
core dump files like I used to have prior to the patch.

I am bringing back my patch to your attention because one nice side
effect of it is that it would have avoided totally the problem that you
have encountered in coredump_wait() since it does cancel io_uring
resources before calling coredump_wait()!

Greetings,
Olivier

2021-12-24 10:38:13

by Pavel Begunkov

[permalink] [raw]

Subject: Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL

On 12/24/21 01:34, Olivier Langlois wrote:
> On Fri, 2021-10-22 at 15:13 +0100, Pavel Begunkov wrote:
>> On 6/9/21 21:17, Eric W. Biederman wrote:
>> In short, a task creates an io_uring worker thread, then the worker
>> submits a task_work item to the creator task and won't die until
>> the item is executed/cancelled. And I found that the creator task is
>> sleeping in do_coredump() -> wait_for_completion()
>>
[...]
>> A hack executing tws there helps (see diff below).
>> Any chance anyone knows what this is and how to fix it?
>>
[...]
> Pavel,
>
> I cannot comment on the merit of the proposed hack but my proposed
> patch to fix the coredump truncation issue when a process using
> io_uring core dumps that I submitted back in August is still
> unreviewed!

That's unfortunate. Not like I can help in any case, but I assumed
it was dealt with by

commit 06af8679449d4ed282df13191fc52d5ba28ec536
Author: Eric W. Biederman <[email protected]>
Date: Thu Jun 10 15:11:11 2021 -0500

coredump: Limit what can interrupt coredumps

Olivier Langlois has been struggling with coredumps being incompletely written in
processes using io_uring.
...

> https://lore.kernel.org/lkml/1625bc89782bf83d9d8c7c63e8ffcb651ccb15fa.1629655338.git.olivier@trillion01.com/
>
> I have been using it since then I must have generated many dozens of
> perfect core dump files with it and I have not seen a single truncated
> core dump files like I used to have prior to the patch.
>
> I am bringing back my patch to your attention because one nice side
> effect of it is that it would have avoided totally the problem that you
> have encountered in coredump_wait() since it does cancel io_uring
> resources before calling coredump_wait()!

FWIW, I worked it around in io_uring back then by breaking the
dependency.

--
Pavel Begunkov

2021-12-24 19:53:03

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL

Pavel Begunkov <[email protected]> writes:

> On 12/24/21 01:34, Olivier Langlois wrote:
>> On Fri, 2021-10-22 at 15:13 +0100, Pavel Begunkov wrote:
>>> On 6/9/21 21:17, Eric W. Biederman wrote:
>>> In short, a task creates an io_uring worker thread, then the worker
>>> submits a task_work item to the creator task and won't die until
>>> the item is executed/cancelled. And I found that the creator task is
>>> sleeping in do_coredump() -> wait_for_completion()
>>>
> [...]
>>> A hack executing tws there helps (see diff below).
>>> Any chance anyone knows what this is and how to fix it?
>>>
> [...]
>> Pavel,
>>
>> I cannot comment on the merit of the proposed hack but my proposed
>> patch to fix the coredump truncation issue when a process using
>> io_uring core dumps that I submitted back in August is still
>> unreviewed!
>
> That's unfortunate. Not like I can help in any case, but I assumed
> it was dealt with by
>
> commit 06af8679449d4ed282df13191fc52d5ba28ec536
> Author: Eric W. Biederman <[email protected]>
> Date: Thu Jun 10 15:11:11 2021 -0500
>
> coredump: Limit what can interrupt coredumps
> Olivier Langlois has been struggling with coredumps being incompletely
> written in
> processes using io_uring.
> ...

I thought it had been too.

>> https://lore.kernel.org/lkml/1625bc89782bf83d9d8c7c63e8ffcb651ccb15fa.1629655338.git.olivier@trillion01.com/
>>
>> I have been using it since then I must have generated many dozens of
>> perfect core dump files with it and I have not seen a single truncated
>> core dump files like I used to have prior to the patch.
>>
>> I am bringing back my patch to your attention because one nice side
>> effect of it is that it would have avoided totally the problem that you
>> have encountered in coredump_wait() since it does cancel io_uring
>> resources before calling coredump_wait()!
>
> FWIW, I worked it around in io_uring back then by breaking the
> dependency.

I am in the middle of untangling the dependencies between ptrace,
coredump, signal handling and maybe a few related things.

Do folks have a reproducer I can look at? Pavel especially if you have
something that reproduces on the current kernels.

As part of that I am in the process of guaranteeing all of the coredump
work happens in get_signal so nothing of io_uring or any cleanup
anywhere else runs until the coredump completes.

I haven't quite posted the code for review because it's the holidays.
But I am aiming at v5.17 or possibly v5.18, as the code is just about
ready.

Eric

2021-12-28 11:26:36

by Pavel Begunkov

[permalink] [raw]

Subject: Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL

On 12/24/21 19:52, Eric W. Biederman wrote:
> Pavel Begunkov <[email protected]> writes:
[...]
>> FWIW, I worked it around in io_uring back then by breaking the
>> dependency.
>
> I am in the middle of untangling the dependencies between ptrace,
> coredump, signal handling and maybe a few related things.

Sounds great

> Do folks have a reproducer I can look at? Pavel especially if you have
> something that reproduces on the current kernels.

A syz reproducer was triggering it reliably, I'd try to revert the
commit below and test:
https://syzkaller.appspot.com/text?tag=ReproC&x=15d3600cb00000

It should hung a task. Syzbot report for reference:
https://syzkaller.appspot.com/bug?extid=27d62ee6f256b186883e

commit 1d5f5ea7cb7d15b9fb1cc82673ebb054f02cd7d2
Author: Pavel Begunkov <[email protected]>
Date: Fri Oct 29 13:11:33 2021 +0100

io-wq: remove worker to owner tw dependency

INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds.
Not tainted 5.15.0-rc5-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-wrk-6609 state:D stack:27944 pid: 6612 ppid: 6526 flags:0x00004006
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xb44/0x5960 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
do_wait_for_common kernel/sched/completion.c:85 [inline]
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
io_worker_exit fs/io-wq.c:183 [inline]
io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
...

> As part of that I am in the process of guaranteeing all of the coredump
> work happens in get_signal so nothing of io_uring or any cleanup
> anywhere else runs until the coredump completes.
>
> I haven't quite posted the code for review because it's the holidays.
> But I am aiming at v5.17 or possibly v5.18, as the code is just about
> ready.

--
Pavel Begunkov

2022-01-05 19:39:16

by Olivier Langlois

[permalink] [raw]

Subject: Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL

On Fri, 2021-12-24 at 10:37 +0000, Pavel Begunkov wrote:
> On 12/24/21 01:34, Olivier Langlois wrote:
> > On Fri, 2021-10-22 at 15:13 +0100, Pavel Begunkov wrote:
> > > On 6/9/21 21:17, Eric W. Biederman wrote:
> > > In short, a task creates an io_uring worker thread, then the
> > > worker
> > > submits a task_work item to the creator task and won't die until
> > > the item is executed/cancelled. And I found that the creator task
> > > is
> > > sleeping in do_coredump() -> wait_for_completion()
> > >
> [...]
> > > A hack executing tws there helps (see diff below).
> > > Any chance anyone knows what this is and how to fix it?
> > >
> [...]
> > Pavel,
> >
> > I cannot comment on the merit of the proposed hack but my proposed
> > patch to fix the coredump truncation issue when a process using
> > io_uring core dumps that I submitted back in August is still
> > unreviewed!
>
> That's unfortunate. Not like I can help in any case, but I assumed
> it was dealt with by
>
> commit 06af8679449d4ed282df13191fc52d5ba28ec536
> Author: Eric W. Biederman <[email protected]>
> Date:?? Thu Jun 10 15:11:11 2021 -0500
>
> ???? coredump: Limit what can interrupt coredumps
> ????
> ???? Olivier Langlois has been struggling with coredumps being
> incompletely written in
> ???? processes using io_uring.
> ???? ...
>
It was a partial fix only. Oleg Nesterov pointed out that the fix was
not good if if the core dump was written into a pipe.

https://lore.kernel.org/io-uring/[email protected]/

> > https://lore.kernel.org/lkml/1625bc89782bf83d9d8c7c63e8ffcb651ccb15
> > [email protected]/
> >
> > I have been using it since then I must have generated many dozens
> > of
> > perfect core dump files with it and I have not seen a single
> > truncated
> > core dump files like I used to have prior to the patch.
> >
> > I am bringing back my patch to your attention because one nice side
> > effect of it is that it would have avoided totally the problem that
> > you
> > have encountered in coredump_wait() since it does cancel io_uring
> > resources before calling coredump_wait()!
>
> FWIW, I worked it around in io_uring back then by breaking the
> dependency.
>
Yes I have seen your work to fix the problem.

It just seems to me that there is no good reason to keep io_uring
process requests once you are generating a core dump and simply
cancelling io_uring before generating the core dump would have avoided
the problem that you have encountered plus any other similar issues yet
to come...

Greetings,

2022-03-17 05:01:24

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL

Pavel Begunkov <[email protected]> writes:

> On 12/24/21 19:52, Eric W. Biederman wrote:
>> Pavel Begunkov <[email protected]> writes:
> [...]
>>> FWIW, I worked it around in io_uring back then by breaking the
>>> dependency.
>> I am in the middle of untangling the dependencies between ptrace,
>> coredump, signal handling and maybe a few related things.
>
> Sounds great
>
>> Do folks have a reproducer I can look at? Pavel especially if you have
>> something that reproduces on the current kernels.
>
> A syz reproducer was triggering it reliably, I'd try to revert the
> commit below and test:
> https://syzkaller.appspot.com/text?tag=ReproC&x=15d3600cb00000
>
> It should hung a task. Syzbot report for reference:
> https://syzkaller.appspot.com/bug?extid=27d62ee6f256b186883e
>
>
> commit 1d5f5ea7cb7d15b9fb1cc82673ebb054f02cd7d2
> Author: Pavel Begunkov <[email protected]>
> Date: Fri Oct 29 13:11:33 2021 +0100
>
> io-wq: remove worker to owner tw dependency
>
> INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds.
> Not tainted 5.15.0-rc5-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-wrk-6609 state:D stack:27944 pid: 6612 ppid: 6526 flags:0x00004006
> Call Trace:
> context_switch kernel/sched/core.c:4940 [inline]
> __schedule+0xb44/0x5960 kernel/sched/core.c:6287
> schedule+0xd3/0x270 kernel/sched/core.c:6366
> schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
> do_wait_for_common kernel/sched/completion.c:85 [inline]
> __wait_for_common kernel/sched/completion.c:106 [inline]
> wait_for_common kernel/sched/completion.c:117 [inline]
> wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
> io_worker_exit fs/io-wq.c:183 [inline]
> io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> ...

Thank you very much for this. There were some bugs elsewhere I had to
deal with so I am slower looking at this part of the code than I was
expecting.

I have now reproduced this with the commit reverted on current kernels
and the repro.c from the syzcaller report. I am starting to look into
how this interacts with my planned code changes in this area.

In combination with my other planned changes I think all that needs to
happen in do_coredump is to clear TIF_NOTIFY_SIGNAL along with
TIF_SIGPENDING to prevent io_uring interaction problems. But we will
see.

The deadlock you demonstrate here shows that it is definitely not enough
to clear TIF_NOTIFY_SIGNAL (without other changes) so that
signal_pending returns false, which I was hoping was be the case.

Eric