2021-05-27 16:23:00

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] io_uring: handle signals before letting io-worker exit

On 5/26/21 12:21 PM, Olivier Langlois wrote:
> This is required for proper core dump generation.
>
> Because signals are normally serviced before resuming userspace and an
> io_worker thread will never resume userspace, it needs to explicitly
> call the signal servicing functions.
>
> Also, notice that it is possible to exit from the io_wqe_worker()
> function main loop while having a pending signal such as when
> the IO_WQ_BIT_EXIT bit is set.
>
> It is crucial to service any pending signal before calling do_exit()
> Proper coredump generation is relying on PF_SIGNALED to be set.
>
> More specifically, exit_mm() is using this flag to wait for the
> core dump completion before releasing its memory descriptor.
>
> Signed-off-by: Olivier Langlois <[email protected]>
> ---
> fs/io-wq.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/fs/io-wq.c b/fs/io-wq.c
> index 5361a9b4b47b..b76c61e9aff2 100644
> --- a/fs/io-wq.c
> +++ b/fs/io-wq.c
> @@ -9,8 +9,6 @@
> #include <linux/init.h>
> #include <linux/errno.h>
> #include <linux/sched/signal.h>
> -#include <linux/mm.h>
> -#include <linux/sched/mm.h>
> #include <linux/percpu.h>
> #include <linux/slab.h>
> #include <linux/rculist_nulls.h>
> @@ -193,6 +191,26 @@ static void io_worker_exit(struct io_worker *worker)
>
> kfree_rcu(worker, rcu);
> io_worker_ref_put(wqe->wq);
> + /*
> + * Because signals are normally serviced before resuming userspace and an
> + * io_worker thread will never resume userspace, it needs to explicitly
> + * call the signal servicing functions.
> + *
> + * Also notice that it is possible to exit from the io_wqe_worker()
> + * function main loop while having a pending signal such as when
> + * the IO_WQ_BIT_EXIT bit is set.
> + *
> + * It is crucial to service any pending signal before calling do_exit()
> + * Proper coredump generation is relying on PF_SIGNALED to be set.
> + *
> + * More specifically, exit_mm() is using this flag to wait for the
> + * core dump completion before releasing its memory descriptor.
> + */
> + if (signal_pending(current)) {
> + struct ksignal ksig;
> +
> + get_signal(&ksig);
> + }
> do_exit(0);
> }

Do we need the same thing in fs/io_uring.c:io_sq_thread()?

--
Jens Axboe


2021-05-27 17:32:09

by Olivier Langlois

[permalink] [raw]
Subject: Re: [PATCH] io_uring: handle signals before letting io-worker exit

On Thu, 2021-05-27 at 07:46 -0600, Jens Axboe wrote:
> On 5/26/21 12:21 PM, Olivier Langlois wrote:
> > This is required for proper core dump generation.
> >
> > Because signals are normally serviced before resuming userspace and
> > an
> > io_worker thread will never resume userspace, it needs to
> > explicitly
> > call the signal servicing functions.
> >
> > Also, notice that it is possible to exit from the io_wqe_worker()
> > function main loop while having a pending signal such as when
> > the IO_WQ_BIT_EXIT bit is set.
> >
> > It is crucial to service any pending signal before calling
> > do_exit()
> > Proper coredump generation is relying on PF_SIGNALED to be set.
> >
> > More specifically, exit_mm() is using this flag to wait for the
> > core dump completion before releasing its memory descriptor.
> >
> > Signed-off-by: Olivier Langlois <[email protected]>
> > ---
> > ?fs/io-wq.c | 22 ++++++++++++++++++++--
> > ?1 file changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/io-wq.c b/fs/io-wq.c
> > index 5361a9b4b47b..b76c61e9aff2 100644
> > --- a/fs/io-wq.c
> > +++ b/fs/io-wq.c
> > @@ -9,8 +9,6 @@
> > ?#include <linux/init.h>
> > ?#include <linux/errno.h>
> > ?#include <linux/sched/signal.h>
> > -#include <linux/mm.h>
> > -#include <linux/sched/mm.h>
> > ?#include <linux/percpu.h>
> > ?#include <linux/slab.h>
> > ?#include <linux/rculist_nulls.h>
> > @@ -193,6 +191,26 @@ static void io_worker_exit(struct io_worker
> > *worker)
> > ?
> > ????????kfree_rcu(worker, rcu);
> > ????????io_worker_ref_put(wqe->wq);
> > +???????/*
> > +??????? * Because signals are normally serviced before resuming
> > userspace and an
> > +??????? * io_worker thread will never resume userspace, it needs
> > to explicitly
> > +??????? * call the signal servicing functions.
> > +??????? *
> > +??????? * Also notice that it is possible to exit from the
> > io_wqe_worker()
> > +??????? * function main loop while having a pending signal such as
> > when
> > +??????? * the IO_WQ_BIT_EXIT bit is set.
> > +??????? *
> > +??????? * It is crucial to service any pending signal before
> > calling do_exit()
> > +??????? * Proper coredump generation is relying on PF_SIGNALED to
> > be set.
> > +??????? *
> > +??????? * More specifically, exit_mm() is using this flag to wait
> > for the
> > +??????? * core dump completion before releasing its memory
> > descriptor.
> > +??????? */
> > +???????if (signal_pending(current)) {
> > +???????????????struct ksignal ksig;
> > +
> > +???????????????get_signal(&ksig);
> > +???????}
> > ????????do_exit(0);
> > ?}
>
> Do we need the same thing in fs/io_uring.c:io_sq_thread()?
>
Jens,

You are 100% correct. In fact, this is the same problem for ALL
currently existing and future io threads. Therefore, I start to think
that the right place for the fix might be straight into do_exit()...


2021-05-27 17:32:53

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] io_uring: handle signals before letting io-worker exit

On 5/27/21 9:21 AM, Olivier Langlois wrote:
> On Thu, 2021-05-27 at 07:46 -0600, Jens Axboe wrote:
>> On 5/26/21 12:21 PM, Olivier Langlois wrote:
>>> This is required for proper core dump generation.
>>>
>>> Because signals are normally serviced before resuming userspace and
>>> an
>>> io_worker thread will never resume userspace, it needs to
>>> explicitly
>>> call the signal servicing functions.
>>>
>>> Also, notice that it is possible to exit from the io_wqe_worker()
>>> function main loop while having a pending signal such as when
>>> the IO_WQ_BIT_EXIT bit is set.
>>>
>>> It is crucial to service any pending signal before calling
>>> do_exit()
>>> Proper coredump generation is relying on PF_SIGNALED to be set.
>>>
>>> More specifically, exit_mm() is using this flag to wait for the
>>> core dump completion before releasing its memory descriptor.
>>>
>>> Signed-off-by: Olivier Langlois <[email protected]>
>>> ---
>>> fs/io-wq.c | 22 ++++++++++++++++++++--
>>> 1 file changed, 20 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>>> index 5361a9b4b47b..b76c61e9aff2 100644
>>> --- a/fs/io-wq.c
>>> +++ b/fs/io-wq.c
>>> @@ -9,8 +9,6 @@
>>> #include <linux/init.h>
>>> #include <linux/errno.h>
>>> #include <linux/sched/signal.h>
>>> -#include <linux/mm.h>
>>> -#include <linux/sched/mm.h>
>>> #include <linux/percpu.h>
>>> #include <linux/slab.h>
>>> #include <linux/rculist_nulls.h>
>>> @@ -193,6 +191,26 @@ static void io_worker_exit(struct io_worker
>>> *worker)
>>>
>>> kfree_rcu(worker, rcu);
>>> io_worker_ref_put(wqe->wq);
>>> + /*
>>> + * Because signals are normally serviced before resuming
>>> userspace and an
>>> + * io_worker thread will never resume userspace, it needs
>>> to explicitly
>>> + * call the signal servicing functions.
>>> + *
>>> + * Also notice that it is possible to exit from the
>>> io_wqe_worker()
>>> + * function main loop while having a pending signal such as
>>> when
>>> + * the IO_WQ_BIT_EXIT bit is set.
>>> + *
>>> + * It is crucial to service any pending signal before
>>> calling do_exit()
>>> + * Proper coredump generation is relying on PF_SIGNALED to
>>> be set.
>>> + *
>>> + * More specifically, exit_mm() is using this flag to wait
>>> for the
>>> + * core dump completion before releasing its memory
>>> descriptor.
>>> + */
>>> + if (signal_pending(current)) {
>>> + struct ksignal ksig;
>>> +
>>> + get_signal(&ksig);
>>> + }
>>> do_exit(0);
>>> }
>>
>> Do we need the same thing in fs/io_uring.c:io_sq_thread()?
>>
> Jens,
>
> You are 100% correct. In fact, this is the same problem for ALL
> currently existing and future io threads. Therefore, I start to think
> that the right place for the fix might be straight into do_exit()...

That is what I was getting at. To avoid poluting do_exit() with it, I
think it'd be best to add an io_thread_exit() that simply does:

void io_thread_exit(void)
{
if (signal_pending(current)) {
struct ksignal ksig;
get_signal(&ksig);
}
do_exit(0);
}

and convert the do_exit() calls in io_uring/io-wq to io_thread_exit()
instead.

--
Jens Axboe

2021-05-27 18:11:31

by Olivier Langlois

[permalink] [raw]
Subject: Re: [PATCH] io_uring: handle signals before letting io-worker exit

On Thu, 2021-05-27 at 09:30 -0600, Jens Axboe wrote:
> > >
> > Jens,
> >
> > You are 100% correct. In fact, this is the same problem for ALL
> > currently existing and future io threads. Therefore, I start to
> > think
> > that the right place for the fix might be straight into
> > do_exit()...
>
> That is what I was getting at. To avoid poluting do_exit() with it, I
> think it'd be best to add an io_thread_exit() that simply does:
>
> void io_thread_exit(void)
> {
> ????????if (signal_pending(current)) {
> ????????????????struct ksignal ksig;
> ????????????????get_signal(&ksig);
> ????????}
> ????????do_exit(0);
> }
>
> and convert the do_exit() calls in io_uring/io-wq to io_thread_exit()
> instead.
>
IMHO, that would be an acceptable compromise because it does fix my
problem. However, I am of the opinion that it wouldn't be poluting
do_exit() and would in fact be the right place to do it considering
that create_io_thread() is in kernel and theoritically, anyone can call
it to create an io_thread and would be susceptible to get bitten by the
exact same problem and would have to come up with a similar solution if
it is not addressed directly by the kernel.

Also, since I have submitted the patch, I have made the following
realization:

I got bitten by the problem because of a race condition between the io-
mgr thread and its io-wrks threads for processing their pending SIGKILL
and the proposed patch does correct my problem.

The issue would have most likely been buried by 5.13 io-mgr removal...

BUT, even the proposed patch isn't 100% perfect. AFAIK, it is still
possible, but very unlikely, to get a signal between calling
signal_pending() and do_exit().

It might be possible to implement the solution and be 100% correct all
the time by doing it inside do_exit()... I am currently eyeing
exit_signals() as a potential good site for the patch...