2022-04-22 19:28:17

by Jens Axboe

[permalink] [raw]
Subject: [PATCH] task_work: allow TWA_SIGNAL without a rescheduling IPI

Some use cases don't always need an IPI when sending a TWA_SIGNAL
notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
except it doesn't send an IPI to the target task. It merely sets
TIF_NOTIFY_SIGNAL and wakes up the task.

Signed-off-by: Jens Axboe <[email protected]>

---

This is a prep patch for an io_uring change where we don't need the IPI,
and skipping it can reduce rescheduling/IPI rate by tens to hundreds of
thousands per second.

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..66b689f6cfcb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -355,14 +355,23 @@ static inline void clear_notify_signal(void)
smp_mb__after_atomic();
}

+/*
+ * Returns 'true' if kick_process() is needed to force a transition from
+ * user -> kernel to guarantee expedient run of TWA_SIGNAL based task_work.
+ */
+static inline bool __set_notify_signal(struct task_struct *task)
+{
+ return !test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
+ !wake_up_state(task, TASK_INTERRUPTIBLE);
+}
+
/*
* Called to break out of interruptible wait loops, and enter the
* exit_to_user_mode_loop().
*/
static inline void set_notify_signal(struct task_struct *task)
{
- if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
- !wake_up_state(task, TASK_INTERRUPTIBLE))
+ if (__set_notify_signal(task))
kick_process(task);
}

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 897494b597ba..795ef5a68429 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -17,6 +17,7 @@ enum task_work_notify_mode {
TWA_NONE,
TWA_RESUME,
TWA_SIGNAL,
+ TWA_SIGNAL_NO_IPI,
};

static inline bool task_work_pending(struct task_struct *task)
diff --git a/kernel/task_work.c b/kernel/task_work.c
index c59e1a49bc40..fa8fdd04aa17 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -13,11 +13,13 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
*
* Queue @work for task_work_run() below and notify the @task if @notify
* is @TWA_RESUME or @TWA_SIGNAL. @TWA_SIGNAL works like signals, in that the
- * it will interrupt the targeted task and run the task_work. @TWA_RESUME
- * work is run only when the task exits the kernel and returns to user mode,
- * or before entering guest mode. Fails if the @task is exiting/exited and thus
- * it can't process this @work. Otherwise @work->func() will be called when the
- * @task goes through one of the aforementioned transitions, or exits.
+ * it will interrupt the targeted task and run the task_work. @TWA_SIGNAL_NO_IPI
+ * works like @TWA_SIGNAL, except it doesn't send a reschedule IPI to force the
+ * targeted task to reschedule and run task_work. @TWA_RESUME work is run only
+ * when the task exits the kernel and returns to user mode, or before entering
+ * guest mode. Fails if the @task is exiting/exited and thus it can't process
+ * this @work. Otherwise @work->func() will be called when the @task goes
+ * through one of the aforementioned transitions, or exits.
*
* If the targeted task is exiting, then an error is returned and the work item
* is not queued. It's up to the caller to arrange for an alternative mechanism
@@ -53,6 +55,9 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
case TWA_SIGNAL:
set_notify_signal(task);
break;
+ case TWA_SIGNAL_NO_IPI:
+ __set_notify_signal(task);
+ break;
default:
WARN_ON_ONCE(1);
break;

--
Jens Axboe


2022-04-26 08:13:34

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] task_work: allow TWA_SIGNAL without a rescheduling IPI

On 4/22/22 8:34 AM, Jens Axboe wrote:
> Some use cases don't always need an IPI when sending a TWA_SIGNAL
> notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
> except it doesn't send an IPI to the target task. It merely sets
> TIF_NOTIFY_SIGNAL and wakes up the task.

Adding Peter and Thomas.

>
> Signed-off-by: Jens Axboe <[email protected]>
>
> ---
>
> This is a prep patch for an io_uring change where we don't need the IPI,
> and skipping it can reduce rescheduling/IPI rate by tens to hundreds of
> thousands per second.
>
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 3c8b34876744..66b689f6cfcb 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -355,14 +355,23 @@ static inline void clear_notify_signal(void)
> smp_mb__after_atomic();
> }
>
> +/*
> + * Returns 'true' if kick_process() is needed to force a transition from
> + * user -> kernel to guarantee expedient run of TWA_SIGNAL based task_work.
> + */
> +static inline bool __set_notify_signal(struct task_struct *task)
> +{
> + return !test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
> + !wake_up_state(task, TASK_INTERRUPTIBLE);
> +}
> +
> /*
> * Called to break out of interruptible wait loops, and enter the
> * exit_to_user_mode_loop().
> */
> static inline void set_notify_signal(struct task_struct *task)
> {
> - if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
> - !wake_up_state(task, TASK_INTERRUPTIBLE))
> + if (__set_notify_signal(task))
> kick_process(task);
> }
>
> diff --git a/include/linux/task_work.h b/include/linux/task_work.h
> index 897494b597ba..795ef5a68429 100644
> --- a/include/linux/task_work.h
> +++ b/include/linux/task_work.h
> @@ -17,6 +17,7 @@ enum task_work_notify_mode {
> TWA_NONE,
> TWA_RESUME,
> TWA_SIGNAL,
> + TWA_SIGNAL_NO_IPI,
> };
>
> static inline bool task_work_pending(struct task_struct *task)
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index c59e1a49bc40..fa8fdd04aa17 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -13,11 +13,13 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
> *
> * Queue @work for task_work_run() below and notify the @task if @notify
> * is @TWA_RESUME or @TWA_SIGNAL. @TWA_SIGNAL works like signals, in that the
> - * it will interrupt the targeted task and run the task_work. @TWA_RESUME
> - * work is run only when the task exits the kernel and returns to user mode,
> - * or before entering guest mode. Fails if the @task is exiting/exited and thus
> - * it can't process this @work. Otherwise @work->func() will be called when the
> - * @task goes through one of the aforementioned transitions, or exits.
> + * it will interrupt the targeted task and run the task_work. @TWA_SIGNAL_NO_IPI
> + * works like @TWA_SIGNAL, except it doesn't send a reschedule IPI to force the
> + * targeted task to reschedule and run task_work. @TWA_RESUME work is run only
> + * when the task exits the kernel and returns to user mode, or before entering
> + * guest mode. Fails if the @task is exiting/exited and thus it can't process
> + * this @work. Otherwise @work->func() will be called when the @task goes
> + * through one of the aforementioned transitions, or exits.
> *
> * If the targeted task is exiting, then an error is returned and the work item
> * is not queued. It's up to the caller to arrange for an alternative mechanism
> @@ -53,6 +55,9 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
> case TWA_SIGNAL:
> set_notify_signal(task);
> break;
> + case TWA_SIGNAL_NO_IPI:
> + __set_notify_signal(task);
> + break;
> default:
> WARN_ON_ONCE(1);
> break;
>


--
Jens Axboe

2022-04-28 18:18:56

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] task_work: allow TWA_SIGNAL without a rescheduling IPI

On 4/28/22 3:08 AM, Peter Zijlstra wrote:
> On Mon, Apr 25, 2022 at 07:52:31PM -0600, Jens Axboe wrote:
>> On 4/22/22 8:34 AM, Jens Axboe wrote:
>>> Some use cases don't always need an IPI when sending a TWA_SIGNAL
>>> notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
>>> except it doesn't send an IPI to the target task. It merely sets
>>> TIF_NOTIFY_SIGNAL and wakes up the task.
>
> Could you perphaps elaborate on those use-cases? How do they guarantee
> the task_work is ran before userspace?

The task is still marked as having task_work, so there should be no
differences in how it's run before returning to userspace. That would
not have delivered an IPI before, if it was in the kernel.

The difference would be in the task currently running in userspace, and
whether we force a reschedule to ensure the task_work gets run now.
Without the forced reschedule, running of the task_work (from io_uring)
becomes more cooperative - it'll happen when the task transitions to the
kernel anyway (eg to wait for events).

--
Jens Axboe

2022-04-28 23:48:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] task_work: allow TWA_SIGNAL without a rescheduling IPI

On Thu, Apr 28 2022 at 06:21, Jens Axboe wrote:
> On 4/28/22 3:08 AM, Peter Zijlstra wrote:
>> On Mon, Apr 25, 2022 at 07:52:31PM -0600, Jens Axboe wrote:
>>> On 4/22/22 8:34 AM, Jens Axboe wrote:
>>>> Some use cases don't always need an IPI when sending a TWA_SIGNAL
>>>> notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
>>>> except it doesn't send an IPI to the target task. It merely sets
>>>> TIF_NOTIFY_SIGNAL and wakes up the task.
>>
>> Could you perphaps elaborate on those use-cases? How do they guarantee
>> the task_work is ran before userspace?
>
> The task is still marked as having task_work, so there should be no
> differences in how it's run before returning to userspace. That would
> not have delivered an IPI before, if it was in the kernel.
>
> The difference would be in the task currently running in userspace, and
> whether we force a reschedule to ensure the task_work gets run now.
> Without the forced reschedule, running of the task_work (from io_uring)
> becomes more cooperative - it'll happen when the task transitions to the
> kernel anyway (eg to wait for events).

I can see why you want that, but that needs to be part of the change log
and it also needs a comprehensive comment for TWA_SIGNAL_NO_IPI.

@TWA_SIGNAL_NO_IPI works like @TWA_SIGNAL.... does not really explain
much.

Thanks,

tglx

2022-04-29 14:29:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] task_work: allow TWA_SIGNAL without a rescheduling IPI

On Mon, Apr 25, 2022 at 07:52:31PM -0600, Jens Axboe wrote:
> On 4/22/22 8:34 AM, Jens Axboe wrote:
> > Some use cases don't always need an IPI when sending a TWA_SIGNAL
> > notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
> > except it doesn't send an IPI to the target task. It merely sets
> > TIF_NOTIFY_SIGNAL and wakes up the task.

Could you perphaps elaborate on those use-cases? How do they guarantee
the task_work is ran before userspace?

2022-05-01 12:29:50

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] task_work: allow TWA_SIGNAL without a rescheduling IPI

On 4/28/22 4:28 PM, Thomas Gleixner wrote:
> On Thu, Apr 28 2022 at 06:21, Jens Axboe wrote:
>> On 4/28/22 3:08 AM, Peter Zijlstra wrote:
>>> On Mon, Apr 25, 2022 at 07:52:31PM -0600, Jens Axboe wrote:
>>>> On 4/22/22 8:34 AM, Jens Axboe wrote:
>>>>> Some use cases don't always need an IPI when sending a TWA_SIGNAL
>>>>> notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
>>>>> except it doesn't send an IPI to the target task. It merely sets
>>>>> TIF_NOTIFY_SIGNAL and wakes up the task.
>>>
>>> Could you perphaps elaborate on those use-cases? How do they guarantee
>>> the task_work is ran before userspace?
>>
>> The task is still marked as having task_work, so there should be no
>> differences in how it's run before returning to userspace. That would
>> not have delivered an IPI before, if it was in the kernel.
>>
>> The difference would be in the task currently running in userspace, and
>> whether we force a reschedule to ensure the task_work gets run now.
>> Without the forced reschedule, running of the task_work (from io_uring)
>> becomes more cooperative - it'll happen when the task transitions to the
>> kernel anyway (eg to wait for events).
>
> I can see why you want that, but that needs to be part of the change log
> and it also needs a comprehensive comment for TWA_SIGNAL_NO_IPI.
>
> @TWA_SIGNAL_NO_IPI works like @TWA_SIGNAL.... does not really explain
> much.

Agree, it should be better. I'll send out a new one with an improved
commit message and also with a better comment in the code for the NO_IPI
variant.

Thanks!

--
Jens Axboe