2020-10-05 15:07:41

by Jens Axboe

[permalink] [raw]
Subject: [PATCHSET RFC v3 0/6] Add support for TIF_NOTIFY_SIGNAL

Hi,

The goal is this patch series is to decouple TWA_SIGNAL based task_work
from real signals and signal delivery. The motivation is speeding up
TWA_SIGNAL based task_work, particularly for threaded setups where
->sighand is shared across threads.

Patch 1 is just a cleanup patchs, since I noticed that notify_resume
handling has some arch redundancy..

Patch 2 provides an abstraction around signal_pending(), in preparation
for allowing TIF_NOTIFY_SIGNAL to break scheduling loops.

Patch 3 just splits system call restart handling from the arch signal
delivery. Only the generic entry code is updated.

Patch 4 adds generic support for TIF_NOTIFY_SIGNAL, calling the
appropriate tracehook_notify_signal() if TIF_NOTIFY_SIGNAL is set.

Patch 5 just defines TIF_NOTIFY_SIGNAL for x86, since the generic code
is now ready for it.

Patch 6 finally allows task_work to use notify_signal to handle
TWA_SIGNAL based task_work.

Changes since v2:
- notify resume cleanup
- split patch series
- improve commit messages and comments
- kvm TIF_NOTIFY_SIGNAL handling

--
Jens Axboe



2020-10-05 15:08:52

by Jens Axboe

[permalink] [raw]
Subject: [PATCH 6/6] task_work: use TIF_NOTIFY_SIGNAL if available

If the arch supports TIF_NOTIFY_SIGNAL, then use that for TWA_SIGNAL as
it's more efficient than using the signal delivery method. This is
especially true on threaded applications, where ->sighand is shared
across threads, but it's also lighter weight on non-shared cases.

io_uring is a heavy consumer of TWA_SIGNAL based task_work. On my test
box, even just using 16 threads shows a nice improvement running an
io_uring based echo server.

stock kernel:
0.01% <= 0.1 milliseconds
95.86% <= 0.2 milliseconds
98.27% <= 0.3 milliseconds
99.71% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.0 milliseconds
100.00% <= 1.1 milliseconds
100.00% <= 2 milliseconds
100.00% <= 3 milliseconds
100.00% <= 3 milliseconds
1378930.00 requests per second
~1600% CPU

1.38M requests/second, and all 16 CPUs are maxed out.

patched kernel:
0.01% <= 0.1 milliseconds
98.24% <= 0.2 milliseconds
99.47% <= 0.3 milliseconds
99.99% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.2 milliseconds
1666111.38 requests per second
~1450% CPU

1.67M requests/second, and we're no longer just hammering on the sighand
lock. The original reporter states:

"For 5.7.15 my benchmark achieves 1.6M qps and system cpu is at ~80%.
for 5.7.16 or later it achieves only 1M qps and the system cpu is is
at ~100%"

with the only difference there being that TWA_SIGNAL is used
unconditionally in 5.7.16, since we need it to be able to solve an
inability to run task_work if the application is waiting in the kernel
already on an event that needs task_work run to be satisfied. Also
see commit 0ba9c9edcd15.

Reported-by: Roman Gershman <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
kernel/task_work.c | 41 +++++++++++++++++++++++++++++------------
1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 613b2d634af8..95604e57af46 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -5,6 +5,34 @@

static struct callback_head work_exited; /* all we need is ->next == NULL */

+/*
+ * TWA_SIGNAL signaling - use TIF_NOTIFY_SIGNAL, if available, as it's faster
+ * than TIF_SIGPENDING as there's no dependency on ->sighand. The latter is
+ * shared for threads, and can cause contention on sighand->lock. Even for
+ * the non-threaded case TIF_NOTIFY_SIGNAL is more efficient, as no locking
+ * or IRQ disabling is involved for notification (or running) purposes.
+ */
+static void task_work_notify_signal(struct task_struct *task)
+{
+#ifdef TIF_NOTIFY_SIGNAL
+ set_notify_signal(task);
+#else
+ unsigned long flags;
+
+ /*
+ * Only grab the sighand lock if we don't already have some
+ * task_work pending. This pairs with the smp_store_mb()
+ * in get_signal(), see comment there.
+ */
+ if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
+ lock_task_sighand(task, &flags)) {
+ task->jobctl |= JOBCTL_TASK_WORK;
+ signal_wake_up(task, 0);
+ unlock_task_sighand(task, &flags);
+ }
+#endif
+}
+
/**
* task_work_add - ask the @task to execute @work->func()
* @task: the task which should run the callback
@@ -28,7 +56,6 @@ int
task_work_add(struct task_struct *task, struct callback_head *work, int notify)
{
struct callback_head *head;
- unsigned long flags;

do {
head = READ_ONCE(task->task_works);
@@ -42,17 +69,7 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
set_notify_resume(task);
break;
case TWA_SIGNAL:
- /*
- * Only grab the sighand lock if we don't already have some
- * task_work pending. This pairs with the smp_store_mb()
- * in get_signal(), see comment there.
- */
- if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
- lock_task_sighand(task, &flags)) {
- task->jobctl |= JOBCTL_TASK_WORK;
- signal_wake_up(task, 0);
- unlock_task_sighand(task, &flags);
- }
+ task_work_notify_signal(task);
break;
}

--
2.28.0

2020-10-05 15:09:33

by Jens Axboe

[permalink] [raw]
Subject: [PATCH 2/6] kernel: add task_sigpending() helper

This is in preparation for maintaining signal_pending() as the decider
of whether or not a schedule() loop should be broken, or continue
sleeping. This is different than the core signal use cases, where we
really want to know if an actual signal is pending or not.
task_sigpending() returns non-zero if TIF_SIGPENDING is set.

Only core kernel use cases should care about the distinction between
the two, make sure those use the task_sigpending() helper.

Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/sched/signal.h | 13 +++++++++----
kernel/events/uprobes.c | 2 +-
kernel/ptrace.c | 2 +-
kernel/signal.c | 12 ++++++------
4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 1bad18a1d8ba..e6f34d8fbf4d 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -353,11 +353,16 @@ static inline int restart_syscall(void)
return -ERESTARTNOINTR;
}

-static inline int signal_pending(struct task_struct *p)
+static inline int task_sigpending(struct task_struct *p)
{
return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
}

+static inline int signal_pending(struct task_struct *p)
+{
+ return task_sigpending(p);
+}
+
static inline int __fatal_signal_pending(struct task_struct *p)
{
return unlikely(sigismember(&p->pending.signal, SIGKILL));
@@ -365,14 +370,14 @@ static inline int __fatal_signal_pending(struct task_struct *p)

static inline int fatal_signal_pending(struct task_struct *p)
{
- return signal_pending(p) && __fatal_signal_pending(p);
+ return task_sigpending(p) && __fatal_signal_pending(p);
}

static inline int signal_pending_state(long state, struct task_struct *p)
{
if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
return 0;
- if (!signal_pending(p))
+ if (!task_sigpending(p))
return 0;

return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
@@ -389,7 +394,7 @@ static inline bool fault_signal_pending(vm_fault_t fault_flags,
{
return unlikely((fault_flags & VM_FAULT_RETRY) &&
(fatal_signal_pending(current) ||
- (user_mode(regs) && signal_pending(current))));
+ (user_mode(regs) && task_sigpending(current))));
}

/*
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0e18aaf23a7b..8bb26a338e06 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1973,7 +1973,7 @@ bool uprobe_deny_signal(void)

WARN_ON_ONCE(utask->state != UTASK_SSTEP);

- if (signal_pending(t)) {
+ if (task_sigpending(t)) {
spin_lock_irq(&t->sighand->siglock);
clear_tsk_thread_flag(t, TIF_SIGPENDING);
spin_unlock_irq(&t->sighand->siglock);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179508d6..583b8da4c207 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -773,7 +773,7 @@ static int ptrace_peek_siginfo(struct task_struct *child,
data += sizeof(siginfo_t);
i++;

- if (signal_pending(current))
+ if (task_sigpending(current))
break;

cond_resched();
diff --git a/kernel/signal.c b/kernel/signal.c
index a38b3edc6851..d57aafd9116c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -983,7 +983,7 @@ static inline bool wants_signal(int sig, struct task_struct *p)
if (task_is_stopped_or_traced(p))
return false;

- return task_curr(p) || !signal_pending(p);
+ return task_curr(p) || !task_sigpending(p);
}

static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
@@ -2822,7 +2822,7 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
/* Remove the signals this thread can handle. */
sigandsets(&retarget, &retarget, &t->blocked);

- if (!signal_pending(t))
+ if (!task_sigpending(t))
signal_wake_up(t, 0);

if (sigisemptyset(&retarget))
@@ -2856,7 +2856,7 @@ void exit_signals(struct task_struct *tsk)

cgroup_threadgroup_change_end(tsk);

- if (!signal_pending(tsk))
+ if (!task_sigpending(tsk))
goto out;

unblocked = tsk->blocked;
@@ -2900,7 +2900,7 @@ long do_no_restart_syscall(struct restart_block *param)

static void __set_task_blocked(struct task_struct *tsk, const sigset_t *newset)
{
- if (signal_pending(tsk) && !thread_group_empty(tsk)) {
+ if (task_sigpending(tsk) && !thread_group_empty(tsk)) {
sigset_t newblocked;
/* A set of now blocked but previously unblocked signals. */
sigandnsets(&newblocked, newset, &current->blocked);
@@ -4447,7 +4447,7 @@ SYSCALL_DEFINE0(pause)
__set_current_state(TASK_INTERRUPTIBLE);
schedule();
}
- return -ERESTARTNOHAND;
+ return task_sigpending(current) ? -ERESTARTNOHAND : -ERESTARTSYS;
}

#endif
@@ -4462,7 +4462,7 @@ static int sigsuspend(sigset_t *set)
schedule();
}
set_restore_sigmask();
- return -ERESTARTNOHAND;
+ return task_sigpending(current) ? -ERESTARTNOHAND : -ERESTARTSYS;
}

/**
--
2.28.0

2020-10-08 13:01:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/6] kernel: add task_sigpending() helper

On 10/05, Jens Axboe wrote:
>
> static inline int signal_pending_state(long state, struct task_struct *p)
> {
> if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
> return 0;
> - if (!signal_pending(p))
> + if (!task_sigpending(p))
> return 0;

This looks obviously wrong. Say, schedule() in TASK_INTERRUPTIBLE should
not block if TIF_NOTIFY_SIGNAL is set.

With this change set_notify_signal() will not force the task to return
from wait_event_interruptible, mutex_lock_interruptible, etc.

> return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
> @@ -389,7 +394,7 @@ static inline bool fault_signal_pending(vm_fault_t fault_flags,
> {
> return unlikely((fault_flags & VM_FAULT_RETRY) &&
> (fatal_signal_pending(current) ||
> - (user_mode(regs) && signal_pending(current))));
> + (user_mode(regs) && task_sigpending(current))));

This looks unnecessary,

> @@ -773,7 +773,7 @@ static int ptrace_peek_siginfo(struct task_struct *child,
> data += sizeof(siginfo_t);
> i++;
>
> - if (signal_pending(current))
> + if (task_sigpending(current))

This too.

IMO, this patch should do s/signal_pending/task_sigpending/ only if it is
strictly needed for correctness.

Oleg.

2020-10-08 13:30:27

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/6] kernel: add task_sigpending() helper

On 10/05, Jens Axboe wrote:
>
> @@ -4447,7 +4447,7 @@ SYSCALL_DEFINE0(pause)
> __set_current_state(TASK_INTERRUPTIBLE);
> schedule();
> }
> - return -ERESTARTNOHAND;
> + return task_sigpending(current) ? -ERESTARTNOHAND : -ERESTARTSYS;
> }
>
> #endif
> @@ -4462,7 +4462,7 @@ static int sigsuspend(sigset_t *set)
> schedule();
> }
> set_restore_sigmask();
> - return -ERESTARTNOHAND;
> + return task_sigpending(current) ? -ERESTARTNOHAND : -ERESTARTSYS;
> }

Both changes are equally wrong. Why do you think sigsuspend() should ever
return -ERESTARTSYS ?

If get_signal() deques a signal, handle_signal() will restart this syscall
if ERESTARTSYS, this is wrong.

Oleg.

2020-10-08 13:39:26

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 2/6] kernel: add task_sigpending() helper

On 10/8/20 6:58 AM, Oleg Nesterov wrote:
> On 10/05, Jens Axboe wrote:
>>
>> static inline int signal_pending_state(long state, struct task_struct *p)
>> {
>> if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
>> return 0;
>> - if (!signal_pending(p))
>> + if (!task_sigpending(p))
>> return 0;
>
> This looks obviously wrong. Say, schedule() in TASK_INTERRUPTIBLE should
> not block if TIF_NOTIFY_SIGNAL is set.
>
> With this change set_notify_signal() will not force the task to return
> from wait_event_interruptible, mutex_lock_interruptible, etc.

True, not sure why I made the distinction there. I'll fix that one up.

>
>> return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
>> @@ -389,7 +394,7 @@ static inline bool fault_signal_pending(vm_fault_t fault_flags,
>> {
>> return unlikely((fault_flags & VM_FAULT_RETRY) &&
>> (fatal_signal_pending(current) ||
>> - (user_mode(regs) && signal_pending(current))));
>> + (user_mode(regs) && task_sigpending(current))));
>
> This looks unnecessary,

Dropped

>> @@ -773,7 +773,7 @@ static int ptrace_peek_siginfo(struct task_struct *child,
>> data += sizeof(siginfo_t);
>> i++;
>>
>> - if (signal_pending(current))
>> + if (task_sigpending(current))
>
> This too.
>
> IMO, this patch should do s/signal_pending/task_sigpending/ only if it is
> strictly needed for correctness.

Agree, I'll kill the ones you identified.

--
Jens Axboe

2020-10-08 13:42:19

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 2/6] kernel: add task_sigpending() helper

On 10/8/20 7:24 AM, Oleg Nesterov wrote:
> On 10/05, Jens Axboe wrote:
>>
>> @@ -4447,7 +4447,7 @@ SYSCALL_DEFINE0(pause)
>> __set_current_state(TASK_INTERRUPTIBLE);
>> schedule();
>> }
>> - return -ERESTARTNOHAND;
>> + return task_sigpending(current) ? -ERESTARTNOHAND : -ERESTARTSYS;
>> }
>>
>> #endif
>> @@ -4462,7 +4462,7 @@ static int sigsuspend(sigset_t *set)
>> schedule();
>> }
>> set_restore_sigmask();
>> - return -ERESTARTNOHAND;
>> + return task_sigpending(current) ? -ERESTARTNOHAND : -ERESTARTSYS;
>> }
>
> Both changes are equally wrong. Why do you think sigsuspend() should ever
> return -ERESTARTSYS ?
>
> If get_signal() deques a signal, handle_signal() will restart this syscall
> if ERESTARTSYS, this is wrong.

The intent was that if we get woken up and signal_pending() is true, then
we want to restart it if we're just doing TIF_SIGNAL_NOTIFY. But I guess
it can't be 100% reliable, even if TIF_SIGPENDING isn't set at this point,
but it is by the time a signal is attempted dequeued.

I'll drop these too, thanks Oleg.

--
Jens Axboe

2020-10-08 15:04:02

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCHSET RFC v3 0/6] Add support for TIF_NOTIFY_SIGNAL

On 10/8/20 8:56 AM, Oleg Nesterov wrote:
> On 10/05, Jens Axboe wrote:
>>
>> Hi,
>>
>> The goal is this patch series is to decouple TWA_SIGNAL based task_work
>> from real signals and signal delivery.
>
> I think TIF_NOTIFY_SIGNAL can have more users. Say, we can move
> try_to_freeze() from get_signal() to tracehook_notify_signal(), kill
> fake_signal_wake_up(), and remove freezing() from recalc_sigpending().
>
> Probably the same for TIF_PATCH_PENDING, klp_send_signals() can use
> set_notify_signal() rather than signal_wake_up().

Totally agree, which is why I liked your suggestion of turning it into a
tracehook.

I've rebased and collapsed the series with the changes, initial tests
look good here. I'll run it through some more testing and send out a v4.
I really like that it's down to 3 core patches now, instead of 5, and
the last one is just wiring up task_work. The changes you suggested also
means it's a lot easier to wire up new archs, so we could potentially
have full support for TIF_NOTIFY_SIGNAL very quickly and can drop the
JOBCTL etc parts.

I'll work on that next, if we have agreement that v4 is sound. Thanks a
lot for your reviews, Oleg! It might've started out a bit nasty on the
RFC front, but with the current direction, we'll end up deleting a lot
of extra code on top.

--
Jens Axboe

2020-10-08 16:46:09

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCHSET RFC v3 0/6] Add support for TIF_NOTIFY_SIGNAL

On 10/05, Jens Axboe wrote:
>
> Hi,
>
> The goal is this patch series is to decouple TWA_SIGNAL based task_work
> from real signals and signal delivery.

I think TIF_NOTIFY_SIGNAL can have more users. Say, we can move
try_to_freeze() from get_signal() to tracehook_notify_signal(), kill
fake_signal_wake_up(), and remove freezing() from recalc_sigpending().

Probably the same for TIF_PATCH_PENDING, klp_send_signals() can use
set_notify_signal() rather than signal_wake_up().

Oleg.

2020-10-09 08:04:21

by Miroslav Benes

[permalink] [raw]
Subject: Re: [PATCHSET RFC v3 0/6] Add support for TIF_NOTIFY_SIGNAL

On Thu, 8 Oct 2020, Oleg Nesterov wrote:

> On 10/05, Jens Axboe wrote:
> >
> > Hi,
> >
> > The goal is this patch series is to decouple TWA_SIGNAL based task_work
> > from real signals and signal delivery.
>
> I think TIF_NOTIFY_SIGNAL can have more users. Say, we can move
> try_to_freeze() from get_signal() to tracehook_notify_signal(), kill
> fake_signal_wake_up(), and remove freezing() from recalc_sigpending().
>
> Probably the same for TIF_PATCH_PENDING, klp_send_signals() can use
> set_notify_signal() rather than signal_wake_up().

Yes, that was my impression from the patch set too, when I accidentally
noticed it.

Jens, could you CC our live patching ML when you submit v4, please? It
would be a nice cleanup.

Thanks
Miroslav

2020-10-09 15:24:43

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCHSET RFC v3 0/6] Add support for TIF_NOTIFY_SIGNAL

On 10/9/20 2:01 AM, Miroslav Benes wrote:
> On Thu, 8 Oct 2020, Oleg Nesterov wrote:
>
>> On 10/05, Jens Axboe wrote:
>>>
>>> Hi,
>>>
>>> The goal is this patch series is to decouple TWA_SIGNAL based task_work
>>> from real signals and signal delivery.
>>
>> I think TIF_NOTIFY_SIGNAL can have more users. Say, we can move
>> try_to_freeze() from get_signal() to tracehook_notify_signal(), kill
>> fake_signal_wake_up(), and remove freezing() from recalc_sigpending().
>>
>> Probably the same for TIF_PATCH_PENDING, klp_send_signals() can use
>> set_notify_signal() rather than signal_wake_up().
>
> Yes, that was my impression from the patch set too, when I accidentally
> noticed it.
>
> Jens, could you CC our live patching ML when you submit v4, please? It
> would be a nice cleanup.

Definitely, though it'd be v5 at this point. But we really need to get
all archs supporting TIF_NOTIFY_SIGNAL first. Once we have that, there's
a whole slew of cleanups that'll fall out naturally:

- Removal of JOBCTL_TASK_WORK
- Removal of special path for TWA_SIGNAL in task_work
- TIF_PATCH_PENDING can be converted and then removed
- try_to_freeze() cleanup that Oleg mentioned

And probably more I'm not thinking of right now :-)

--
Jens Axboe

2020-10-10 23:12:41

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCHSET RFC v3 0/6] Add support for TIF_NOTIFY_SIGNAL

On 10/9/20 9:21 AM, Jens Axboe wrote:
> On 10/9/20 2:01 AM, Miroslav Benes wrote:
>> On Thu, 8 Oct 2020, Oleg Nesterov wrote:
>>
>>> On 10/05, Jens Axboe wrote:
>>>>
>>>> Hi,
>>>>
>>>> The goal is this patch series is to decouple TWA_SIGNAL based task_work
>>>> from real signals and signal delivery.
>>>
>>> I think TIF_NOTIFY_SIGNAL can have more users. Say, we can move
>>> try_to_freeze() from get_signal() to tracehook_notify_signal(), kill
>>> fake_signal_wake_up(), and remove freezing() from recalc_sigpending().
>>>
>>> Probably the same for TIF_PATCH_PENDING, klp_send_signals() can use
>>> set_notify_signal() rather than signal_wake_up().
>>
>> Yes, that was my impression from the patch set too, when I accidentally
>> noticed it.
>>
>> Jens, could you CC our live patching ML when you submit v4, please? It
>> would be a nice cleanup.
>
> Definitely, though it'd be v5 at this point. But we really need to get
> all archs supporting TIF_NOTIFY_SIGNAL first. Once we have that, there's
> a whole slew of cleanups that'll fall out naturally:
>
> - Removal of JOBCTL_TASK_WORK
> - Removal of special path for TWA_SIGNAL in task_work
> - TIF_PATCH_PENDING can be converted and then removed
> - try_to_freeze() cleanup that Oleg mentioned
>
> And probably more I'm not thinking of right now :-)

Here's the current series, I took a stab at converting all archs to
support TIF_NOTIFY_SIGNAL so we have a base to build on top of. Most
of them were straight forward, but I need someone to fixup powerpc,
verify arm and s390.

But it's a decent start I think, and means that we can drop various
bits as is done at the end of the series. I could swap things around
a bit and avoid having the intermediate step, but I envision that
getting this in all archs will take a bit longer than just signing off
on the generic/x86 bits. So probably best to keep the series as it is
for now, and work on getting the arch bits verified/fixed/tested.

https://git.kernel.dk/cgit/linux-block/log/?h=tif-task_work

--
Jens Axboe

2020-10-12 17:30:27

by Miroslav Benes

[permalink] [raw]
Subject: Re: [PATCHSET RFC v3 0/6] Add support for TIF_NOTIFY_SIGNAL

On Sat, 10 Oct 2020, Jens Axboe wrote:

> On 10/9/20 9:21 AM, Jens Axboe wrote:
> > On 10/9/20 2:01 AM, Miroslav Benes wrote:
> >> On Thu, 8 Oct 2020, Oleg Nesterov wrote:
> >>
> >>> On 10/05, Jens Axboe wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> The goal is this patch series is to decouple TWA_SIGNAL based task_work
> >>>> from real signals and signal delivery.
> >>>
> >>> I think TIF_NOTIFY_SIGNAL can have more users. Say, we can move
> >>> try_to_freeze() from get_signal() to tracehook_notify_signal(), kill
> >>> fake_signal_wake_up(), and remove freezing() from recalc_sigpending().
> >>>
> >>> Probably the same for TIF_PATCH_PENDING, klp_send_signals() can use
> >>> set_notify_signal() rather than signal_wake_up().
> >>
> >> Yes, that was my impression from the patch set too, when I accidentally
> >> noticed it.
> >>
> >> Jens, could you CC our live patching ML when you submit v4, please? It
> >> would be a nice cleanup.
> >
> > Definitely, though it'd be v5 at this point. But we really need to get
> > all archs supporting TIF_NOTIFY_SIGNAL first. Once we have that, there's
> > a whole slew of cleanups that'll fall out naturally:
> >
> > - Removal of JOBCTL_TASK_WORK
> > - Removal of special path for TWA_SIGNAL in task_work
> > - TIF_PATCH_PENDING can be converted and then removed
> > - try_to_freeze() cleanup that Oleg mentioned
> >
> > And probably more I'm not thinking of right now :-)
>
> Here's the current series, I took a stab at converting all archs to
> support TIF_NOTIFY_SIGNAL so we have a base to build on top of. Most
> of them were straight forward, but I need someone to fixup powerpc,
> verify arm and s390.
>
> But it's a decent start I think, and means that we can drop various
> bits as is done at the end of the series. I could swap things around
> a bit and avoid having the intermediate step, but I envision that
> getting this in all archs will take a bit longer than just signing off
> on the generic/x86 bits. So probably best to keep the series as it is
> for now, and work on getting the arch bits verified/fixed/tested.
>
> https://git.kernel.dk/cgit/linux-block/log/?h=tif-task_work

Thanks, Jens.

Crude diff for live patching on top of the series is below. Tested only on
x86_64, but it passes the tests without an issue.

Miroslav

---
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index f6310f848f34..3a4beb9395c4 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@

#include <linux/cpu.h>
#include <linux/stacktrace.h>
+#include <linux/tracehook.h>
#include "core.h"
#include "patch.h"
#include "transition.h"
@@ -369,9 +370,7 @@ static void klp_send_signals(void)
* Send fake signal to all non-kthread tasks which are
* still not migrated.
*/
- spin_lock_irq(&task->sighand->siglock);
- signal_wake_up(task, 0);
- spin_unlock_irq(&task->sighand->siglock);
+ set_notify_signal(task);
}
}
read_unlock(&tasklist_lock);
diff --git a/kernel/signal.c b/kernel/signal.c
index a15c584a0455..b7cf4eda8611 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -181,8 +181,7 @@ void recalc_sigpending_and_wake(struct task_struct *t)

void recalc_sigpending(void)
{
- if (!recalc_sigpending_tsk(current) && !freezing(current) &&
- !klp_patch_pending(current))
+ if (!recalc_sigpending_tsk(current) && !freezing(current))
clear_thread_flag(TIF_SIGPENDING);

}

2020-10-14 07:42:40

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCHSET RFC v3 0/6] Add support for TIF_NOTIFY_SIGNAL

Jens,

On Tue, Oct 13 2020 at 13:39, Jens Axboe wrote:
> On 10/12/20 11:27 AM, Miroslav Benes wrote:
> I'm continuing to hone the series, what's really missing so far is arch
> review. Most conversions are straight forward, some I need folks to
> definitely take a look at (arm, s390). powerpc is also a bit hair right
> now, but I'm told that 5.10 will kill a TIF flag there, so that'll make
> it trivial once I rebase on that.

can you pretty please not add that to anything which is not going
through kernel/entry/ ?

The amount of duplicated and differently buggy, inconsistent and
incomplete code in syscall and exception handling is just annoying.

It's perfectly fine if we keep that #ifdeffery around for a while and
encourage arch folks to move over to the generic infrastructure instead
of proliferating the status quo by adding this to their existing pile.

The #ifdef guarding this in set_notify_signal() and other core code
places wants to be:

#if defined(CONFIG_GENERIC_ENTRY) && defined(TIF_NOTIFY_SIGNAL)

Thanks,

tglx

2020-10-14 09:23:31

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCHSET RFC v3 0/6] Add support for TIF_NOTIFY_SIGNAL

On 10/13/20 5:34 PM, Thomas Gleixner wrote:
> Jens,
>
> On Tue, Oct 13 2020 at 13:39, Jens Axboe wrote:
>> On 10/12/20 11:27 AM, Miroslav Benes wrote:
>> I'm continuing to hone the series, what's really missing so far is arch
>> review. Most conversions are straight forward, some I need folks to
>> definitely take a look at (arm, s390). powerpc is also a bit hair right
>> now, but I'm told that 5.10 will kill a TIF flag there, so that'll make
>> it trivial once I rebase on that.
>
> can you pretty please not add that to anything which is not going
> through kernel/entry/ ?

Certainly, tif-task_work is just a holding ground for all of it so
far. Once the core bits are agreed upon and merged, then I'll bounce
the rest through the arch maintainers. So please don't view it as
on cohesive series, it only is up until (and including):

commit 8dc14b576a9bf13dba4993e4ddb4068d39dee1e9
Author: Jens Axboe <[email protected]>
Date: Thu Oct 1 13:29:21 2020 -0600

task_work: use TIF_NOTIFY_SIGNAL if available


> The amount of duplicated and differently buggy, inconsistent and
> incomplete code in syscall and exception handling is just annoying.
>
> It's perfectly fine if we keep that #ifdeffery around for a while and
> encourage arch folks to move over to the generic infrastructure instead
> of proliferating the status quo by adding this to their existing pile.

Agree

> The #ifdef guarding this in set_notify_signal() and other core code
> places wants to be:
>
> #if defined(CONFIG_GENERIC_ENTRY) && defined(TIF_NOTIFY_SIGNAL)

OK no problem, I'll fix that up.

--
Jens Axboe

2020-10-14 10:08:59

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCHSET RFC v3 0/6] Add support for TIF_NOTIFY_SIGNAL

On 10/12/20 11:27 AM, Miroslav Benes wrote:
> On Sat, 10 Oct 2020, Jens Axboe wrote:
>
>> On 10/9/20 9:21 AM, Jens Axboe wrote:
>>> On 10/9/20 2:01 AM, Miroslav Benes wrote:
>>>> On Thu, 8 Oct 2020, Oleg Nesterov wrote:
>>>>
>>>>> On 10/05, Jens Axboe wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The goal is this patch series is to decouple TWA_SIGNAL based task_work
>>>>>> from real signals and signal delivery.
>>>>>
>>>>> I think TIF_NOTIFY_SIGNAL can have more users. Say, we can move
>>>>> try_to_freeze() from get_signal() to tracehook_notify_signal(), kill
>>>>> fake_signal_wake_up(), and remove freezing() from recalc_sigpending().
>>>>>
>>>>> Probably the same for TIF_PATCH_PENDING, klp_send_signals() can use
>>>>> set_notify_signal() rather than signal_wake_up().
>>>>
>>>> Yes, that was my impression from the patch set too, when I accidentally
>>>> noticed it.
>>>>
>>>> Jens, could you CC our live patching ML when you submit v4, please? It
>>>> would be a nice cleanup.
>>>
>>> Definitely, though it'd be v5 at this point. But we really need to get
>>> all archs supporting TIF_NOTIFY_SIGNAL first. Once we have that, there's
>>> a whole slew of cleanups that'll fall out naturally:
>>>
>>> - Removal of JOBCTL_TASK_WORK
>>> - Removal of special path for TWA_SIGNAL in task_work
>>> - TIF_PATCH_PENDING can be converted and then removed
>>> - try_to_freeze() cleanup that Oleg mentioned
>>>
>>> And probably more I'm not thinking of right now :-)
>>
>> Here's the current series, I took a stab at converting all archs to
>> support TIF_NOTIFY_SIGNAL so we have a base to build on top of. Most
>> of them were straight forward, but I need someone to fixup powerpc,
>> verify arm and s390.
>>
>> But it's a decent start I think, and means that we can drop various
>> bits as is done at the end of the series. I could swap things around
>> a bit and avoid having the intermediate step, but I envision that
>> getting this in all archs will take a bit longer than just signing off
>> on the generic/x86 bits. So probably best to keep the series as it is
>> for now, and work on getting the arch bits verified/fixed/tested.
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=tif-task_work
>
> Thanks, Jens.
>
> Crude diff for live patching on top of the series is below. Tested only on
> x86_64, but it passes the tests without an issue.

Nice, thanks!

I'm continuing to hone the series, what's really missing so far is arch
review. Most conversions are straight forward, some I need folks to
definitely take a look at (arm, s390). powerpc is also a bit hair right
now, but I'm told that 5.10 will kill a TIF flag there, so that'll make
it trivial once I rebase on that.

Did a few more cleanups on top, series is in the same spot. I'll repost
once the merge window settles down.

--
Jens Axboe