2020-05-05 14:26:33

by Thomas Gleixner

[permalink] [raw]
Subject: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

Make sure task_work runs before any kind of userspace -- very much
including signals -- is invoked.

Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/entry/common.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -156,16 +156,16 @@ static void exit_to_usermode_loop(struct
if (cached_flags & _TIF_PATCH_PENDING)
klp_update_patch_state(current);

- /* deal with pending signal delivery */
- if (cached_flags & _TIF_SIGPENDING)
- do_signal(regs);
-
if (cached_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
rseq_handle_notify_resume(NULL, regs);
}

+ /* deal with pending signal delivery */
+ if (cached_flags & _TIF_SIGPENDING)
+ do_signal(regs);
+
if (cached_flags & _TIF_USER_RETURN_NOTIFY)
fire_user_return_notifiers();



2020-05-06 11:58:22

by Miroslav Benes

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

On Tue, 5 May 2020, Thomas Gleixner wrote:

> Make sure task_work runs before any kind of userspace -- very much
> including signals -- is invoked.

I might be missing something, but isn't this guaranteed by
do_signal()->get_signal()->task_work_run() path?

Miroslav

2020-05-06 12:11:24

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

Miroslav Benes <[email protected]> writes:

> On Tue, 5 May 2020, Thomas Gleixner wrote:
>
>> Make sure task_work runs before any kind of userspace -- very much
>> including signals -- is invoked.
>
> I might be missing something, but isn't this guaranteed by
> do_signal()->get_signal()->task_work_run() path?

The changelog is misleading. This is not about task_work it's about
TIF_NOTIFY_RESUME. Will fix.

Thanks,

tglx

2020-05-06 13:11:46

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling


On 5/5/20 3:16 PM, Thomas Gleixner wrote:
> Make sure task_work runs before any kind of userspace -- very much
> including signals -- is invoked.
>
> Suggested-by: Andy Lutomirski <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
> arch/x86/entry/common.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Alexandre Chartre <[email protected]>

alex.

> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -156,16 +156,16 @@ static void exit_to_usermode_loop(struct
> if (cached_flags & _TIF_PATCH_PENDING)
> klp_update_patch_state(current);
>
> - /* deal with pending signal delivery */
> - if (cached_flags & _TIF_SIGPENDING)
> - do_signal(regs);
> -
> if (cached_flags & _TIF_NOTIFY_RESUME) {
> clear_thread_flag(TIF_NOTIFY_RESUME);
> tracehook_notify_resume(regs);
> rseq_handle_notify_resume(NULL, regs);
> }
>
> + /* deal with pending signal delivery */
> + if (cached_flags & _TIF_SIGPENDING)
> + do_signal(regs);
> +
> if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> fire_user_return_notifiers();
>
>

2020-05-06 15:38:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

On Wed, May 06, 2020 at 01:53:36PM +0200, Miroslav Benes wrote:
> On Tue, 5 May 2020, Thomas Gleixner wrote:
>
> > Make sure task_work runs before any kind of userspace -- very much
> > including signals -- is invoked.
>
> I might be missing something, but isn't this guaranteed by
> do_signal()->get_signal()->task_work_run() path?

Yes, that too, that 'hack' can now be removed I suppose.

2020-05-06 16:31:13

by Borislav Petkov

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

On Tue, May 05, 2020 at 03:16:07PM +0200, Thomas Gleixner wrote:
> Make sure task_work runs before any kind of userspace -- very much
> including signals -- is invoked.
>
> Suggested-by: Andy Lutomirski <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>

Does this need

From: Peter

too?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-05-07 17:40:34

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

On Tue, May 5, 2020 at 7:13 AM Thomas Gleixner <[email protected]> wrote:
>
> Make sure task_work runs before any kind of userspace -- very much
> including signals -- is invoked.

I certainly approve of the change, but anything that fundamentally
relies on this makes me uneasy. I'll keep an eye out for potential
problems.

2020-05-13 20:58:24

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

----- On May 5, 2020, at 9:16 AM, Thomas Gleixner [email protected] wrote:

> Make sure task_work runs before any kind of userspace -- very much
> including signals -- is invoked.

What is missing from this patch description is: _why_ is this deemed
useful ?

Also, color me confused: is "do_signal()" actually running any user-space,
or just setting up the user-space stack for eventual return to signal handler ?

Also, it might be OK, but we're changing the order of two things which
have effects on each other: restartable sequences abort fixup for preemption
and do_signal(), which also have effects on rseq abort.

Because those two will cause the abort to trigger, I suspect changing
the order might be OK, but we really need to think this through.

Thanks,

Mathieu

>
> Suggested-by: Andy Lutomirski <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
> arch/x86/entry/common.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -156,16 +156,16 @@ static void exit_to_usermode_loop(struct
> if (cached_flags & _TIF_PATCH_PENDING)
> klp_update_patch_state(current);
>
> - /* deal with pending signal delivery */
> - if (cached_flags & _TIF_SIGPENDING)
> - do_signal(regs);
> -
> if (cached_flags & _TIF_NOTIFY_RESUME) {
> clear_thread_flag(TIF_NOTIFY_RESUME);
> tracehook_notify_resume(regs);
> rseq_handle_notify_resume(NULL, regs);
> }
>
> + /* deal with pending signal delivery */
> + if (cached_flags & _TIF_SIGPENDING)
> + do_signal(regs);
> +
> if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> fire_user_return_notifiers();

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2020-05-13 21:12:39

by Steven Rostedt

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

On Wed, 13 May 2020 16:56:41 -0400 (EDT)
Mathieu Desnoyers <[email protected]> wrote:

> ----- On May 5, 2020, at 9:16 AM, Thomas Gleixner [email protected] wrote:
>
> > Make sure task_work runs before any kind of userspace -- very much
> > including signals -- is invoked.
>
> What is missing from this patch description is: _why_ is this deemed
> useful ?
>
> Also, color me confused: is "do_signal()" actually running any user-space,
> or just setting up the user-space stack for eventual return to signal handler ?
>
> Also, it might be OK, but we're changing the order of two things which
> have effects on each other: restartable sequences abort fixup for preemption
> and do_signal(), which also have effects on rseq abort.
>
> Because those two will cause the abort to trigger, I suspect changing
> the order might be OK, but we really need to think this through.
>
> Thanks,
>
> Mathieu
>
> >
> > Suggested-by: Andy Lutomirski <[email protected]>
> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > Signed-off-by: Thomas Gleixner <[email protected]>
> > ---
> > arch/x86/entry/common.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -156,16 +156,16 @@ static void exit_to_usermode_loop(struct
> > if (cached_flags & _TIF_PATCH_PENDING)
> > klp_update_patch_state(current);
> >
> > - /* deal with pending signal delivery */
> > - if (cached_flags & _TIF_SIGPENDING)
> > - do_signal(regs);
> > -
> > if (cached_flags & _TIF_NOTIFY_RESUME) {
> > clear_thread_flag(TIF_NOTIFY_RESUME);
> > tracehook_notify_resume(regs);
> > rseq_handle_notify_resume(NULL, regs);
> > }
> >
> > + /* deal with pending signal delivery */
> > + if (cached_flags & _TIF_SIGPENDING)
> > + do_signal(regs);

Looking deeper into this, it appears that do_signal() can freeze or kill the
task.

That is, it wont go back to user space here, but simply schedule out (being
traced) or even exit (killed).

Before the resume hooks would never be called in such cases, and now they
are.

-- Steve


> > +
> > if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> > fire_user_return_notifiers();
>

2020-05-13 22:49:56

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

----- On May 13, 2020, at 5:10 PM, rostedt [email protected] wrote:

> On Wed, 13 May 2020 16:56:41 -0400 (EDT)
> Mathieu Desnoyers <[email protected]> wrote:
>
>> ----- On May 5, 2020, at 9:16 AM, Thomas Gleixner [email protected] wrote:
>>
>> > Make sure task_work runs before any kind of userspace -- very much
>> > including signals -- is invoked.
>>
>> What is missing from this patch description is: _why_ is this deemed
>> useful ?
>>
>> Also, color me confused: is "do_signal()" actually running any user-space,
>> or just setting up the user-space stack for eventual return to signal handler ?
>>
>> Also, it might be OK, but we're changing the order of two things which
>> have effects on each other: restartable sequences abort fixup for preemption
>> and do_signal(), which also have effects on rseq abort.
>>
>> Because those two will cause the abort to trigger, I suspect changing
>> the order might be OK, but we really need to think this through.
>>
>> Thanks,
>>
>> Mathieu
>>
>> >
>> > Suggested-by: Andy Lutomirski <[email protected]>
>> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>> > Signed-off-by: Thomas Gleixner <[email protected]>
>> > ---
>> > arch/x86/entry/common.c | 8 ++++----
>> > 1 file changed, 4 insertions(+), 4 deletions(-)
>> >
>> > --- a/arch/x86/entry/common.c
>> > +++ b/arch/x86/entry/common.c
>> > @@ -156,16 +156,16 @@ static void exit_to_usermode_loop(struct
>> > if (cached_flags & _TIF_PATCH_PENDING)
>> > klp_update_patch_state(current);
>> >
>> > - /* deal with pending signal delivery */
>> > - if (cached_flags & _TIF_SIGPENDING)
>> > - do_signal(regs);
>> > -
>> > if (cached_flags & _TIF_NOTIFY_RESUME) {
>> > clear_thread_flag(TIF_NOTIFY_RESUME);
>> > tracehook_notify_resume(regs);
>> > rseq_handle_notify_resume(NULL, regs);
>> > }
>> >
>> > + /* deal with pending signal delivery */
>> > + if (cached_flags & _TIF_SIGPENDING)
>> > + do_signal(regs);
>
> Looking deeper into this, it appears that do_signal() can freeze or kill the
> task.
>
> That is, it wont go back to user space here, but simply schedule out (being
> traced) or even exit (killed).
>
> Before the resume hooks would never be called in such cases, and now they
> are.

Regarding swapping order of tracehook vs do_signal: Is the task really resumed
if it gets frozen or killed ? What is this change trying to accomplish, why
is it needed in the first place ?

Regarding swapping order of rseq wrt do_signal: why is it needed, why, and has
this been tested ?

Thanks,

Mathieu

>
> -- Steve
>
>
>> > +
>> > if (cached_flags & _TIF_USER_RETURN_NOTIFY)
>> > fire_user_return_notifiers();

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2020-05-14 00:14:45

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

Steven, Mathieu

(combo reply)

Steven Rostedt <[email protected]> writes:
> On Wed, 13 May 2020 16:56:41 -0400 (EDT)
>> > + /* deal with pending signal delivery */
>> > + if (cached_flags & _TIF_SIGPENDING)
>> > + do_signal(regs);
>
> Looking deeper into this, it appears that do_signal() can freeze or kill the
> task.
>
> That is, it wont go back to user space here, but simply schedule out (being
> traced) or even exit (killed).
>
> Before the resume hooks would never be called in such cases, and now they
> are.

It theoretically matters because pending task work might kill the
task. That's the concern Andy and Peter had. Assume the following:

usermode

-> exception
set not fatal signal

-> exception
queue task work to kill task
<- return

<- return

The same could happen when the non fatal signal is set from a remote CPU.

So in theory that would result in:

handle non fatal signal first

handle task work which kills task

which would be the wrong order.

But that's just illusion.

>> Mathieu Desnoyers <[email protected]> wrote:

>> Also, color me confused: is "do_signal()" actually running any user-space,
>> or just setting up the user-space stack for eventual return to signal
>> handler ?

I'm surprised that you can't answer that question yourself. How did you
ever make rseq work and how did rseq_signal_deliver() end up in
setup_rt_frame()?

Hint: Tracing might answer that question

And to cut it short:

Exit to user space happnes only through ONE channel, i.e. leaving
prepare_exit_to usermode().

exit_to_usermode_loop <-prepare_exit_to_usermode
do_signal <-exit_to_usermode_loop
get_signal <-do_signal
setup_sigcontext <-do_signal
do_syscall_64 <-entry_SYSCALL_64_after_hwframe
syscall_trace_enter <-do_syscall_64

sys_rt_sigreturn()
restore_altstack <-__ia32_sys_rt_sigreturn
syscall_slow_exit_work <-do_syscall_64
exit_to_usermode_loop <-do_syscall_64

>> Also, it might be OK, but we're changing the order of two things which
>> have effects on each other: restartable sequences abort fixup for preemption
>> and do_signal(), which also have effects on rseq abort.
>>
>> Because those two will cause the abort to trigger, I suspect changing
>> the order might be OK, but we really need to think this through.

That's a purely academic problem. The order is completely
irrelevant. You have to handle any order anyway:

usermode

-> exception / syscall
sets signal

<- return

prepare_exit_to_usemode()
cached_flags = READ_ONCE(t->flags);
exit_to_user_mode_loop(regs, cached_flags) {
while (cached_flags) {
local_irq_enable();

handle(cached_flags & RESCHED);
handle(cached_flags & UPROBE);
handle(cached_flags & PATCHING);
handle(cached_flags & SIGNAL);
handle(cached_flags & NOTIFY_RESUME);
handle(cached_flags & RETURN_NOTIFY);

local_irq_disable();

cached_flags = READ_ONCE(t->flags);
}

cached_flag is a momentary snapshot when attempting to return to user
space.

But after reenabling interrupts any of the relevant flag bits can be set
by an exception/interrupt or from remote. If preemption is enabled the
task can be scheduled out, migrated at any point before disabling
interrupts again. Even after disabling interrupts and before re-reading
cached flags there might be a remote change of flags.

That said, even for the case Andy and Peter were looking at (MCE) the
ordering is completely irrelevant.

I should have thought about this before, so thanks to both of you for
making me look at it again for the very wrong reasons.

Consider the patch dropped.

Thanks,

tglx

2020-05-14 00:39:32

by Steven Rostedt

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

On Thu, 14 May 2020 02:12:21 +0200
Thomas Gleixner <[email protected]> wrote:

> I should have thought about this before, so thanks to both of you for
> making me look at it again for the very wrong reasons.
>
> Consider the patch dropped.

I'm glad our incompetence could be of service to you :-/

-- Steve

2020-05-14 00:51:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

Steven Rostedt <[email protected]> writes:

> On Thu, 14 May 2020 02:12:21 +0200
> Thomas Gleixner <[email protected]> wrote:
>
>> I should have thought about this before, so thanks to both of you for
>> making me look at it again for the very wrong reasons.
>>
>> Consider the patch dropped.
>
> I'm glad our incompetence could be of service to you :-/

Asking incompetent questions is perfectly fine depending on who is
asking them.

Thanks,

tglx

2020-05-14 01:26:17

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling



> On May 13, 2020, at 5:12 PM, Thomas Gleixner <[email protected]> wrote:
>
> Steven, Mathieu
>
> (combo reply)
>
> Steven Rostedt <[email protected]> writes:
>> On Wed, 13 May 2020 16:56:41 -0400 (EDT)
>>>> + /* deal with pending signal delivery */
>>>> + if (cached_flags & _TIF_SIGPENDING)
>>>> + do_signal(regs);
>>
>> Looking deeper into this, it appears that do_signal() can freeze or kill the
>> task.
>>
>> That is, it wont go back to user space here, but simply schedule out (being
>> traced) or even exit (killed).
>>
>> Before the resume hooks would never be called in such cases, and now they
>> are.
>
> It theoretically matters because pending task work might kill the
> task. That's the concern Andy and Peter had. Assume the following:
>
> usermode
>
> -> exception
> set not fatal signal
>
> -> exception
> queue task work to kill task
> <- return
>
> <- return
>
> The same could happen when the non fatal signal is set from a remote CPU.
>
> So in theory that would result in:
>
> handle non fatal signal first
>
> handle task work which kills task
>
> which would be the wrong order.
>
> But that's just illusion.
>
>>> Mathieu Desnoyers <[email protected]> wrote:
>
>>> Also, color me confused: is "do_signal()" actually running any user-space,
>>> or just setting up the user-space stack for eventual return to signal
>>> handler ?
>
> I'm surprised that you can't answer that question yourself. How did you
> ever make rseq work and how did rseq_signal_deliver() end up in
> setup_rt_frame()?
>
> Hint: Tracing might answer that question
>
> And to cut it short:
>
> Exit to user space happnes only through ONE channel, i.e. leaving
> prepare_exit_to usermode().
>
> exit_to_usermode_loop <-prepare_exit_to_usermode
> do_signal <-exit_to_usermode_loop
> get_signal <-do_signal
> setup_sigcontext <-do_signal
> do_syscall_64 <-entry_SYSCALL_64_after_hwframe
> syscall_trace_enter <-do_syscall_64
>
> sys_rt_sigreturn()
> restore_altstack <-__ia32_sys_rt_sigreturn
> syscall_slow_exit_work <-do_syscall_64
> exit_to_usermode_loop <-do_syscall_64
>
>>> Also, it might be OK, but we're changing the order of two things which
>>> have effects on each other: restartable sequences abort fixup for preemption
>>> and do_signal(), which also have effects on rseq abort.
>>>
>>> Because those two will cause the abort to trigger, I suspect changing
>>> the order might be OK, but we really need to think this through.
>
> That's a purely academic problem. The order is completely
> irrelevant. You have to handle any order anyway:
>
> usermode
>
> -> exception / syscall
> sets signal
>
> <- return
>
> prepare_exit_to_usemode()
> cached_flags = READ_ONCE(t->flags);
> exit_to_user_mode_loop(regs, cached_flags) {
> while (cached_flags) {
> local_irq_enable();
>
> handle(cached_flags & RESCHED);
> handle(cached_flags & UPROBE);
> handle(cached_flags & PATCHING);
> handle(cached_flags & SIGNAL);
> handle(cached_flags & NOTIFY_RESUME);
> handle(cached_flags & RETURN_NOTIFY);
>
> local_irq_disable();
>
> cached_flags = READ_ONCE(t->flags);
> }
>
> cached_flag is a momentary snapshot when attempting to return to user
> space.
>
> But after reenabling interrupts any of the relevant flag bits can be set
> by an exception/interrupt or from remote. If preemption is enabled the
> task can be scheduled out, migrated at any point before disabling
> interrupts again. Even after disabling interrupts and before re-reading
> cached flags there might be a remote change of flags.
>
> That said, even for the case Andy and Peter were looking at (MCE) the
> ordering is completely irrelevant.
>
> I should have thought about this before, so thanks to both of you for
> making me look at it again for the very wrong reasons.
>
> Consider the patch dropped.

I disagree.

There is only one relevant MCE case: #MC hits user code and tries to recover.

Right now, this works via the ist_begin_non_atomic hack. But the series changes it so that it uses task_work(). So now the kernel ends up in prepare_exit_to_usermode() and the machine check work is pending, but a signal might *also* be pending due to another CPU setting the bit.

If we process the signal first, we could block on userfaultfd or whatever, and now we could take forever to finish.

Heck, even with the order changed, we could get preempted, return to another user task, and get a new machine check. Ick.

So I agree that this patch is problematic, and it doesn’t fully fix the problem, but I do believe that the task_work thing is problematic.

Rumor has it that it gets improved a bit farther along in the series, but I’m still plodding through.



>
> Thanks,
>
> tglx

2020-05-14 02:54:08

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

----- On May 13, 2020, at 8:12 PM, Thomas Gleixner [email protected] wrote:
[...]
>
>>> Mathieu Desnoyers <[email protected]> wrote:
>
>>> Also, color me confused: is "do_signal()" actually running any user-space,
>>> or just setting up the user-space stack for eventual return to signal
>>> handler ?
>
> I'm surprised that you can't answer that question yourself. How did you
> ever make rseq work and how did rseq_signal_deliver() end up in
> setup_rt_frame()?
>
> Hint: Tracing might answer that question
>
> And to cut it short:
>
> Exit to user space happnes only through ONE channel, i.e. leaving
> prepare_exit_to usermode().
>

[...]

Yes, I'm very well aware of this. But the patch commit message states:

"Make sure task_work runs before any kind of userspace -- very much
including signals -- is invoked."

which seems to imply that "userspace" can be "invoked" before the task_work
runs. Which makes no sense whatsoever. Hence my confused state.

>>> Also, it might be OK, but we're changing the order of two things which
>>> have effects on each other: restartable sequences abort fixup for preemption
>>> and do_signal(), which also have effects on rseq abort.
>>>
>>> Because those two will cause the abort to trigger, I suspect changing
>>> the order might be OK, but we really need to think this through.
>
> That's a purely academic problem. The order is completely
> irrelevant. You have to handle any order anyway:

Yes indeed, whether either a signal handler frame fixup or return IP
fixup fires first (clearing the rseq_cs pointer in the process) is
irrelevant, because they will have the effect on the user-space program's
flow. And as you say, given it is run in a loop and can be preempted,
any order can happen here, so we have to be prepared for it. This loop
has caused me tons of headaches when stress-testing on NUMA machines by
the way.

> That said, even for the case Andy and Peter were looking at (MCE) the
> ordering is completely irrelevant.

Not sure about that, see Andy's follow up.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2020-05-14 09:22:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

On Wed, May 13, 2020 at 10:51:53PM -0400, Mathieu Desnoyers wrote:

> Yes, I'm very well aware of this. But the patch commit message states:
>
> "Make sure task_work runs before any kind of userspace -- very much
> including signals -- is invoked."
>
> which seems to imply that "userspace" can be "invoked" before the task_work
> runs. Which makes no sense whatsoever. Hence my confused state.

I initially missed the run_task_work in the signal handling maze. Then
later figured this order still made more sense, but apparently there's
an actual problem.

I've no problem with the patch getting dropped.