2020-05-27 10:12:19

by Lai Jiangshan

Subject: [PATCH 0/5] x86/entry: simplify stack switching for exceptions from userspace

Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
IDT entries") resulted in the kernel (error_entry) always pushing pt_regs
to the entry stack (sp0) when an exception hits userspace, and then
copying it to the kernel stack.

This is a hot path (page faults, for example), yet interrupt_entry
switches directly to the kernel stack and pushes pt_regs there, with no
extra copy. error_entry should do the same, as sketched below. That is
the job of patches 1 and 2.
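
For illustration, a minimal C model of the copy that sync_regs() performs,
with a made-up pt_regs layout (the real code is assembly in entry_64.S
plus sync_regs() in traps.c, which patch 4 later removes):

struct pt_regs { unsigned long r[21]; };	/* stand-in for the real layout */

/* Pre-series flow: pt_regs is written twice per exception. */
static struct pt_regs *model_sync_regs(struct pt_regs *thread_stack_top,
				       struct pt_regs *eregs)
{
	struct pt_regs *regs = thread_stack_top - 1;	/* frame at stack top */

	if (regs != eregs)
		*regs = *eregs;		/* the extra copy this series removes */
	return regs;			/* error_entry then switches %rsp here */
}

The post-series flow switches %rsp to the thread stack first and saves
pt_regs there once, exactly as interrupt_entry already does.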

Patches 3-5 simplify the stack switching for .Lerror_bad_iret by doing
all the work in one function (fixup_bad_iret()).

The patch set is based on tip/master (c021d3d8fe45) (Mon May 25).

The diffstat is "66 insertions(+), 66 deletions(-)", but the insertions
are mostly comments; the amount of actual code goes down.

Cc: Andy Lutomirski <[email protected]>,
Cc: Thomas Gleixner <[email protected]>,
Cc: Ingo Molnar <[email protected]>,
Cc: Borislav Petkov <[email protected]>,
Cc: [email protected],
Cc: "H. Peter Anvin" <[email protected]>,
Cc: Peter Zijlstra <[email protected]>,
Cc: Alexandre Chartre <[email protected]>,
Cc: "Eric W. Biederman" <[email protected]>,
Cc: Jann Horn <[email protected]>,
Cc: Dave Hansen <[email protected]>

Lai Jiangshan (5):
x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack
x86/entry: avoid calling into sync_regs() when entering from userspace
x86/entry: directly switch to kernel stack when .Lerror_bad_iret
x86/entry: remove unused sync_regs()
x86/entry: don't copy to tmp in fixup_bad_iret

arch/x86/entry/entry_64.S | 89 ++++++++++++++++++++----------------
arch/x86/include/asm/traps.h | 1 -
arch/x86/kernel/traps.c | 42 +++++++----------
3 files changed, 66 insertions(+), 66 deletions(-)

--
2.20.1


2020-05-27 10:14:06

by Lai Jiangshan

Subject: [PATCH 2/5] x86/entry: avoid calling into sync_regs() when entering from userspace

Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
IDT entries") changed the entry code so that when an exception happens in
userspace, it saves pt_regs on the sp0 stack, copies it to the thread
stack via sync_regs(), and switches to the thread stack afterward.

This is a hot path, and such overhead should be avoided. This patch
borrows the way interrupt_entry handles it, as sketched below.
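
For illustration, a rough C model of the stack switch the borrowed path
performs after SWAPGS and the CR3 switch (hypothetical helper name; the
real code is the assembly macro added in patch 1). It replays the
five-word hardware IRET frame plus orig_ax and the return address, seven
words in all, onto the thread stack:

#include <string.h>

#define ENTRY_FRAME_WORDS 7	/* ss, rsp, eflags, cs, ip, orig_ax, ret addr */

static unsigned long *model_switch_stack(unsigned long *thread_sp,
					 const unsigned long *entry_sp)
{
	thread_sp -= ENTRY_FRAME_WORDS;
	memcpy(thread_sp, entry_sp, ENTRY_FRAME_WORDS * sizeof(*entry_sp));
	return thread_sp;	/* %rsp then points at the copied frame */
}

PUSH_AND_CLEAR_REGS then runs on the thread stack directly, so the
general-purpose registers are never written to the sp0 stack at all.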

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 5e983506f82e..e8817ae31390 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1321,19 +1321,13 @@ SYM_CODE_END(paranoid_exit)
SYM_CODE_START_LOCAL(error_entry)
UNWIND_HINT_FUNC
cld
- PUSH_AND_CLEAR_REGS save_ret=1
- ENCODE_FRAME_POINTER 8
- testb $3, CS+8(%rsp)
+ testb $3, CS-ORIG_RAX+8(%rsp)
jz .Lerror_kernelspace

- /*
- * We entered from user mode or we're pretending to have entered
- * from user mode due to an IRET fault.
- */
- SWAPGS
- FENCE_SWAPGS_USER_ENTRY
- /* We have user CR3. Change to kernel CR3. */
- SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+ idtentry_swapgs_and_switch_to_kernel_stack
+ PUSH_AND_CLEAR_REGS save_ret=1
+ ENCODE_FRAME_POINTER 8
+ ret

.Lerror_entry_from_usermode_after_swapgs:
/* Put us onto the real thread stack. */
@@ -1357,6 +1351,8 @@ SYM_CODE_START_LOCAL(error_entry)
* for these here too.
*/
.Lerror_kernelspace:
+ PUSH_AND_CLEAR_REGS save_ret=1
+ ENCODE_FRAME_POINTER 8
leaq native_irq_return_iret(%rip), %rcx
cmpq %rcx, RIP+8(%rsp)
je .Lerror_bad_iret
--
2.20.1

2020-05-27 10:14:06

by Lai Jiangshan

Subject: [PATCH 5/5] x86/entry: don't copy to tmp in fixup_bad_iret

After the previous patches, the source frame lives on the entry stack and
the destination on the thread stack, so the two can no longer overlap and
it is safe for fixup_bad_iret() to memcpy() directly, without staging
through a temporary.
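
Background on why the temporary existed: memcpy() is only defined for
non-overlapping buffers, and previously both frames could sit on the same
stack. A minimal sketch of the overlap condition (hypothetical helper,
not kernel code):

#include <stddef.h>

/* Nonzero if [a, a+len) and [b, b+len) overlap. */
static inline int frames_overlap(const void *a, const void *b, size_t len)
{
	const char *pa = a, *pb = b;

	return (pa < pb + len) && (pb < pa + len);
}

The BUG_ON added below enforces that the two frames are at least a page
apart in either direction, which makes frames_overlap() false for any
frame smaller than a page, so plain memcpy() is well-defined here.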

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kernel/traps.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8291f84933ff..6fe72c771745 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -657,17 +657,23 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
* (rather than just below the IRET frame) and we want to
* pretend that the exception came from the IRET target.
*/
- struct bad_iret_stack tmp, *new_stack =
+ struct bad_iret_stack *new_stack =
(struct bad_iret_stack *)__this_cpu_read(cpu_current_top_of_stack) - 1;

- /* Copy the IRET target to the temporary storage. */
- memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
+ /*
+ * Both the IRET frame and the saved pt_regs must be on the
+ * entry stacks since iret to user is only issued on the
+ * entry stacks. So they don't overlap with kernel stack and
+ * memcpy() is safe to copy them.
+ */
+ BUG_ON(((unsigned long)s - (unsigned long)new_stack) < PAGE_SIZE ||
+ ((unsigned long)new_stack - (unsigned long)s) < PAGE_SIZE);

- /* Copy the remainder of the stack from the current stack. */
- memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
+ /* Copy the IRET target to the new stack. */
+ memcpy(&new_stack->regs.ip, (void *)s->regs.sp, 5*8);

- /* Update the entry stack */
- memcpy(new_stack, &tmp, sizeof(tmp));
+ /* Copy the remainder of the stack from the current stack. */
+ memcpy(new_stack, s, offsetof(struct bad_iret_stack, regs.ip));

BUG_ON(!user_mode(&new_stack->regs));
return new_stack;
--
2.20.1

2020-05-27 10:14:06

by Lai Jiangshan

Subject: [PATCH 4/5] x86/entry: remove unused sync_regs()

No more users.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/traps.h | 1 -
arch/x86/kernel/traps.c | 13 -------------
2 files changed, 14 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index f5a2e438a878..20b9db7a1d49 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -21,7 +21,6 @@ asmlinkage void xen_page_fault(void);
dotraplinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);

#ifdef CONFIG_X86_64
-asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
asmlinkage __visible notrace
struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s);
void __init trap_init(void);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3bef95934644..8291f84933ff 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -642,19 +642,6 @@ DEFINE_IDTENTRY_RAW(exc_int3)
}

#ifdef CONFIG_X86_64
-/*
- * Help handler running on a per-cpu (IST or entry trampoline) stack
- * to switch to the normal thread stack if the interrupted code was in
- * user mode. The actual stack switch is done in entry_64.S
- */
-asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs)
-{
- struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
- if (regs != eregs)
- *regs = *eregs;
- return regs;
-}
-
struct bad_iret_stack {
void *error_entry_ret;
struct pt_regs regs;
--
2.20.1

2020-05-27 10:28:48

by Lai Jiangshan

Subject: [PATCH 3/5] x86/entry: directly switch to kernel stack when .Lerror_bad_iret

Directly switch to the kernel stack and copy pt_regs there when handling
.Lerror_bad_iret.

Currently, entry_64.S does the following back to back in the
.Lerror_bad_iret path:
	call fixup_bad_iret():	switch to the sp0 stack, with pt_regs copied
	call sync_regs():	switch to the kernel stack, with pt_regs copied

So we can do all of this work in one place, fixup_bad_iret(), as modeled
below.

After this patch, fixup_bad_iret() is restored to its behavior before
commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
IDT entries").
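
For illustration, a deliberately simplified C model of the new contract
(made-up struct size; the real function is in the traps.c hunk below, and
the temporary copy it still uses is removed in patch 5):

#include <string.h>

struct bad_iret_stack { void *error_entry_ret; char regs[168]; };

static struct bad_iret_stack *
model_fixup_bad_iret(struct bad_iret_stack *thread_stack_top,
		     struct bad_iret_stack *s)
{
	struct bad_iret_stack *new_stack = thread_stack_top - 1;

	memcpy(new_stack, s, sizeof(*s));	/* one copy, no sync_regs() */
	return new_stack;			/* caller: mov %rax, %rsp */
}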

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 13 ++-----------
arch/x86/kernel/traps.c | 9 ++++-----
2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e8817ae31390..c5db048e5bed 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1329,16 +1329,6 @@ SYM_CODE_START_LOCAL(error_entry)
ENCODE_FRAME_POINTER 8
ret

-.Lerror_entry_from_usermode_after_swapgs:
- /* Put us onto the real thread stack. */
- popq %r12 /* save return addr in %12 */
- movq %rsp, %rdi /* arg0 = pt_regs pointer */
- call sync_regs
- movq %rax, %rsp /* switch stack */
- ENCODE_FRAME_POINTER
- pushq %r12
- ret
-
.Lerror_entry_done_lfence:
FENCE_SWAPGS_KERNEL_ENTRY
.Lerror_entry_done:
@@ -1392,7 +1382,8 @@ SYM_CODE_START_LOCAL(error_entry)
mov %rsp, %rdi
call fixup_bad_iret
mov %rax, %rsp
- jmp .Lerror_entry_from_usermode_after_swapgs
+ ENCODE_FRAME_POINTER 8
+ ret
SYM_CODE_END(error_entry)

SYM_CODE_START_LOCAL(error_exit)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e5d81cb94ba..3bef95934644 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -666,13 +666,12 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
/*
* This is called from entry_64.S early in handling a fault
* caused by a bad iret to user mode. To handle the fault
- * correctly, we want to move our stack frame to where it would
- * be had we entered directly on the entry stack (rather than
- * just below the IRET frame) and we want to pretend that the
- * exception came from the IRET target.
+ * correctly, we want to move our stack frame to kernel stack
+ * (rather than just below the IRET frame) and we want to
+ * pretend that the exception came from the IRET target.
*/
struct bad_iret_stack tmp, *new_stack =
- (struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
+ (struct bad_iret_stack *)__this_cpu_read(cpu_current_top_of_stack) - 1;

/* Copy the IRET target to the temporary storage. */
memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
--
2.20.1

2020-05-27 10:45:30

by Lai Jiangshan

Subject: [PATCH 1/5] x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack

Move a portion of the interrupt_entry code into a macro so that it can
also be used in the next patch.

This only moves code around; no functionality changes.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 60 ++++++++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d983a0d4bc73..5e983506f82e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -698,28 +698,22 @@ SYM_CODE_END(\asmsym)
#include <asm/idtentry.h>

/*
- * Interrupt entry helper function.
+ * IDT entry helper macro for entering from userspace.
*
* Entry runs with interrupts off. Stack layout at entry:
- * +----------------------------------------------------+
- * | regs->ss |
- * | regs->rsp |
- * | regs->eflags |
- * | regs->cs |
- * | regs->ip |
- * +----------------------------------------------------+
- * | regs->orig_ax = ~(interrupt number) |
- * +----------------------------------------------------+
- * | return address |
- * +----------------------------------------------------+
+ * +--------------------+
+ * | regs->ss |
+ * | regs->rsp |
+ * | regs->eflags |
+ * | regs->cs |
+ * | regs->ip |
+ * | regs->orig_ax |
+ * | return address |
+ * +--------------------+
+ * The macro does swapgs and switches to current kernel stack with the
+ * same stack layout copied.
*/
-SYM_CODE_START(interrupt_entry)
- UNWIND_HINT_IRET_REGS offset=16
- ASM_CLAC
- cld
-
- testb $3, CS-ORIG_RAX+8(%rsp)
- jz 1f
+.macro idtentry_swapgs_and_switch_to_kernel_stack
SWAPGS
FENCE_SWAPGS_USER_ENTRY
/*
@@ -751,6 +745,34 @@ SYM_CODE_START(interrupt_entry)
pushq 8(%rdi) /* return address */

movq (%rdi), %rdi
+.endm
+
+/*
+ * Interrupt entry helper function.
+ *
+ * Entry runs with interrupts off. Stack layout at entry:
+ * +----------------------------------------------------+
+ * | regs->ss |
+ * | regs->rsp |
+ * | regs->eflags |
+ * | regs->cs |
+ * | regs->ip |
+ * +----------------------------------------------------+
+ * | regs->orig_ax = ~(interrupt number) |
+ * +----------------------------------------------------+
+ * | return address |
+ * +----------------------------------------------------+
+ */
+SYM_CODE_START(interrupt_entry)
+ UNWIND_HINT_IRET_REGS offset=16
+ ASM_CLAC
+ cld
+
+ testb $3, CS-ORIG_RAX+8(%rsp)
+ jz 1f
+
+ idtentry_swapgs_and_switch_to_kernel_stack
+
jmp 2f
1:
FENCE_SWAPGS_KERNEL_ENTRY
--
2.20.1

2020-05-28 19:16:45

by Thomas Gleixner

Subject: Re: [PATCH 1/5] x86/entry: introduce macro idtentry_swapgs_and_switch_to_kernel_stack

Lai,

Lai Jiangshan <[email protected]> writes:

> Move a portion of the interrupt_entry code into a macro so that it can
> also be used in the next patch.
>
> This only moves code around; no functionality changes.

interrupt_entry does not exist anymore. See:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/entry

Thanks,

tglx

2020-05-29 08:30:43

by Lai Jiangshan

Subject: [PATCH V2 0/4] x86/entry: simplify stack switching for exceptions from userspace

Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
IDT entries") resulted in the kernel (error_entry) always pushing pt_regs
to the entry stack (sp0) when an exception hits userspace, and then
copying it to the kernel stack.

Recent x86/entry work also converts interrupts to idtentry, so the
interrupt code now likewise saves pt_regs on the sp0 stack and then
copies it to the thread stack, just like exceptions.

These are hot paths (page faults, IPIs); such overhead should be avoided.
The original interrupt_entry switched directly to the kernel stack and
pushed pt_regs there. error_entry should do the same. That is the job of
patch 1.

Patches 2-4 simplify the stack switching for .Lerror_bad_iret by doing
all the work in one function (fixup_bad_iret()).

The patch set is based on tip/x86/entry (28447ea41542) (May 20).

Changes from V1:
based on tip/master -> based on tip/x86/entry

patch 1 replaces patches 1 and 2 of V1; it borrows the original
interrupt_entry code into error_entry.

patches 2-4 are V1's patches 3-5, unchanged (but rebased)

Cc: Andy Lutomirski <[email protected]>,
Cc: Thomas Gleixner <[email protected]>,
Cc: Ingo Molnar <[email protected]>,
Cc: Borislav Petkov <[email protected]>,
Cc: [email protected],
Cc: "H. Peter Anvin" <[email protected]>,
Cc: Peter Zijlstra <[email protected]>,
Cc: Alexandre Chartre <[email protected]>,
Cc: "Eric W. Biederman" <[email protected]>,
Cc: Jann Horn <[email protected]>,
Cc: Dave Hansen <[email protected]>

Lai Jiangshan (4):
x86/entry: avoid calling into sync_regs() when entering from userspace
x86/entry: directly switch to kernel stack when .Lerror_bad_iret
x86/entry: remove unused sync_regs()
x86/entry: don't copy to tmp in fixup_bad_iret

arch/x86/entry/entry_64.S | 52 +++++++++++++++++++++++-------------
arch/x86/include/asm/traps.h | 1 -
arch/x86/kernel/traps.c | 42 ++++++++++++-----------------
3 files changed, 51 insertions(+), 44 deletions(-)

--
2.20.1

2020-05-29 18:35:07

by Andy Lutomirski

Subject: Re: [PATCH V2 0/4] x86/entry: simplify stack switching for exceptions from userspace

On Fri, May 29, 2020 at 1:26 AM Lai Jiangshan <[email protected]> wrote:
>
> Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for
> IDT entries") resulted in the kernel (error_entry) always pushing
> pt_regs to the entry stack (sp0) when an exception hits userspace, and
> then copying it to the kernel stack.
>
> Recent x86/entry work also converts interrupts to idtentry, so the
> interrupt code now likewise saves pt_regs on the sp0 stack and then
> copies it to the thread stack, just like exceptions.
>
> These are hot paths (page faults, IPIs); such overhead should be
> avoided. The original interrupt_entry switched directly to the kernel
> stack and pushed pt_regs there. error_entry should do the same. That is
> the job of patch 1.
>
> Patches 2-4 simplify the stack switching for .Lerror_bad_iret by doing
> all the work in one function (fixup_bad_iret()).
>
> The patch set is based on tip/x86/entry (28447ea41542) (May 20).

There are definitely good cleanups in here, but I think it would be nice
to rebase this onto whatever lands once 5.8-rc1 settles.

--Andy

2020-06-16 01:58:23

by Lai Jiangshan

Subject: Re: [PATCH V2 0/4] x86/entry: simplify stack switching for exceptions from userspace

On Sat, May 30, 2020 at 2:33 AM Andy Lutomirski <[email protected]> wrote:
>
> On Fri, May 29, 2020 at 1:26 AM Lai Jiangshan <[email protected]> wrote:
> >
> > Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack
> > for IDT entries") resulted in the kernel (error_entry) always pushing
> > pt_regs to the entry stack (sp0) when an exception hits userspace,
> > and then copying it to the kernel stack.
> >
> > Recent x86/entry work also converts interrupts to idtentry, so the
> > interrupt code now likewise saves pt_regs on the sp0 stack and then
> > copies it to the thread stack, just like exceptions.
> >
> > These are hot paths (page faults, IPIs); such overhead should be
> > avoided. The original interrupt_entry switched directly to the kernel
> > stack and pushed pt_regs there. error_entry should do the same. That
> > is the job of patch 1.
> >
> > Patches 2-4 simplify the stack switching for .Lerror_bad_iret by
> > doing all the work in one function (fixup_bad_iret()).
> >
> > The patch set is based on tip/x86/entry (28447ea41542) (May 20).
>
> There are definitely good cleanups in here, but I think it would be
> nice to rebase this onto whatever lands once 5.8-rc1 settles.
>

Hello, All

This patchset still applies cleanly to the newest tip/x86/entry, which
has 5.8-rc1 merged. That means I don't have to respin/resend it unless an
update is needed.

Could you please review it?

Thanks
Lai

2020-06-18 13:55:40

by Lai Jiangshan

Subject: Re: [PATCH V2 0/4] x86/entry: simplify stack switching for exceptions from userspace

Hello and Ping

On Tue, Jun 16, 2020 at 9:56 AM Lai Jiangshan
<[email protected]> wrote:
>
> On Sat, May 30, 2020 at 2:33 AM Andy Lutomirski <[email protected]> wrote:
> >
> > On Fri, May 29, 2020 at 1:26 AM Lai Jiangshan <[email protected]> wrote:
> > >
> > > Commit 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack
> > > for IDT entries") resulted in the kernel (error_entry) always
> > > pushing pt_regs to the entry stack (sp0) when an exception hits
> > > userspace, and then copying it to the kernel stack.

Ping Andy Lutomirski, who added this overhead two years ago.

> > >
> > > Recent x86/entry work also converts interrupts to idtentry, so the
> > > interrupt code now likewise saves pt_regs on the sp0 stack and then
> > > copies it to the thread stack, just like exceptions.
> > >

Ping Thomas Gleixner, who added this overhead recently.

>
> Hello, All
>
> This patchset still applies cleanly to the newest tip/x86/entry, which
> has 5.8-rc1 merged. That means I don't have to respin/resend it unless
> an update is needed.
>
> Could you please review it?
>
> Thanks
> Lai

2020-06-18 18:11:57

by Thomas Gleixner

Subject: Re: [PATCH V2 0/4] x86/entry: simplify stack switching for exceptions from userspace

Lai,

Lai Jiangshan <[email protected]> writes:

> Hello and Ping

you poked at this on Tuesday, i.e. 2 days ago. It's on the todo list, but
it is not the most urgent problem on the planet.

Thanks

tglx

2020-06-27 21:04:59

by Andy Lutomirski

Subject: Re: [PATCH V2 0/4] x86/entry: simplify stack switching for exceptions from userspace

On Thu, Jun 18, 2020 at 7:10 AM Thomas Gleixner <[email protected]> wrote:
>
> Lai,
>
> Lai Jiangshan <[email protected]> writes:
>
> > Hello and Ping
>
> you poked at this on Tuesday, i.e. 2 days ago. It's on the todo list,
> but it is not the most urgent problem on the planet.
>

Just as a heads up, I'd be surprised if I can get to this in time for
5.9. I'm still dealing with the fallout from 5.8-rc1, and there's no
shortage of fallout...

-Andy