2017-08-08 03:59:25

by Andy Lutomirski

Subject: [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries

Xen's raw SYSCALL entries are much less weird than native. Rather
than fudging them to look like native entries, use the Xen-provided
stack frame directly.

This lets us eliminate entry_SYSCALL_64_after_swapgs and two uses of
the SWAPGS_UNSAFE_STACK paravirt hook. The SYSENTER code would
benefit from similar treatment.

This makes one change to the native code path: the compat
instruction that clears the high 32 bits of %rax is moved slightly
later. I'd be surprised if this affects performance at all.

Signed-off-by: Andy Lutomirski <[email protected]>
---

Changes from v1 (which I never actually emailed):
- Fix zero-extension in the compat case.

arch/x86/entry/entry_64.S | 9 ++-------
arch/x86/entry/entry_64_compat.S | 7 +++----
arch/x86/xen/xen-asm_64.S | 23 +++++++++--------------
3 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index aa58155187c5..7cee92cf807f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -142,14 +142,8 @@ ENTRY(entry_SYSCALL_64)
* We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
* it is too small to ever cause noticeable irq latency.
*/
- SWAPGS_UNSAFE_STACK
- /*
- * A hypervisor implementation might want to use a label
- * after the swapgs, so that it can do the swapgs
- * for the guest and jump here on syscall.
- */
-GLOBAL(entry_SYSCALL_64_after_swapgs)

+ swapgs
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp

@@ -161,6 +155,7 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
pushq %r11 /* pt_regs->flags */
pushq $__USER_CS /* pt_regs->cs */
pushq %rcx /* pt_regs->ip */
+GLOBAL(entry_SYSCALL_64_after_hwframe)
pushq %rax /* pt_regs->orig_ax */
pushq %rdi /* pt_regs->di */
pushq %rsi /* pt_regs->si */
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index e1721dafbcb1..5314d7b8e5ad 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -183,21 +183,20 @@ ENDPROC(entry_SYSENTER_compat)
*/
ENTRY(entry_SYSCALL_compat)
/* Interrupts are off on entry. */
- SWAPGS_UNSAFE_STACK
+ swapgs

/* Stash user ESP and switch to the kernel stack. */
movl %esp, %r8d
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp

- /* Zero-extending 32-bit regs, do not remove */
- movl %eax, %eax
-
/* Construct struct pt_regs on stack */
pushq $__USER32_DS /* pt_regs->ss */
pushq %r8 /* pt_regs->sp */
pushq %r11 /* pt_regs->flags */
pushq $__USER32_CS /* pt_regs->cs */
pushq %rcx /* pt_regs->ip */
+GLOBAL(entry_SYSCALL_compat_after_hwframe)
+ movl %eax, %eax /* discard orig_ax high bits */
pushq %rax /* pt_regs->orig_ax */
pushq %rdi /* pt_regs->di */
pushq %rsi /* pt_regs->si */
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index c3df43141e70..a8a4f4c460a6 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -82,34 +82,29 @@ RELOC(xen_sysret64, 1b+1)
* rip
* r11
* rsp->rcx
- *
- * In all the entrypoints, we undo all that to make it look like a
- * CPU-generated syscall/sysenter and jump to the normal entrypoint.
*/

-.macro undo_xen_syscall
- mov 0*8(%rsp), %rcx
- mov 1*8(%rsp), %r11
- mov 5*8(%rsp), %rsp
-.endm
-
/* Normal 64-bit system call target */
ENTRY(xen_syscall_target)
- undo_xen_syscall
- jmp entry_SYSCALL_64_after_swapgs
+ popq %rcx
+ popq %r11
+ jmp entry_SYSCALL_64_after_hwframe
ENDPROC(xen_syscall_target)

#ifdef CONFIG_IA32_EMULATION

/* 32-bit compat syscall target */
ENTRY(xen_syscall32_target)
- undo_xen_syscall
- jmp entry_SYSCALL_compat
+ popq %rcx
+ popq %r11
+ jmp entry_SYSCALL_compat_after_hwframe
ENDPROC(xen_syscall32_target)

/* 32-bit compat sysenter target */
ENTRY(xen_sysenter_target)
- undo_xen_syscall
+ mov 0*8(%rsp), %rcx
+ mov 1*8(%rsp), %r11
+ mov 5*8(%rsp), %rsp
jmp entry_SYSENTER_compat
ENDPROC(xen_sysenter_target)

--
2.13.3
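
For anyone following the offsets in the xen-asm_64.S hunk above, here is a
minimal sketch (plain C, not kernel code; the struct and field names are
invented for illustration) of the frame Xen hands to xen_syscall_target,
top of stack first. It shows why two pops leave exactly the five-word
hardware-style frame that entry_SYSCALL_64_after_hwframe expects, and why
xen_sysenter_target reads the user stack pointer at 5*8(%rsp).

/* Reader's sketch of the stack on entry to xen_syscall_target. */
struct xen_syscall_entry_frame {
	unsigned long rcx;	/* 0*8(%rsp): user RIP, as SYSCALL left it in %rcx */
	unsigned long r11;	/* 1*8(%rsp): user RFLAGS, as SYSCALL left it in %r11 */
	unsigned long rip;	/* 2*8(%rsp): start of the iret-style frame */
	unsigned long cs;	/* 3*8(%rsp) */
	unsigned long rflags;	/* 4*8(%rsp) */
	unsigned long rsp;	/* 5*8(%rsp): user stack pointer */
	unsigned long ss;	/* 6*8(%rsp) */
};
/*
 * popq %rcx; popq %r11 consumes the first two slots and restores the
 * register state the native SYSCALL path has at that point, leaving
 * rip/cs/rflags/rsp/ss on the stack, the same layout the native entry
 * has pushed by the time it reaches entry_SYSCALL_64_after_hwframe.
 */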


2017-08-09 15:39:17

by Jürgen Groß

Subject: Re: [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries

On 08/08/17 05:59, Andy Lutomirski wrote:
> Xen's raw SYSCALL entries are much less weird than native. Rather
> than fudging them to look like native entries, use the Xen-provided
> stack frame directly.
>
> This lets us eliminate entry_SYSCALL_64_after_swapgs and two uses of
> the SWAPGS_UNSAFE_STACK paravirt hook. The SYSENTER code would
> benefit from similar treatment.
>
> This makes one change to the native code path: the compat
> instruction that clears the high 32 bits of %rax is moved slightly
> later. I'd be surprised if this affects performance at all.
>
> Signed-off-by: Andy Lutomirski <[email protected]>

Reviewed-by: Juergen Gross <[email protected]>
Tested-by: Juergen Gross <[email protected]>


Thanks,

Juergen

Subject: [tip:x86/asm] x86/xen/64: Rearrange the SYSCALL entries

Commit-ID: 8a9949bc71a71b3dd633255ebe8f8869b1f73474
Gitweb: http://git.kernel.org/tip/8a9949bc71a71b3dd633255ebe8f8869b1f73474
Author: Andy Lutomirski <[email protected]>
AuthorDate: Mon, 7 Aug 2017 20:59:21 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 13:14:32 +0200

x86/xen/64: Rearrange the SYSCALL entries

Xen's raw SYSCALL entries are much less weird than native. Rather
than fudging them to look like native entries, use the Xen-provided
stack frame directly.

This lets us eliminate entry_SYSCALL_64_after_swapgs and two uses of
the SWAPGS_UNSAFE_STACK paravirt hook. The SYSENTER code would
benefit from similar treatment.

This makes one change to the native code path: the compat
instruction that clears the high 32 bits of %rax is moved slightly
later. I'd be surprised if this affects performance at all.

Tested-by: Juergen Gross <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/7c88ed36805d36841ab03ec3b48b4122c4418d71.1502164668.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/entry/entry_64.S | 9 ++-------
arch/x86/entry/entry_64_compat.S | 7 +++----
arch/x86/xen/xen-asm_64.S | 23 +++++++++--------------
3 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 64b233a..4dbb336 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -142,14 +142,8 @@ ENTRY(entry_SYSCALL_64)
* We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
* it is too small to ever cause noticeable irq latency.
*/
- SWAPGS_UNSAFE_STACK
- /*
- * A hypervisor implementation might want to use a label
- * after the swapgs, so that it can do the swapgs
- * for the guest and jump here on syscall.
- */
-GLOBAL(entry_SYSCALL_64_after_swapgs)

+ swapgs
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp

@@ -161,6 +155,7 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
pushq %r11 /* pt_regs->flags */
pushq $__USER_CS /* pt_regs->cs */
pushq %rcx /* pt_regs->ip */
+GLOBAL(entry_SYSCALL_64_after_hwframe)
pushq %rax /* pt_regs->orig_ax */
pushq %rdi /* pt_regs->di */
pushq %rsi /* pt_regs->si */
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index e1721da..5314d7b 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -183,21 +183,20 @@ ENDPROC(entry_SYSENTER_compat)
*/
ENTRY(entry_SYSCALL_compat)
/* Interrupts are off on entry. */
- SWAPGS_UNSAFE_STACK
+ swapgs

/* Stash user ESP and switch to the kernel stack. */
movl %esp, %r8d
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp

- /* Zero-extending 32-bit regs, do not remove */
- movl %eax, %eax
-
/* Construct struct pt_regs on stack */
pushq $__USER32_DS /* pt_regs->ss */
pushq %r8 /* pt_regs->sp */
pushq %r11 /* pt_regs->flags */
pushq $__USER32_CS /* pt_regs->cs */
pushq %rcx /* pt_regs->ip */
+GLOBAL(entry_SYSCALL_compat_after_hwframe)
+ movl %eax, %eax /* discard orig_ax high bits */
pushq %rax /* pt_regs->orig_ax */
pushq %rdi /* pt_regs->di */
pushq %rsi /* pt_regs->si */
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index c3df431..a8a4f4c 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -82,34 +82,29 @@ RELOC(xen_sysret64, 1b+1)
* rip
* r11
* rsp->rcx
- *
- * In all the entrypoints, we undo all that to make it look like a
- * CPU-generated syscall/sysenter and jump to the normal entrypoint.
*/

-.macro undo_xen_syscall
- mov 0*8(%rsp), %rcx
- mov 1*8(%rsp), %r11
- mov 5*8(%rsp), %rsp
-.endm
-
/* Normal 64-bit system call target */
ENTRY(xen_syscall_target)
- undo_xen_syscall
- jmp entry_SYSCALL_64_after_swapgs
+ popq %rcx
+ popq %r11
+ jmp entry_SYSCALL_64_after_hwframe
ENDPROC(xen_syscall_target)

#ifdef CONFIG_IA32_EMULATION

/* 32-bit compat syscall target */
ENTRY(xen_syscall32_target)
- undo_xen_syscall
- jmp entry_SYSCALL_compat
+ popq %rcx
+ popq %r11
+ jmp entry_SYSCALL_compat_after_hwframe
ENDPROC(xen_syscall32_target)

/* 32-bit compat sysenter target */
ENTRY(xen_sysenter_target)
- undo_xen_syscall
+ mov 0*8(%rsp), %rcx
+ mov 1*8(%rsp), %r11
+ mov 5*8(%rsp), %rsp
jmp entry_SYSENTER_compat
ENDPROC(xen_sysenter_target)


2017-08-14 02:44:31

by Brian Gerst

Subject: Re: [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries

On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <[email protected]> wrote:
> Xen's raw SYSCALL entries are much less weird than native. Rather
> than fudging them to look like native entries, use the Xen-provided
> stack frame directly.
>
> This lets us eliminate entry_SYSCALL_64_after_swapgs and two uses of
> the SWAPGS_UNSAFE_STACK paravirt hook. The SYSENTER code would
> benefit from similar treatment.
>
> This makes one change to the native code path: the compat
> instruction that clears the high 32 bits of %rax is moved slightly
> later. I'd be surprised if this affects performance at all.
>
> Signed-off-by: Andy Lutomirski <[email protected]>
> ---
>
> Changes from v1 (which I never actually emailed):
> - Fix zero-extension in the compat case.
>
> arch/x86/entry/entry_64.S | 9 ++-------
> arch/x86/entry/entry_64_compat.S | 7 +++----
> arch/x86/xen/xen-asm_64.S | 23 +++++++++--------------
> 3 files changed, 14 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index aa58155187c5..7cee92cf807f 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -142,14 +142,8 @@ ENTRY(entry_SYSCALL_64)
> * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
> * it is too small to ever cause noticeable irq latency.
> */
> - SWAPGS_UNSAFE_STACK
> - /*
> - * A hypervisor implementation might want to use a label
> - * after the swapgs, so that it can do the swapgs
> - * for the guest and jump here on syscall.
> - */
> -GLOBAL(entry_SYSCALL_64_after_swapgs)
>
> + swapgs
> movq %rsp, PER_CPU_VAR(rsp_scratch)
> movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>
> @@ -161,6 +155,7 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
> pushq %r11 /* pt_regs->flags */
> pushq $__USER_CS /* pt_regs->cs */
> pushq %rcx /* pt_regs->ip */
> +GLOBAL(entry_SYSCALL_64_after_hwframe)
> pushq %rax /* pt_regs->orig_ax */
> pushq %rdi /* pt_regs->di */
> pushq %rsi /* pt_regs->si */
> diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
> index e1721dafbcb1..5314d7b8e5ad 100644
> --- a/arch/x86/entry/entry_64_compat.S
> +++ b/arch/x86/entry/entry_64_compat.S
> @@ -183,21 +183,20 @@ ENDPROC(entry_SYSENTER_compat)
> */
> ENTRY(entry_SYSCALL_compat)
> /* Interrupts are off on entry. */
> - SWAPGS_UNSAFE_STACK
> + swapgs
>
> /* Stash user ESP and switch to the kernel stack. */
> movl %esp, %r8d
> movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>
> - /* Zero-extending 32-bit regs, do not remove */
> - movl %eax, %eax
> -
> /* Construct struct pt_regs on stack */
> pushq $__USER32_DS /* pt_regs->ss */
> pushq %r8 /* pt_regs->sp */
> pushq %r11 /* pt_regs->flags */
> pushq $__USER32_CS /* pt_regs->cs */
> pushq %rcx /* pt_regs->ip */
> +GLOBAL(entry_SYSCALL_compat_after_hwframe)
> + movl %eax, %eax /* discard orig_ax high bits */
> pushq %rax /* pt_regs->orig_ax */
> pushq %rdi /* pt_regs->di */
> pushq %rsi /* pt_regs->si */
> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index c3df43141e70..a8a4f4c460a6 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -82,34 +82,29 @@ RELOC(xen_sysret64, 1b+1)
> * rip
> * r11
> * rsp->rcx
> - *
> - * In all the entrypoints, we undo all that to make it look like a
> - * CPU-generated syscall/sysenter and jump to the normal entrypoint.
> */
>
> -.macro undo_xen_syscall
> - mov 0*8(%rsp), %rcx
> - mov 1*8(%rsp), %r11
> - mov 5*8(%rsp), %rsp
> -.endm
> -
> /* Normal 64-bit system call target */
> ENTRY(xen_syscall_target)
> - undo_xen_syscall
> - jmp entry_SYSCALL_64_after_swapgs
> + popq %rcx
> + popq %r11
> + jmp entry_SYSCALL_64_after_hwframe
> ENDPROC(xen_syscall_target)
>
> #ifdef CONFIG_IA32_EMULATION
>
> /* 32-bit compat syscall target */
> ENTRY(xen_syscall32_target)
> - undo_xen_syscall
> - jmp entry_SYSCALL_compat
> + popq %rcx
> + popq %r11
> + jmp entry_SYSCALL_compat_after_hwframe
> ENDPROC(xen_syscall32_target)
>
> /* 32-bit compat sysenter target */
> ENTRY(xen_sysenter_target)
> - undo_xen_syscall
> + mov 0*8(%rsp), %rcx
> + mov 1*8(%rsp), %r11
> + mov 5*8(%rsp), %rsp
> jmp entry_SYSENTER_compat
> ENDPROC(xen_sysenter_target)

This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
64-bit PV kernel. The 64-bit versions pass. It gets a seg fault after
"parent: write to 0x80 (should fail)", and the fault isn't caught by
the signal handler. It just dumps back to the shell. The tests pass
after reverting this.

--
Brian Gerst

2017-08-14 05:53:59

by Andy Lutomirski

Subject: Re: [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries

On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <[email protected]> wrote:
> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <[email protected]> wrote:
>> /* Normal 64-bit system call target */
>> ENTRY(xen_syscall_target)
>> - undo_xen_syscall
>> - jmp entry_SYSCALL_64_after_swapgs
>> + popq %rcx
>> + popq %r11
>> + jmp entry_SYSCALL_64_after_hwframe
>> ENDPROC(xen_syscall_target)
>>
>> #ifdef CONFIG_IA32_EMULATION
>>
>> /* 32-bit compat syscall target */
>> ENTRY(xen_syscall32_target)
>> - undo_xen_syscall
>> - jmp entry_SYSCALL_compat
>> + popq %rcx
>> + popq %r11
>> + jmp entry_SYSCALL_compat_after_hwframe
>> ENDPROC(xen_syscall32_target)
>>
>> /* 32-bit compat sysenter target */
>> ENTRY(xen_sysenter_target)
>> - undo_xen_syscall
>> + mov 0*8(%rsp), %rcx
>> + mov 1*8(%rsp), %r11
>> + mov 5*8(%rsp), %rsp
>> jmp entry_SYSENTER_compat
>> ENDPROC(xen_sysenter_target)
>
> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
> 64-bit PV kernel. The 64-bit versions pass. It gets a seg fault after
> "parent: write to 0x80 (should fail)", and the fault isn't caught by
> the signal handler. It just dumps back to the shell. The tests pass
> after reverting this.

I can reproduce it if I emulate an AMD machine. I can "fix" it like this:

diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index a8a4f4c460a6..6255e00f425e 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target)
ENTRY(xen_syscall32_target)
popq %rcx
popq %r11
+ movq $__USER32_DS, 4*8(%rsp)
+ movq $__USER32_CS, 1*8(%rsp)
+ movq %r11, 2*8(%rsp)
jmp entry_SYSCALL_compat_after_hwframe
ENDPROC(xen_syscall32_target)

but I haven't tried to diagnose precisely what's going on.

Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
to be a problem, but it kills opportunistic sysretl. Maybe that's
triggering a preexisting bug?
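
To illustrate why nonstandard selectors defeat opportunistic sysretl, here
is a rough sketch of the shape of the eligibility check on the compat
return path (a paraphrase, not a copy of the kernel code; the function and
struct names are invented, and the real code checks more of the saved
state as well): SYSRETL is only attempted when the saved cs/ss are the
standard flat user selectors, so Xen's 0xe0xx values force the slower IRET
return instead.

#include <stdbool.h>

#define X86_EFLAGS_TF	0x00000100UL	/* trap flag */
#define X86_EFLAGS_RF	0x00010000UL	/* resume flag */

struct saved_regs {			/* only the fields this sketch consults */
	unsigned long cs, ss, flags;
};

/* Sketch: may we return with SYSRETL rather than IRET? */
static bool sysretl_allowed(const struct saved_regs *r,
			    unsigned long user32_cs, unsigned long user_ds)
{
	return r->cs == user32_cs && r->ss == user_ds &&
	       (r->flags & (X86_EFLAGS_RF | X86_EFLAGS_TF)) == 0;
}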

2017-08-14 06:42:28

by Andy Lutomirski

Subject: Re: [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries

On Sun, Aug 13, 2017 at 10:53 PM, Andy Lutomirski <[email protected]> wrote:
> On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <[email protected]> wrote:
>> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <[email protected]> wrote:
>>> /* Normal 64-bit system call target */
>>> ENTRY(xen_syscall_target)
>>> - undo_xen_syscall
>>> - jmp entry_SYSCALL_64_after_swapgs
>>> + popq %rcx
>>> + popq %r11
>>> + jmp entry_SYSCALL_64_after_hwframe
>>> ENDPROC(xen_syscall_target)
>>>
>>> #ifdef CONFIG_IA32_EMULATION
>>>
>>> /* 32-bit compat syscall target */
>>> ENTRY(xen_syscall32_target)
>>> - undo_xen_syscall
>>> - jmp entry_SYSCALL_compat
>>> + popq %rcx
>>> + popq %r11
>>> + jmp entry_SYSCALL_compat_after_hwframe
>>> ENDPROC(xen_syscall32_target)
>>>
>>> /* 32-bit compat sysenter target */
>>> ENTRY(xen_sysenter_target)
>>> - undo_xen_syscall
>>> + mov 0*8(%rsp), %rcx
>>> + mov 1*8(%rsp), %r11
>>> + mov 5*8(%rsp), %rsp
>>> jmp entry_SYSENTER_compat
>>> ENDPROC(xen_sysenter_target)
>>
>> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
>> 64-bit PV kernel. The 64-bit versions pass. It gets a seg fault after
>> "parent: write to 0x80 (should fail)", and the fault isn't caught by
>> the signal handler. It just dumps back to the shell. The tests pass
>> after reverting this.
>
> I can reproduce it if I emulate an AMD machine. I can "fix" it like this:
>
> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index a8a4f4c460a6..6255e00f425e 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target)
> ENTRY(xen_syscall32_target)
> popq %rcx
> popq %r11
> + movq $__USER32_DS, 4*8(%rsp)
> + movq $__USER32_CS, 1*8(%rsp)
> + movq %r11, 2*8(%rsp)
> jmp entry_SYSCALL_compat_after_hwframe
> ENDPROC(xen_syscall32_target)
>
> but I haven't tried to diagnose precisely what's going on.
>
> Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
> to be a problem, but it kills opportunistic sysretl. Maybe that's
> triggering a preexisting bug?

It is indeed triggering an existing bug, but that bug is not a kernel
bug :) It's this thing:

https://sourceware.org/bugzilla/show_bug.cgi?id=21269

See, we have this old legacy garbage in which, when running with
nonstandard SS, a certain special, otherwise nonsensical input to
sigaction() causes a stack switch. Xen PV runs user code with a
nonstandard SS, and glibc accidentally passes this weird parameter to
sigaction() on a regular basis. With this patch applied, the kernel
suddenly starts to *realize* that ss is weird, and boom. (Or maybe it
increases the chance that SS is actually weird, since I'd expect this
to trip on #GP, not SYSCALL. But I don't care quite enough to dig
further.)

Patch coming.

2017-08-14 06:45:30

by Andrew Cooper

Subject: Re: [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries

On 14/08/2017 06:53, Andy Lutomirski wrote:
> On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <[email protected]> wrote:
>> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <[email protected]> wrote:
>>> /* Normal 64-bit system call target */
>>> ENTRY(xen_syscall_target)
>>> - undo_xen_syscall
>>> - jmp entry_SYSCALL_64_after_swapgs
>>> + popq %rcx
>>> + popq %r11
>>> + jmp entry_SYSCALL_64_after_hwframe
>>> ENDPROC(xen_syscall_target)
>>>
>>> #ifdef CONFIG_IA32_EMULATION
>>>
>>> /* 32-bit compat syscall target */
>>> ENTRY(xen_syscall32_target)
>>> - undo_xen_syscall
>>> - jmp entry_SYSCALL_compat
>>> + popq %rcx
>>> + popq %r11
>>> + jmp entry_SYSCALL_compat_after_hwframe
>>> ENDPROC(xen_syscall32_target)
>>>
>>> /* 32-bit compat sysenter target */
>>> ENTRY(xen_sysenter_target)
>>> - undo_xen_syscall
>>> + mov 0*8(%rsp), %rcx
>>> + mov 1*8(%rsp), %r11
>>> + mov 5*8(%rsp), %rsp
>>> jmp entry_SYSENTER_compat
>>> ENDPROC(xen_sysenter_target)
>> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
>> 64-bit PV kernel. The 64-bit versions pass. It gets a seg fault after
>> "parent: write to 0x80 (should fail)", and the fault isn't caught by
>> the signal handler. It just dumps back to the shell. The tests pass
>> after reverting this.
> I can reproduce it if I emulate an AMD machine. I can "fix" it like this:
>
> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index a8a4f4c460a6..6255e00f425e 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target)
> ENTRY(xen_syscall32_target)
> popq %rcx
> popq %r11
> + movq $__USER32_DS, 4*8(%rsp)
> + movq $__USER32_CS, 1*8(%rsp)
> + movq %r11, 2*8(%rsp)
> jmp entry_SYSCALL_compat_after_hwframe
> ENDPROC(xen_syscall32_target)
>
> but I haven't tried to diagnose precisely what's going on.
>
> Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
> to be a problem, but it kills opportunistic sysretl. Maybe that's
> triggering a preexisting bug?

The reason %rcx/%r11 are extra on the stack is that Xen uses sysret
to get here. This is part of the 64-bit PV ABI.

%cs will be 0xe033 (FLAT_CS64) and %ss will be 0xe02b (FLAT_SS64).

I would expect it to kill opportunistic sysret, as Xen's and Linux's
ideas of when to use sysret differ.

~Andrew
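
As a quick aside on those values (standard x86 segment-selector layout,
shown with a throwaway helper that is not from this thread): the low two
bits are the RPL, bit 2 selects GDT vs. LDT, and the rest is the
descriptor index, so 0xe033 and 0xe02b are both ring-3 GDT selectors,
just not the ones the kernel's __USER32_CS/__USER32_DS constants name.

#include <stdio.h>

/* Decode an x86 segment selector into index, table and privilege level. */
static void decode_selector(unsigned int sel)
{
	printf("0x%04x: index=%u table=%s rpl=%u\n",
	       sel, sel >> 3, (sel & 0x4) ? "LDT" : "GDT", sel & 0x3);
}

int main(void)
{
	decode_selector(0xe033);	/* Xen FLAT_CS64: GDT index 7174, RPL 3 */
	decode_selector(0xe02b);	/* Xen FLAT_SS64: GDT index 7173, RPL 3 */
	return 0;
}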

2017-08-14 12:46:02

by Brian Gerst

Subject: Re: [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries

On Mon, Aug 14, 2017 at 1:53 AM, Andy Lutomirski <[email protected]> wrote:
> On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <[email protected]> wrote:
>> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <[email protected]> wrote:
>>> /* Normal 64-bit system call target */
>>> ENTRY(xen_syscall_target)
>>> - undo_xen_syscall
>>> - jmp entry_SYSCALL_64_after_swapgs
>>> + popq %rcx
>>> + popq %r11
>>> + jmp entry_SYSCALL_64_after_hwframe
>>> ENDPROC(xen_syscall_target)
>>>
>>> #ifdef CONFIG_IA32_EMULATION
>>>
>>> /* 32-bit compat syscall target */
>>> ENTRY(xen_syscall32_target)
>>> - undo_xen_syscall
>>> - jmp entry_SYSCALL_compat
>>> + popq %rcx
>>> + popq %r11
>>> + jmp entry_SYSCALL_compat_after_hwframe
>>> ENDPROC(xen_syscall32_target)
>>>
>>> /* 32-bit compat sysenter target */
>>> ENTRY(xen_sysenter_target)
>>> - undo_xen_syscall
>>> + mov 0*8(%rsp), %rcx
>>> + mov 1*8(%rsp), %r11
>>> + mov 5*8(%rsp), %rsp
>>> jmp entry_SYSENTER_compat
>>> ENDPROC(xen_sysenter_target)
>>
>> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
>> 64-bit PV kernel. The 64-bit versions pass. It gets a seg fault after
>> "parent: write to 0x80 (should fail)", and the fault isn't caught by
>> the signal handler. It just dumps back to the shell. The tests pass
>> after reverting this.
>
> I can reproduce it if I emulate an AMD machine. I can "fix" it like this:

Yes, this is an AMD processor.

> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index a8a4f4c460a6..6255e00f425e 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target)
> ENTRY(xen_syscall32_target)
> popq %rcx
> popq %r11
> + movq $__USER32_DS, 4*8(%rsp)
> + movq $__USER32_CS, 1*8(%rsp)
> + movq %r11, 2*8(%rsp)
> jmp entry_SYSCALL_compat_after_hwframe
> ENDPROC(xen_syscall32_target)
>
> but I haven't tried to diagnose precisely what's going on.
>
> Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
> to be a problem, but it kills opportunistic sysretl. Maybe that's
> triggering a preexisting bug?

Resetting the CS/SS values worked. Looking at the Xen hypervisor
code, EFLAGS on the stack should already be set to the value in R11,
so that part doesn't appear necessary.

Shouldn't this also be done for the 64-bit SYSCALL entry, for
consistency with previous code?

--
Brian Gerst