2015-07-24 13:47:58

by Denys Vlasenko

[permalink] [raw]
Subject: [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32

This change swaps a few instructions in final register restoring/zeroing
section of SYSENTER fast path, and adds/deletes a few empty lines.

After this, the difference between SYSENTER and SYCALL fast paths
(after the prologue which saved pt_regs) is very small:
they differ merely in the choice of register to hold arg6 (EBP or R9)
and in the value of EDX on exit: SYSENTER ABI doesn't need to preserve it,
so it is zeroed. SYSCALL preserves it:

|(prologue is different)
| orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
| testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
|- jnz sysenter_tracesys
|+ jnz cstar_tracesys
|
|-sysenter_do_call:
|+cstar_do_call:
| /* 32-bit syscall -> 64-bit C ABI argument conversion */
| movl %edi, %r8d /* arg5 */
|- movl %ebp, %r9d /* arg6 */
|+ /* r9 already loaded */ /* arg6 */
| xchg %ecx, %esi /* rsi:arg2, rcx:arg4 */
| movl %ebx, %edi /* arg1 */
| movl %edx, %edx /* arg3 (zero extension) */
|
|-sysenter_dispatch:
|+cstar_dispatch:
| cmpq $(IA32_NR_syscalls-1), %rax
| ja 1f
| call *ia32_sys_call_table(, %rax, 8)
|@@ -19,15 +19,15 @@
| DISABLE_INTERRUPTS(CLBR_NONE)
| TRACE_IRQS_OFF
| testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
|- jnz sysexit_audit
|+ jnz sysretl_audit
|
|-sysexit_from_sys_call:
|+sysretl_from_sys_call:
| andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
|+ movl RDX(%rsp), %edx
| movl RSI(%rsp), %esi
| movl RDI(%rsp), %edi
| movl RIP(%rsp), %ecx
| movl EFLAGS(%rsp), %r11d
|- xorl %edx, %edx
| xorq %r10, %r10
| xorq %r9, %r9
| xorq %r8, %r8
|(the rest of fast path, up to final SYSRET32, is identical)

This is a preparatory change which allows to drop most of SYSENTER machinery
and make SYSENTER reuse SYSCALL code.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Krzysztof A. Sobiecki <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/entry/entry_64_compat.S | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 8997383..9f9dfa5 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -117,6 +117,7 @@ sysenter_do_call:
xchg %ecx, %esi /* rsi:arg2, rcx:arg4 */
movl %ebx, %edi /* arg1 */
movl %edx, %edx /* arg3 (zero extension) */
+
sysenter_dispatch:
cmpq $(IA32_NR_syscalls-1), %rax
ja 1f
@@ -127,6 +128,7 @@ sysenter_dispatch:
TRACE_IRQS_OFF
testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
jnz sysexit_audit
+
sysexit_from_sys_call:
/*
* NB: SYSEXIT is not obviously safe for 64-bit kernels -- an
@@ -139,14 +141,14 @@ sysexit_from_sys_call:
* with 'sysenter' and it uses the SYSENTER calling convention.
*/
andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
- movl RIP(%rsp), %ecx /* User %eip */
movl RSI(%rsp), %esi
movl RDI(%rsp), %edi
+ movl RIP(%rsp), %ecx /* User %eip */
+ movl EFLAGS(%rsp), %r11d /* User eflags */
xorl %edx, %edx /* Do not leak kernel information */
- xorq %r8, %r8
- xorq %r9, %r9
xorq %r10, %r10
- movl EFLAGS(%rsp), %r11d /* User eflags */
+ xorq %r9, %r9
+ xorq %r8, %r8
TRACE_IRQS_ON

/*
@@ -340,6 +342,7 @@ ENTRY(entry_SYSCALL_compat)
1: movl (%r8), %r9d
_ASM_EXTABLE(1b, ia32_badarg)
ASM_CLAC
+
orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
jnz cstar_tracesys
@@ -355,7 +358,6 @@ cstar_do_call:
cstar_dispatch:
cmpq $(IA32_NR_syscalls-1), %rax
ja 1f
-
call *ia32_sys_call_table(, %rax, 8)
movq %rax, RAX(%rsp)
1:
--
1.8.1.4


2015-07-24 13:48:14

by Denys Vlasenko

[permalink] [raw]
Subject: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1

SYSCALL32 code is nearly identical to SYSCALL32, except for initial
section. Merge them.

The removal is split into two parts, to make review eaiser. This is part 1.

auditsys_entry_common and auditsys_exit macros are indented one more tab without
any changes. This prevents diff from becoming unreadable.
They will be removed in part 2.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Krzysztof A. Sobiecki <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/entry/entry_64_compat.S | 237 +++++++++++----------------------------
1 file changed, 63 insertions(+), 174 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 9f9dfa5..2d0a2f0 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -16,13 +16,7 @@
#include <linux/linkage.h>
#include <linux/err.h>

-/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
-#include <linux/elf-em.h>
-#define AUDIT_ARCH_I386 (EM_386|__AUDIT_ARCH_LE)
-#define __AUDIT_ARCH_LE 0x40000000
-
#ifndef CONFIG_AUDITSYSCALL
-# define sysexit_audit ia32_ret_from_sys_call_irqs_off
# define sysretl_audit ia32_ret_from_sys_call_irqs_off
#endif

@@ -93,188 +87,82 @@ ENTRY(entry_SYSENTER_compat)
* 32-bit zero extended
*/
ASM_STAC
-1: movl (%rbp), %ebp
+1: movl (%rbp), %r9d
_ASM_EXTABLE(1b, ia32_badarg)
ASM_CLAC

/*
- * Sysenter doesn't filter flags, so we need to clear NT
- * ourselves. To save a few cycles, we can check whether
- * NT was set instead of doing an unconditional popfq.
+ * Sysenter doesn't filter flags, so we need to clear NT ourselves.
*/
testl $X86_EFLAGS_NT, EFLAGS(%rsp)
jnz sysenter_fix_flags
sysenter_flags_fixed:
-
- orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
- testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
- jnz sysenter_tracesys
-
-sysenter_do_call:
- /* 32-bit syscall -> 64-bit C ABI argument conversion */
- movl %edi, %r8d /* arg5 */
- movl %ebp, %r9d /* arg6 */
- xchg %ecx, %esi /* rsi:arg2, rcx:arg4 */
- movl %ebx, %edi /* arg1 */
- movl %edx, %edx /* arg3 (zero extension) */
-
-sysenter_dispatch:
- cmpq $(IA32_NR_syscalls-1), %rax
- ja 1f
- call *ia32_sys_call_table(, %rax, 8)
- movq %rax, RAX(%rsp)
-1:
- DISABLE_INTERRUPTS(CLBR_NONE)
- TRACE_IRQS_OFF
- testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
- jnz sysexit_audit
-
-sysexit_from_sys_call:
- /*
- * NB: SYSEXIT is not obviously safe for 64-bit kernels -- an
- * NMI between STI and SYSEXIT has poorly specified behavior,
- * and and NMI followed by an IRQ with usergs is fatal. So
- * we just pretend we're using SYSEXIT but we really use
- * SYSRETL instead.
- *
- * This code path is still called 'sysexit' because it pairs
- * with 'sysenter' and it uses the SYSENTER calling convention.
- */
- andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
- movl RSI(%rsp), %esi
- movl RDI(%rsp), %edi
- movl RIP(%rsp), %ecx /* User %eip */
- movl EFLAGS(%rsp), %r11d /* User eflags */
- xorl %edx, %edx /* Do not leak kernel information */
- xorq %r10, %r10
- xorq %r9, %r9
- xorq %r8, %r8
- TRACE_IRQS_ON
-
- /*
- * SYSRETL works even on Intel CPUs. Use it in preference to SYSEXIT,
- * since it avoids a dicey window with interrupts enabled.
- */
- movl RSP(%rsp), %esp
-
- /*
- * USERGS_SYSRET32 does:
- * gsbase = user's gs base
- * eip = ecx
- * rflags = r11
- * cs = __USER32_CS
- * ss = __USER_DS
- *
- * The prologue set RIP(%rsp) to VDSO32_SYSENTER_RETURN, which does:
- *
- * pop %ebp
- * pop %edx
- * pop %ecx
- *
- * Therefore, we invoke SYSRETL with EDX and R8-R10 zeroed to
- * avoid info leaks. R11 ends up with VDSO32_SYSENTER_RETURN's
- * address (already known to user code), and R12-R15 are
- * callee-saved and therefore don't contain any interesting
- * kernel data.
- */
- USERGS_SYSRET32
-
-#ifdef CONFIG_AUDITSYSCALL
- .macro auditsys_entry_common
- /*
- * At this point, registers hold syscall args in the 32-bit syscall ABI:
- * EAX is syscall number, the 6 args are in EBX,ECX,EDX,ESI,EDI,EBP.
- *
- * We want to pass them to __audit_syscall_entry(), which is a 64-bit
- * C function with 5 parameters, so shuffle them to match what
- * the function expects: RDI,RSI,RDX,RCX,R8.
- */
- movl %esi, %r8d /* arg5 (R8 ) <= 4th syscall arg (ESI) */
- xchg %ecx, %edx /* arg4 (RCX) <= 3rd syscall arg (EDX) */
- /* arg3 (RDX) <= 2nd syscall arg (ECX) */
- movl %ebx, %esi /* arg2 (RSI) <= 1st syscall arg (EBX) */
- movl %eax, %edi /* arg1 (RDI) <= syscall number (EAX) */
- call __audit_syscall_entry
-
- /*
- * We are going to jump back to the syscall dispatch code.
- * Prepare syscall args as required by the 64-bit C ABI.
- * Registers clobbered by __audit_syscall_entry() are
- * loaded from pt_regs on stack:
- */
- movl ORIG_RAX(%rsp), %eax /* syscall number */
- movl %ebx, %edi /* arg1 */
- movl RCX(%rsp), %esi /* arg2 */
- movl RDX(%rsp), %edx /* arg3 */
- movl RSI(%rsp), %ecx /* arg4 */
- movl RDI(%rsp), %r8d /* arg5 */
- .endm
-
- .macro auditsys_exit exit
- TRACE_IRQS_ON
- ENABLE_INTERRUPTS(CLBR_NONE)
- testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
- jnz ia32_ret_from_sys_call
- movl %eax, %esi /* second arg, syscall return value */
- cmpl $-MAX_ERRNO, %eax /* is it an error ? */
- jbe 1f
- movslq %eax, %rsi /* if error sign extend to 64 bits */
-1: setbe %al /* 1 if error, 0 if not */
- movzbl %al, %edi /* zero-extend that into %edi */
- call __audit_syscall_exit
- movq RAX(%rsp), %rax /* reload syscall return value */
- movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi
- DISABLE_INTERRUPTS(CLBR_NONE)
- TRACE_IRQS_OFF
- testl %edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
- jz \exit
- xorl %eax, %eax /* Do not leak kernel information */
- movq %rax, R11(%rsp)
- movq %rax, R10(%rsp)
- movq %rax, R9(%rsp)
- movq %rax, R8(%rsp)
- jmp int_ret_from_sys_call_irqs_off
- .endm
-
-sysenter_auditsys:
- auditsys_entry_common
- movl %ebp, %r9d /* reload 6th syscall arg */
- jmp sysenter_dispatch
-
-sysexit_audit:
- auditsys_exit sysexit_from_sys_call
-#endif
+ jmp sysenter_jumps_here

sysenter_fix_flags:
pushq $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
popfq
jmp sysenter_flags_fixed
-
-sysenter_tracesys:
-#ifdef CONFIG_AUDITSYSCALL
- testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
- jz sysenter_auditsys
-#endif
- SAVE_EXTRA_REGS
- xorl %eax, %eax /* Do not leak kernel information */
- movq %rax, R11(%rsp)
- movq %rax, R10(%rsp)
- movq %rax, R9(%rsp)
- movq %rax, R8(%rsp)
- movq %rsp, %rdi /* &pt_regs -> arg1 */
- call syscall_trace_enter
-
- /* Reload arg registers from stack. (see sysenter_tracesys) */
- movl RCX(%rsp), %ecx
- movl RDX(%rsp), %edx
- movl RSI(%rsp), %esi
- movl RDI(%rsp), %edi
- movl %eax, %eax /* zero extension */
-
- RESTORE_EXTRA_REGS
- jmp sysenter_do_call
ENDPROC(entry_SYSENTER_compat)

+
+ #ifdef CONFIG_AUDITSYSCALL
+ .macro auditsys_entry_common
+ /*
+ * At this point, registers hold syscall args in the 32-bit syscall ABI:
+ * EAX is syscall number, the 6 args are in EBX,ECX,EDX,ESI,EDI,EBP.
+ *
+ * We want to pass them to __audit_syscall_entry(), which is a 64-bit
+ * C function with 5 parameters, so shuffle them to match what
+ * the function expects: RDI,RSI,RDX,RCX,R8.
+ */
+ movl %esi, %r8d /* arg5 (R8 ) <= 4th syscall arg (ESI) */
+ xchg %ecx, %edx /* arg4 (RCX) <= 3rd syscall arg (EDX) */
+ /* arg3 (RDX) <= 2nd syscall arg (ECX) */
+ movl %ebx, %esi /* arg2 (RSI) <= 1st syscall arg (EBX) */
+ movl %eax, %edi /* arg1 (RDI) <= syscall number (EAX) */
+ call __audit_syscall_entry
+
+ /*
+ * We are going to jump back to the syscall dispatch code.
+ * Prepare syscall args as required by the 64-bit C ABI.
+ * Registers clobbered by __audit_syscall_entry() are
+ * loaded from pt_regs on stack:
+ */
+ movl ORIG_RAX(%rsp), %eax /* syscall number */
+ movl %ebx, %edi /* arg1 */
+ movl RCX(%rsp), %esi /* arg2 */
+ movl RDX(%rsp), %edx /* arg3 */
+ movl RSI(%rsp), %ecx /* arg4 */
+ movl RDI(%rsp), %r8d /* arg5 */
+ .endm
+
+ .macro auditsys_exit exit
+ TRACE_IRQS_ON
+ ENABLE_INTERRUPTS(CLBR_NONE)
+ testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+ jnz ia32_ret_from_sys_call
+ movl %eax, %esi /* second arg, syscall return value */
+ cmpl $-MAX_ERRNO, %eax /* is it an error ? */
+ jbe 1f
+ movslq %eax, %rsi /* if error sign extend to 64 bits */
+ 1: setbe %al /* 1 if error, 0 if not */
+ movzbl %al, %edi /* zero-extend that into %edi */
+ call __audit_syscall_exit
+ movq RAX(%rsp), %rax /* reload syscall return value */
+ movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi
+ DISABLE_INTERRUPTS(CLBR_NONE)
+ TRACE_IRQS_OFF
+ testl %edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+ jz \exit
+ xorl %eax, %eax /* Do not leak kernel information */
+ movq %rax, R11(%rsp)
+ movq %rax, R10(%rsp)
+ movq %rax, R9(%rsp)
+ movq %rax, R8(%rsp)
+ jmp int_ret_from_sys_call_irqs_off
+ .endm
+ #endif
/*
* 32-bit SYSCALL instruction entry.
*
@@ -343,6 +231,7 @@ ENTRY(entry_SYSCALL_compat)
_ASM_EXTABLE(1b, ia32_badarg)
ASM_CLAC

+sysenter_jumps_here:
orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
jnz cstar_tracesys
@@ -421,7 +310,7 @@ cstar_tracesys:
call syscall_trace_enter
movl R9(%rsp), %r9d

- /* Reload arg registers from stack. (see sysenter_tracesys) */
+ /* Reload arg registers from stack. */
movl RCX(%rsp), %ecx
movl RDX(%rsp), %edx
movl RSI(%rsp), %esi
--
1.8.1.4

2015-07-24 13:48:25

by Denys Vlasenko

[permalink] [raw]
Subject: [PATCH 3/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 2

SYSCALL32 code is nearly identical to SYSCALL32, except for initial
section. Merge them.

This change is split into two parts, to make review eaiser.
This is part 2, which tidies up loose ends:

"sysenter_fix_flags" detour does not need to be convoluted anymore,
straigten it up.

auditsys_entry_common and auditsys_exit macros have only one caller now.
Drop masros, move their bodies to invocation locations.

Reinstate "why we use SYSRETL instead of SYSEXIT" comment.

Run-tested under QEMU: calls through SYSENTER VDSO still work:

/ # ./test_syscall_vdso32
[RUN] Executing 6-argument 32-bit syscall via VDSO
[OK] Arguments are preserved across syscall
[NOTE] R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK] R8..R15 did not leak kernel data
[RUN] Executing 6-argument 32-bit syscall via INT 80
[OK] Arguments are preserved across syscall
[OK] R8..R15 did not leak kernel data
[RUN] Running tests under ptrace
[RUN] Executing 6-argument 32-bit syscall via VDSO
[OK] Arguments are preserved across syscall
[OK] R8..R15 did not leak kernel data
[RUN] Executing 6-argument 32-bit syscall via INT 80
[OK] Arguments are preserved across syscall
[OK] R8..R15 did not leak kernel data

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Krzysztof A. Sobiecki <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/entry/entry_64_compat.S | 140 +++++++++++++++++++--------------------
1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 2d0a2f0..6ee70fd 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -95,74 +95,25 @@ ENTRY(entry_SYSENTER_compat)
* Sysenter doesn't filter flags, so we need to clear NT ourselves.
*/
testl $X86_EFLAGS_NT, EFLAGS(%rsp)
- jnz sysenter_fix_flags
-sysenter_flags_fixed:
- jmp sysenter_jumps_here
-
-sysenter_fix_flags:
+ jz sysenter_jumps_here
pushq $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
popfq
- jmp sysenter_flags_fixed
-ENDPROC(entry_SYSENTER_compat)
-
-
- #ifdef CONFIG_AUDITSYSCALL
- .macro auditsys_entry_common
- /*
- * At this point, registers hold syscall args in the 32-bit syscall ABI:
- * EAX is syscall number, the 6 args are in EBX,ECX,EDX,ESI,EDI,EBP.
- *
- * We want to pass them to __audit_syscall_entry(), which is a 64-bit
- * C function with 5 parameters, so shuffle them to match what
- * the function expects: RDI,RSI,RDX,RCX,R8.
- */
- movl %esi, %r8d /* arg5 (R8 ) <= 4th syscall arg (ESI) */
- xchg %ecx, %edx /* arg4 (RCX) <= 3rd syscall arg (EDX) */
- /* arg3 (RDX) <= 2nd syscall arg (ECX) */
- movl %ebx, %esi /* arg2 (RSI) <= 1st syscall arg (EBX) */
- movl %eax, %edi /* arg1 (RDI) <= syscall number (EAX) */
- call __audit_syscall_entry
-
- /*
- * We are going to jump back to the syscall dispatch code.
- * Prepare syscall args as required by the 64-bit C ABI.
- * Registers clobbered by __audit_syscall_entry() are
- * loaded from pt_regs on stack:
- */
- movl ORIG_RAX(%rsp), %eax /* syscall number */
- movl %ebx, %edi /* arg1 */
- movl RCX(%rsp), %esi /* arg2 */
- movl RDX(%rsp), %edx /* arg3 */
- movl RSI(%rsp), %ecx /* arg4 */
- movl RDI(%rsp), %r8d /* arg5 */
- .endm
-
- .macro auditsys_exit exit
- TRACE_IRQS_ON
- ENABLE_INTERRUPTS(CLBR_NONE)
- testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
- jnz ia32_ret_from_sys_call
- movl %eax, %esi /* second arg, syscall return value */
- cmpl $-MAX_ERRNO, %eax /* is it an error ? */
- jbe 1f
- movslq %eax, %rsi /* if error sign extend to 64 bits */
- 1: setbe %al /* 1 if error, 0 if not */
- movzbl %al, %edi /* zero-extend that into %edi */
- call __audit_syscall_exit
- movq RAX(%rsp), %rax /* reload syscall return value */
- movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi
- DISABLE_INTERRUPTS(CLBR_NONE)
- TRACE_IRQS_OFF
- testl %edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
- jz \exit
- xorl %eax, %eax /* Do not leak kernel information */
- movq %rax, R11(%rsp)
- movq %rax, R10(%rsp)
- movq %rax, R9(%rsp)
- movq %rax, R8(%rsp)
- jmp int_ret_from_sys_call_irqs_off
- .endm
- #endif
+ jmp sysenter_jumps_here
+ /*
+ * SYSEXIT insn is not obviously safe for 64-bit kernels --
+ * an NMI between STI and SYSEXIT has poorly specified behavior,
+ * and NMI followed by an IRQ with usergs is fatal.
+ * So we just pretend we're using SYSEXIT but we really use
+ * SYSRETL instead. (Yes, SYSRETL works even on Intel CPUs.)
+ * We do that by reusing the entire SYSCALL code path:
+ * the jump above takes us there.
+ *
+ * The difference of SYSENTER 32-bit ABI versus SYSCALL
+ * is that SYSENTER ABI does not promise to preserve EDX and EBP,
+ * SYSCALL does.
+ */
+ENDPROC(entry_SYSENTER_compat)
+
/*
* 32-bit SYSCALL instruction entry.
*
@@ -285,13 +236,62 @@ sysretl_from_sys_call:

#ifdef CONFIG_AUDITSYSCALL
cstar_auditsys:
- movl %r9d, R9(%rsp) /* register to be clobbered by call */
- auditsys_entry_common
- movl R9(%rsp), %r9d /* reload 6th syscall arg */
+ movl %r9d, R9(%rsp) /* R9 is callee-clobbered, save it */
+ /*
+ * At this point, registers hold syscall args in the 32-bit syscall ABI:
+ * EAX is syscall number, the 6 args are in EBX,ECX,EDX,ESI,EDI,
+ * 6th arg is in R9.
+ *
+ * We want to pass them to __audit_syscall_entry(), which is a 64-bit
+ * C function with 5 parameters, so shuffle them to match what
+ * the function expects: RDI,RSI,RDX,RCX,R8.
+ */
+ movl %esi, %r8d /* arg5 (R8 ) <= 4th syscall arg (ESI) */
+ xchg %ecx, %edx /* arg4 (RCX) <= 3rd syscall arg (EDX) */
+ /* arg3 (RDX) <= 2nd syscall arg (ECX) */
+ movl %ebx, %esi /* arg2 (RSI) <= 1st syscall arg (EBX) */
+ movl %eax, %edi /* arg1 (RDI) <= syscall number (EAX) */
+ call __audit_syscall_entry
+
+ /*
+ * We are going to jump back to the syscall dispatch code.
+ * Prepare syscall args as required by the 64-bit C ABI.
+ * Registers clobbered by __audit_syscall_entry() are
+ * loaded from pt_regs on stack:
+ */
+ movl ORIG_RAX(%rsp), %eax /* syscall number */
+ movl %ebx, %edi /* arg1 */
+ movl RCX(%rsp), %esi /* arg2 */
+ movl RDX(%rsp), %edx /* arg3 */
+ movl RSI(%rsp), %ecx /* arg4 */
+ movl RDI(%rsp), %r8d /* arg5 */
+ movl R9(%rsp), %r9d /* arg6 */
jmp cstar_dispatch

sysretl_audit:
- auditsys_exit sysretl_from_sys_call
+ TRACE_IRQS_ON
+ ENABLE_INTERRUPTS(CLBR_NONE)
+ testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+ jnz ia32_ret_from_sys_call
+ movl %eax, %esi /* second arg, syscall return value */
+ cmpl $-MAX_ERRNO, %eax /* is it an error ? */
+ jbe 1f
+ movslq %eax, %rsi /* if error sign extend to 64 bits */
+1: setbe %al /* 1 if error, 0 if not */
+ movzbl %al, %edi /* zero-extend that into %edi */
+ call __audit_syscall_exit
+ movq RAX(%rsp), %rax /* reload syscall return value */
+ movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi
+ DISABLE_INTERRUPTS(CLBR_NONE)
+ TRACE_IRQS_OFF
+ testl %edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+ jz sysretl_from_sys_call
+ xorl %eax, %eax /* Do not leak kernel information */
+ movq %rax, R11(%rsp)
+ movq %rax, R10(%rsp)
+ movq %rax, R9(%rsp)
+ movq %rax, R8(%rsp)
+ jmp int_ret_from_sys_call_irqs_off
#endif

cstar_tracesys:
--
1.8.1.4

2015-07-24 17:38:12

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32

On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <[email protected]> wrote:
> This change swaps a few instructions in final register restoring/zeroing
> section of SYSENTER fast path, and adds/deletes a few empty lines.
>
> After this, the difference between SYSENTER and SYCALL fast paths
> (after the prologue which saved pt_regs) is very small:
> they differ merely in the choice of register to hold arg6 (EBP or R9)
> and in the value of EDX on exit: SYSENTER ABI doesn't need to preserve it,
> so it is zeroed. SYSCALL preserves it:

Reviewed-by: Andy Lutomirski <[email protected]>

2015-07-24 17:50:54

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1

On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <[email protected]> wrote:
> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
> section. Merge them.
>
> The removal is split into two parts, to make review eaiser. This is part 1.
>
> auditsys_entry_common and auditsys_exit macros are indented one more tab without
> any changes. This prevents diff from becoming unreadable.
> They will be removed in part 2.

I need to read these more closely, which is, at present, exceeding my
ability to look at asm. (See the big NMI thread.) I'll look soon.

Meanwhile, this code is incredibly fragile wrt syscall restart.
(Syscall restart on compat is really weird.) Do we have a decent test
for it?

--Andy

2015-07-25 18:36:12

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1

On 07/24/2015 07:50 PM, Andy Lutomirski wrote:
> On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <[email protected]> wrote:
>> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
>> section. Merge them.
>>
>> The removal is split into two parts, to make review eaiser. This is part 1.
>>
>> auditsys_entry_common and auditsys_exit macros are indented one more tab without
>> any changes. This prevents diff from becoming unreadable.
>> They will be removed in part 2.
>
> I need to read these more closely, which is, at present, exceeding my
> ability to look at asm. (See the big NMI thread.) I'll look soon.

The "sysenter_fix_flags" thingy prevented the diff from being
a pure delete, so it is not as clear as I hoped.

What patch is doing is actually very simple. It "amputates"
entire SYSENTER code path after it finished creating partially
filled pt_regs, loaded arg6 and dealt with EFLAGS sanitization -
after this is done, the state is identical to the similar
state in SYSCALL code, so we can just use SYSCALL code from that moment
onward! :)


> Meanwhile, this code is incredibly fragile wrt syscall restart.
> (Syscall restart on compat is really weird.)

Weird in what way?

> Do we have a decent test for it?

I don't know of any such test.

2015-07-25 19:33:31

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1

On Sat, Jul 25, 2015 at 11:36 AM, Denys Vlasenko <[email protected]> wrote:
> On 07/24/2015 07:50 PM, Andy Lutomirski wrote:
>> On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <[email protected]> wrote:
>>> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
>>> section. Merge them.
>>>
>>> The removal is split into two parts, to make review eaiser. This is part 1.
>>>
>>> auditsys_entry_common and auditsys_exit macros are indented one more tab without
>>> any changes. This prevents diff from becoming unreadable.
>>> They will be removed in part 2.
>>
>> I need to read these more closely, which is, at present, exceeding my
>> ability to look at asm. (See the big NMI thread.) I'll look soon.
>
> The "sysenter_fix_flags" thingy prevented the diff from being
> a pure delete, so it is not as clear as I hoped.
>
> What patch is doing is actually very simple. It "amputates"
> entire SYSENTER code path after it finished creating partially
> filled pt_regs, loaded arg6 and dealt with EFLAGS sanitization -
> after this is done, the state is identical to the similar
> state in SYSCALL code, so we can just use SYSCALL code from that moment
> onward! :)
>

I certainly agree that your patches are a nice cleanup. I just want
to make sure there isn't something subtle and undocumented going on
there.

>
>> Meanwhile, this code is incredibly fragile wrt syscall restart.
>> (Syscall restart on compat is really weird.)
>
> Weird in what way?

See:

https://lkml.kernel.org/g/[email protected]

--Andy

2015-07-27 16:05:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1


* Denys Vlasenko <[email protected]> wrote:

> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
> section. Merge them.
>
> The removal is split into two parts, to make review eaiser. This is part 1.
>
> auditsys_entry_common and auditsys_exit macros are indented one more tab without
> any changes. This prevents diff from becoming unreadable.
> They will be removed in part 2.
>
> Signed-off-by: Denys Vlasenko <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Linus Torvalds <[email protected]>
> CC: Krzysztof A. Sobiecki <[email protected]>
> CC: Steven Rostedt <[email protected]>
> CC: Borislav Petkov <[email protected]>
> CC: "H. Peter Anvin" <[email protected]>
> CC: Andy Lutomirski <[email protected]>
> CC: Oleg Nesterov <[email protected]>
> CC: Frederic Weisbecker <[email protected]>
> CC: Alexei Starovoitov <[email protected]>
> CC: Will Drewry <[email protected]>
> CC: Kees Cook <[email protected]>
> CC: [email protected]
> CC: [email protected]
> ---
> arch/x86/entry/entry_64_compat.S | 237 +++++++++++----------------------------
> 1 file changed, 63 insertions(+), 174 deletions(-)

Yeah, so I realize that this is already a 'cleaner', split up version of your
original change - but the diffstat is still way too large: _please_ split it up
into 2-3 further steps to make each step really, utterly obvious.

As you must have experienced it with recent regressions in the x86 entry code, the
smaller our assembly changes, the easier our job of doing a revert is, when such a
change regresses ...

I don't care if it ends up being 5-7 patches - that's far more preferable to
having to decode difficult looking regressions.

For example in hindsight I regret that I did not insist on further splitting up
this commit:

53e9accf0f76 ("x86/asm/entry/32: Do not use R9 in SYSCALL32 entry point")

and that was a small patch already:

arch/x86/entry/ia32entry.S | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)

I'm not going to make that mistake again: if you want to change the x86 assembly
code, you need to learn to decompose dangerous changes into maximally split up,
atomic, bisectable steps.

Agreed?

Thanks,

Ingo

2015-07-27 19:19:21

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1

On 07/24/2015 07:50 PM, Andy Lutomirski wrote:
> On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <[email protected]> wrote:
>> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
>> section. Merge them.
>>
>> The removal is split into two parts, to make review eaiser. This is part 1.
>>
>> auditsys_entry_common and auditsys_exit macros are indented one more tab without
>> any changes. This prevents diff from becoming unreadable.
>> They will be removed in part 2.
>
> I need to read these more closely, which is, at present, exceeding my
> ability to look at asm. (See the big NMI thread.) I'll look soon.
>
> Meanwhile, this code is incredibly fragile wrt syscall restart.
> (Syscall restart on compat is really weird.) Do we have a decent test
> for it?

How about this? (Feel free to expand, this is a first cut only).

$ ./test_restart
[RUN] Test restart of read()
[OK] Restart of read() worked
[RUN] Test restart of recv()
[OK] Restart of recv() worked





#undef _GNU_SOURCE
#define _GNU_SOURCE 1
#undef __USE_GNU
#define __USE_GNU 1
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <sys/time.h>
//#include <sys/ptrace.h>
#include <sys/wait.h>

void signal_SA_RESTART_empty_mask(int sig, void (*handler)(int))
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
sa.sa_handler = handler;
sigaction(sig, &sa, NULL);
}

int make_writer(void)
{
int pid, fd[2];
if (pipe(fd)) exit(1);
pid = fork();
if (pid < 0) exit(1);
if (pid == 0) {
/* child */
usleep(50*1000);
write(fd[1], "123456789", 10);
exit(0);
}
close(fd[1]);
return fd[0];
}

int make_sender(void)
{
int pid, fd[2];
if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) exit(1);
pid = fork();
if (pid < 0) exit(1);
if (pid == 0) {
/* child */
usleep(50*1000);
write(fd[1], "123456789", 10);
exit(0);
}
close(fd[1]);
return fd[0];
}

int got_sig;

void sighandler(int sig)
{
got_sig = 1;
}

void prepare(char buf[10])
{
got_sig = 0;
strcpy(buf, "qwertyuio");
ualarm(25*1000, 0);
}

int check(const char *test_type, int len, char buf[10])
{
int err = 0;
if (len != 10 || strcmp(buf, "123456789") != 0) {
printf("[FAIL]\tRestarted %s() returns wrong result\n", test_type);
err = 1;
}
if (!got_sig) {
printf("[FAIL]\tSignal was expected on %s read() but wasn't seen\n", test_type);
err = 1;
}
if (!err)
printf("[OK]\tRestart of %s() worked\n", test_type);
return err;
}

int main(int argc, char **argv, char **envp)
{
char buf[10];
int fd;
int len;
int err;

signal_SA_RESTART_empty_mask(SIGALRM, sighandler);

printf("[RUN]\tTest restart of read()\n");
fd = make_writer();
prepare(buf);
len = read(fd, buf, 10);
err = check("read", len, buf);
close(fd);

printf("[RUN]\tTest restart of recv()\n");
fd = make_sender();
prepare(buf);
len = recv(fd, buf, 10, MSG_PEEK);
err |= check("recv", len, buf);
if (!err) {
/* Check that restarted recv() indeed had MSG_PEEK:
* i.e. that it did not consume the data.
*/
strcpy(buf, "qwertyuio");
len = recv(fd, buf, 10, 0);
if (len != 10 || strcmp(buf, "123456789") != 0) {
printf("[FAIL]\tRestarted %s() had no MSG_PEEK flag\n", "recv");
err = 1;
}
}
close(fd);

return err;
}

2015-07-27 19:26:38

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1

On Mon, Jul 27, 2015 at 12:19 PM, Denys Vlasenko <[email protected]> wrote:
> On 07/24/2015 07:50 PM, Andy Lutomirski wrote:
>> On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <[email protected]> wrote:
>>> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
>>> section. Merge them.
>>>
>>> The removal is split into two parts, to make review eaiser. This is part 1.
>>>
>>> auditsys_entry_common and auditsys_exit macros are indented one more tab without
>>> any changes. This prevents diff from becoming unreadable.
>>> They will be removed in part 2.
>>
>> I need to read these more closely, which is, at present, exceeding my
>> ability to look at asm. (See the big NMI thread.) I'll look soon.
>>
>> Meanwhile, this code is incredibly fragile wrt syscall restart.
>> (Syscall restart on compat is really weird.) Do we have a decent test
>> for it?
>
> How about this? (Feel free to expand, this is a first cut only).

On a very brief glance, it looks reasonable, but I'd try it with
recvfrom instead of recv because it's a six-argument syscall.

--Andy

2015-08-25 07:19:30

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1

On Mon, Jul 27, 2015 at 12:26 PM, Andy Lutomirski <[email protected]> wrote:
> On Mon, Jul 27, 2015 at 12:19 PM, Denys Vlasenko <[email protected]> wrote:
>> On 07/24/2015 07:50 PM, Andy Lutomirski wrote:
>>> On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <[email protected]> wrote:
>>>> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
>>>> section. Merge them.
>>>>
>>>> The removal is split into two parts, to make review eaiser. This is part 1.
>>>>
>>>> auditsys_entry_common and auditsys_exit macros are indented one more tab without
>>>> any changes. This prevents diff from becoming unreadable.
>>>> They will be removed in part 2.
>>>
>>> I need to read these more closely, which is, at present, exceeding my
>>> ability to look at asm. (See the big NMI thread.) I'll look soon.
>>>
>>> Meanwhile, this code is incredibly fragile wrt syscall restart.
>>> (Syscall restart on compat is really weird.) Do we have a decent test
>>> for it?
>>
>> How about this? (Feel free to expand, this is a first cut only).
>
> On a very brief glance, it looks reasonable, but I'd try it with
> recvfrom instead of recv because it's a six-argument syscall.
>

I was wondering how syscall restart works on compat syscalls on AMD
(i.e. SYSCALL32). I think the answer is that it doesn't. Sigh.
(Tested with ptrace instead of actual syscall restart.)

--Andy