2015-02-25 00:01:04

by Denys Vlasenko

Subject: [PATCH 1/7] x86: ia32entry.S: use more understandable constant

The last instance of the "mysterious" SS+8 constant is replaced by SIZEOF_PTREGS.
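
To see why the two spellings are the same number, here is a minimal
user-space sketch. The struct mirrors the member order of the 64-bit
pt_regs in arch/x86/include/asm/ptrace.h (an assumption stated for
illustration, not quoted from this patch): ss is the last 8-byte slot,
so its offset plus 8 is exactly the struct size.

#include <stddef.h>
#include <stdio.h>

/* Assumed mirror of the 64-bit struct pt_regs member order;
 * every member is one 8-byte stack slot. */
struct pt_regs {
	unsigned long r15, r14, r13, r12, bp, bx;
	unsigned long r11, r10, r9, r8, ax, cx, dx, si, di;
	unsigned long orig_ax;
	unsigned long ip, cs, flags, sp, ss;
};

int main(void)
{
	/* ss is the last slot, so SS+8 == SIZEOF_PTREGS (21*8 = 168). */
	printf("SS+8 = %zu, SIZEOF_PTREGS = %zu\n",
	       offsetof(struct pt_regs, ss) + 8, sizeof(struct pt_regs));
	return 0;
}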

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/ia32/ia32entry.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 6dcd372..ed97463 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -536,7 +536,7 @@ ia32_ptregs_common:
CFI_ENDPROC
CFI_STARTPROC32 simple
CFI_SIGNAL_FRAME
- CFI_DEF_CFA rsp,SS+8
+ CFI_DEF_CFA rsp,SIZEOF_PTREGS
CFI_REL_OFFSET rax,RAX
CFI_REL_OFFSET rcx,RCX
CFI_REL_OFFSET rdx,RDX
--
1.8.1.4


2015-02-25 00:01:12

by Denys Vlasenko

Subject: [PATCH 2/7] x86: entry.S: simplify optimistic SYSRET

Avoid a redundant load of %r11 (it is already loaded a few instructions
earlier). Do not needlessly increment %rsp - we are about to return to
userspace via SYSRET, and that insn does not use the stack for its return.
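
A hedged data-flow sketch of why the dropped %rsp adjustment was dead
(plain C standing in for the asm; the adjustment amount and addresses
below are illustrative): SYSRET64 takes its return rip from %rcx and
rflags from %r11 rather than popping them, so the only thing %rsp must
hold at that point is the user stack pointer, loaded from pt_regs->sp.

#include <stdio.h>

int main(void)
{
	unsigned long pt_regs_sp = 0x7fffffffe000UL; /* made-up pt_regs->sp */
	unsigned long rsp = 0x1000UL;                /* kernel rsp in pt_regs */

	rsp += 16 * 8;     /* the removed adjustment - a dead store,     */
	rsp = pt_regs_sp;  /* ...because %rsp is overwritten here anyway */

	printf("rsp at SYSRET: %#lx\n", rsp);
	return 0;
}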

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/include/asm/calling.h | 3 +++
arch/x86/kernel/entry_64.S | 5 ++---
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index 3374235..f1a962f 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -176,6 +176,9 @@ For 32-bit we have the following conventions - kernel is built with
.macro RESTORE_C_REGS_EXCEPT_RCX
RESTORE_C_REGS_HELPER 1,0,1,1,1
.endm
+ .macro RESTORE_C_REGS_EXCEPT_R11
+ RESTORE_C_REGS_HELPER 1,1,0,1,1
+ .endm
.macro RESTORE_RSI_RDI
RESTORE_C_REGS_HELPER 0,0,0,0,0
.endm
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4b3f3c1..e5e0a55 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -756,9 +756,8 @@ retint_swapgs: /* return to user-space */
*/
irq_return_via_sysret:
CFI_REMEMBER_STATE
- RESTORE_C_REGS
- REMOVE_PT_GPREGS_FROM_STACK 8
- movq (RSP-RIP)(%rsp),%rsp
+ RESTORE_C_REGS_EXCEPT_R11
+ movq RSP(%rsp),%rsp
USERGS_SYSRET64
CFI_RESTORE_STATE

--
1.8.1.4

2015-02-25 00:01:21

by Denys Vlasenko

Subject: [PATCH 3/7] x86: entry.S: use smaller insns

The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
is a 32-bit constant; loading it with a 32-bit MOV produces a 5-byte
insn instead of a 10-byte one.
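
The size difference comes from immediate encoding. A movq sign-extends
a 32-bit immediate, and AUDIT_ARCH_X86_64 (0xc000003e per the uapi
audit header - an assumed value, not quoted from this patch) has bit 31
set, so the assembler has to fall back to the 10-byte movabs form; a
movl zero-extends into the full 64-bit register and fits in 5 bytes.
A small user-space sketch of the two extensions:

#include <stdint.h>
#include <stdio.h>

#define AUDIT_ARCH_X86_64 0xc000003eU /* assumed value, per uapi/linux/audit.h */

int main(void)
{
	uint32_t v = AUDIT_ARCH_X86_64;

	/* What a sign-extending movq $imm32 would produce - wrong here,
	 * which is why the assembler needs the 10-byte movabs instead: */
	printf("sign-extended: %#llx\n", (unsigned long long)(int64_t)(int32_t)v);

	/* What movl produces (writes to a 32-bit register zero-extend): */
	printf("zero-extended: %#llx\n", (unsigned long long)v);
	return 0;
}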

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/kernel/entry_64.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e5e0a55..779ae21 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -312,7 +312,7 @@ int_ret_from_sys_call_fixup:
/* Do syscall tracing */
tracesys:
movq %rsp, %rdi
- movq $AUDIT_ARCH_X86_64, %rsi
+ movl $AUDIT_ARCH_X86_64, %esi
call syscall_trace_enter_phase1
test %rax, %rax
jnz tracesys_phase2 /* if needed, run the slow path */
@@ -323,7 +323,7 @@ tracesys_phase2:
SAVE_EXTRA_REGS
FIXUP_TOP_OF_STACK %rdi
movq %rsp, %rdi
- movq $AUDIT_ARCH_X86_64, %rsi
+ movl $AUDIT_ARCH_X86_64, %esi
movq %rax,%rdx
call syscall_trace_enter_phase2

--
1.8.1.4

2015-02-25 00:01:33

by Denys Vlasenko

Subject: [PATCH 4/7] x86: entry.S: use JZ mnemonic after TEST, not JE

After a TEST insn, JE actually performs "jump if zero",
so let's use the JZ mnemonic instead.

No code changes, just less confusion.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/kernel/entry_64.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 779ae21..99e3abc 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -686,7 +686,7 @@ ret_from_intr:
exit_intr:
GET_THREAD_INFO(%rcx)
testl $3,CS(%rsp)
- je retint_kernel
+ jz retint_kernel

/* Interrupt came from user space */
/*
--
1.8.1.4

2015-02-25 00:01:46

by Denys Vlasenko

Subject: [PATCH 5/7 v2] x86: get rid of KERNEL_STACK_OFFSET

PER_CPU_VAR(kernel_stack) was set up so that it points
five stack slots below the top of the stack.

Presumably, this was done to avoid one "sub $5*8,%rsp"
in the syscall/sysenter code paths, where the iret frame
needs to be created by hand.

Ironically, none of those paths benefit from this optimization,
since all of them need to allocate additional data on the stack
(struct pt_regs), so they still have to perform a subtraction.
And ia32_sysenter_target even needs to *undo* this optimization:
it constructs the iret frame with pushes instead of movs,
so it needs to start right at the top.

This patch eliminates KERNEL_STACK_OFFSET.
PER_CPU_VAR(kernel_stack) now points directly to the top of the stack.
pt_regs allocations are adjusted to allocate the iret frame as well.

Semi-mysterious expressions like THREAD_INFO(%rsp,RIP) - "why RIP??" -
are replaced by the more logical THREAD_INFO(%rsp,SIZEOF_PTREGS) -
"rsp is SIZEOF_PTREGS bytes below the top; calculate thread_info's
address using that information."

The net result in code is that one instruction is eliminated,
and the constants in several others have changed.
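
A minimal sketch of the address arithmetic being simplified (plain C;
THREAD_SIZE, the base address and the variable names are illustrative,
not kernel code). Before the patch, every thread_info computation had
to cancel KERNEL_STACK_OFFSET back out; after it, the term is simply
gone, and once a full pt_regs sits on the stack, %rsp + SIZEOF_PTREGS
is the top - which is all THREAD_INFO(%rsp,SIZEOF_PTREGS) relies on:

#include <stdio.h>

#define THREAD_SIZE		(16 * 1024UL)	/* assumed; arch-dependent */
#define KERNEL_STACK_OFFSET	(5 * 8UL)
#define SIZEOF_PTREGS		(21 * 8UL)

int main(void)
{
	unsigned long stack_page = 0x1000000UL;       /* made-up allocation base */
	unsigned long top = stack_page + THREAD_SIZE; /* real top of stack */

	/* Before: kernel_stack points 5 slots below the top, and the
	 * offset has to be cancelled to reach thread_info... */
	unsigned long old_ks = top - KERNEL_STACK_OFFSET;
	printf("old thread_info: %#lx\n",
	       old_ks + KERNEL_STACK_OFFSET - THREAD_SIZE);

	/* After: kernel_stack is the top itself - no correction term. */
	unsigned long new_ks = top;
	printf("new thread_info: %#lx\n", new_ks - THREAD_SIZE);

	/* With pt_regs allocated, rsp + SIZEOF_PTREGS recovers the top: */
	unsigned long rsp = top - SIZEOF_PTREGS;
	printf("THREAD_INFO(rsp,SIZEOF_PTREGS): %#lx\n",
	       rsp + SIZEOF_PTREGS - THREAD_SIZE);
	return 0;
}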

Changes since v1: BUG_ON fix in traps.c

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/ia32/ia32entry.S | 35 +++++++++++++++++------------------
arch/x86/include/asm/thread_info.h | 8 +++-----
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/entry_64.S | 8 ++++----
arch/x86/kernel/process_32.c | 3 +--
arch/x86/kernel/process_64.c | 3 +--
arch/x86/kernel/smpboot.c | 3 +--
arch/x86/kernel/traps.c | 3 ++-
arch/x86/xen/smp.c | 3 +--
9 files changed, 31 insertions(+), 37 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index ed97463..eb9d690 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -114,7 +114,6 @@ ENTRY(ia32_sysenter_target)
CFI_REGISTER rsp,rbp
SWAPGS_UNSAFE_STACK
movq PER_CPU_VAR(kernel_stack), %rsp
- addq $(KERNEL_STACK_OFFSET),%rsp
/*
* No need to follow this irqs on/off section: the syscall
* disabled irqs, here we enable it straight after entry:
@@ -128,7 +127,7 @@ ENTRY(ia32_sysenter_target)
CFI_REL_OFFSET rsp,0
pushfq_cfi
/*CFI_REL_OFFSET rflags,0*/
- movl TI_sysenter_return+THREAD_INFO(%rsp,3*8-KERNEL_STACK_OFFSET),%r10d
+ movl TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d
CFI_REGISTER rip,r10
pushq_cfi $__USER32_CS
/*CFI_REL_OFFSET cs,0*/
@@ -160,8 +159,8 @@ ENTRY(ia32_sysenter_target)
jnz sysenter_fix_flags
sysenter_flags_fixed:

- orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+ orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
CFI_REMEMBER_STATE
jnz sysenter_tracesys
cmpq $(IA32_NR_syscalls-1),%rax
@@ -178,10 +177,10 @@ sysenter_dispatch:
movq %rax,RAX(%rsp)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz sysexit_audit
sysexit_from_sys_call:
- andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
+ andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
/* clear IF, that popfq doesn't enable interrupts early */
andl $~0x200,EFLAGS(%rsp)
movl RIP(%rsp),%edx /* User %eip */
@@ -226,7 +225,7 @@ sysexit_from_sys_call:
.endm

.macro auditsys_exit exit
- testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz ia32_ret_from_sys_call
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
@@ -241,7 +240,7 @@ sysexit_from_sys_call:
movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),%edi
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl %edi,TI_flags+THREAD_INFO(%rsp,RIP)
+ testl %edi,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jz \exit
CLEAR_RREGS
jmp int_with_check
@@ -263,7 +262,7 @@ sysenter_fix_flags:

sysenter_tracesys:
#ifdef CONFIG_AUDITSYSCALL
- testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jz sysenter_auditsys
#endif
SAVE_EXTRA_REGS
@@ -312,7 +311,7 @@ ENDPROC(ia32_sysenter_target)
ENTRY(ia32_cstar_target)
CFI_STARTPROC32 simple
CFI_SIGNAL_FRAME
- CFI_DEF_CFA rsp,KERNEL_STACK_OFFSET
+ CFI_DEF_CFA rsp,0
CFI_REGISTER rip,rcx
/*CFI_REGISTER rflags,r11*/
SWAPGS_UNSAFE_STACK
@@ -324,7 +323,7 @@ ENTRY(ia32_cstar_target)
* disabled irqs and here we enable it straight after entry:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- ALLOC_PT_GPREGS_ON_STACK 8 /* +8: space for orig_ax */
+ ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
SAVE_C_REGS_EXCEPT_RCX_R891011
movl %eax,%eax /* zero extension */
movq %rax,ORIG_RAX(%rsp)
@@ -347,8 +346,8 @@ ENTRY(ia32_cstar_target)
1: movl (%r8),%r9d
_ASM_EXTABLE(1b,ia32_badarg)
ASM_CLAC
- orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+ orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
CFI_REMEMBER_STATE
jnz cstar_tracesys
cmpq $IA32_NR_syscalls-1,%rax
@@ -365,10 +364,10 @@ cstar_dispatch:
movq %rax,RAX(%rsp)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz sysretl_audit
sysretl_from_sys_call:
- andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
+ andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
RESTORE_RSI_RDI_RDX
movl RIP(%rsp),%ecx
CFI_REGISTER rip,rcx
@@ -403,7 +402,7 @@ sysretl_audit:

cstar_tracesys:
#ifdef CONFIG_AUDITSYSCALL
- testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jz cstar_auditsys
#endif
xchgl %r9d,%ebp
@@ -470,8 +469,8 @@ ENTRY(ia32_syscall)
this could be a problem. */
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS_EXCEPT_R891011
- orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+ orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz ia32_tracesys
cmpq $(IA32_NR_syscalls-1),%rax
ja ia32_badsys
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index e82e95a..3eb9d05 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -149,7 +149,6 @@ struct thread_info {
#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)

#define STACK_WARN (THREAD_SIZE/8)
-#define KERNEL_STACK_OFFSET (5*(BITS_PER_LONG/8))

/*
* macros/functions for gaining access to the thread information structure
@@ -163,8 +162,7 @@ DECLARE_PER_CPU(unsigned long, kernel_stack);
static inline struct thread_info *current_thread_info(void)
{
struct thread_info *ti;
- ti = (void *)(this_cpu_read_stable(kernel_stack) +
- KERNEL_STACK_OFFSET - THREAD_SIZE);
+ ti = (void *)(this_cpu_read_stable(kernel_stack) - THREAD_SIZE);
return ti;
}

@@ -184,13 +182,13 @@ static inline unsigned long current_stack_pointer(void)
/* how to get the thread information struct from ASM */
#define GET_THREAD_INFO(reg) \
_ASM_MOV PER_CPU_VAR(kernel_stack),reg ; \
- _ASM_SUB $(THREAD_SIZE-KERNEL_STACK_OFFSET),reg ;
+ _ASM_SUB $(THREAD_SIZE),reg ;

/*
* Same if PER_CPU_VAR(kernel_stack) is, perhaps with some offset, already in
* a certain register (to be used in assembler memory operands).
*/
-#define THREAD_INFO(reg, off) KERNEL_STACK_OFFSET+(off)-THREAD_SIZE(reg)
+#define THREAD_INFO(reg, off) (off)-THREAD_SIZE(reg)

#endif

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb56925..a47ca751 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1116,7 +1116,7 @@ static __init int setup_disablecpuid(char *arg)
__setup("clearcpuid=", setup_disablecpuid);

DEFINE_PER_CPU(unsigned long, kernel_stack) =
- (unsigned long)&init_thread_union - KERNEL_STACK_OFFSET + THREAD_SIZE;
+ (unsigned long)&init_thread_union + THREAD_SIZE;
EXPORT_PER_CPU_SYMBOL(kernel_stack);

#ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 99e3abc..54737e5 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -237,7 +237,7 @@ ENDPROC(native_usergs_sysret64)
ENTRY(system_call)
CFI_STARTPROC simple
CFI_SIGNAL_FRAME
- CFI_DEF_CFA rsp,KERNEL_STACK_OFFSET
+ CFI_DEF_CFA rsp,0
CFI_REGISTER rip,rcx
/*CFI_REGISTER rflags,r11*/
SWAPGS_UNSAFE_STACK
@@ -256,13 +256,13 @@ GLOBAL(system_call_after_swapgs)
* and short:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- ALLOC_PT_GPREGS_ON_STACK 8 /* +8: space for orig_ax */
+ ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
SAVE_C_REGS_EXCEPT_RAX_RCX
movq $-ENOSYS,RAX(%rsp)
movq_cfi rax,ORIG_RAX
movq %rcx,RIP(%rsp)
CFI_REL_OFFSET rip,RIP
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz tracesys
system_call_fastpath:
#if __SYSCALL_MASK == ~0
@@ -280,7 +280,7 @@ system_call_fastpath:
* Has incomplete stack frame and undefined top of stack.
*/
ret_from_sys_call:
- testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz int_ret_from_sys_call_fixup /* Go the the slow path */

LOCKDEP_SYS_EXIT
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8f3ebfe..ecda2bd 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -311,8 +311,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
arch_end_context_switch(next_p);

this_cpu_write(kernel_stack,
- (unsigned long)task_stack_page(next_p) +
- THREAD_SIZE - KERNEL_STACK_OFFSET);
+ (unsigned long)task_stack_page(next_p) + THREAD_SIZE);

/*
* Restore %gs if needed (which is common)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 5a2c029..975d342 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -414,8 +414,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);

this_cpu_write(kernel_stack,
- (unsigned long)task_stack_page(next_p) +
- THREAD_SIZE - KERNEL_STACK_OFFSET);
+ (unsigned long)task_stack_page(next_p) + THREAD_SIZE);

/*
* Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index febc6aa..73dfb1f 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -811,8 +811,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
initial_gs = per_cpu_offset(cpu);
#endif
per_cpu(kernel_stack, cpu) =
- (unsigned long)task_stack_page(idle) -
- KERNEL_STACK_OFFSET + THREAD_SIZE;
+ (unsigned long)task_stack_page(idle) + THREAD_SIZE;
early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
initial_code = (unsigned long)start_secondary;
stack_start = idle->thread.sp;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c74f2f5..1e2bcf55 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -174,7 +174,8 @@ void ist_begin_non_atomic(struct pt_regs *regs)
* will catch asm bugs and any attempt to use ist_preempt_enable
* from double_fault.
*/
- BUG_ON(((current_stack_pointer() ^ this_cpu_read_stable(kernel_stack))
+ BUG_ON(((current_stack_pointer() ^
+ (this_cpu_read_stable(kernel_stack) - 1))
& ~(THREAD_SIZE - 1)) != 0);

preempt_count_sub(HARDIRQ_OFFSET);
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 4c071ae..4e71a3b 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -452,8 +452,7 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
clear_tsk_thread_flag(idle, TIF_FORK);
#endif
per_cpu(kernel_stack, cpu) =
- (unsigned long)task_stack_page(idle) -
- KERNEL_STACK_OFFSET + THREAD_SIZE;
+ (unsigned long)task_stack_page(idle) + THREAD_SIZE;

xen_setup_runstate_info(cpu);
xen_setup_timer(cpu);
--
1.8.1.4

2015-02-25 00:01:56

by Denys Vlasenko

Subject: [PATCH 6/7 v2] x86: save r11 into pt_regs->eflags on SYSCALL64 fastpath

Before this patch, r11 was saved in pt_regs->r11. That looks
natural, but it requires messy shuffling to/from the iret frame
whenever ptrace or e.g. iopl wants to modify the flags - because
that is how this register is used by SYSCALL/SYSRET.

This patch saves r11 in pt_regs->flags
and uses that value for the SYSRET64 insn. The shuffling is eliminated.

stub_iopl is no longer needed: pt_regs->flags needs no fixing up.

Testing shows that the syscall fast path takes ~54.3 ns both before
and after the patch (on a 2.7 GHz Sandy Bridge CPU).
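
To make the "shuffling" concrete, here is a hedged C model of the two
schemes (field names follow pt_regs, but the helper functions are
illustrative, not kernel code). SYSCALL saves the user rflags in r11
and SYSRET64 reloads rflags from r11, so on this path r11 *is* the
flags value:

#include <stdint.h>

struct pt_regs { uint64_t r11, flags; /* other members omitted */ };

/* Old scheme: entry stored r11 into pt_regs->r11, so before calling
 * ptregs-aware C code the flags slot had to be synthesized, and any
 * change had to be copied back afterwards for SYSRET: */
static void fixup_old(struct pt_regs *regs)   { regs->flags = regs->r11; }
static void restore_old(struct pt_regs *regs) { regs->r11 = regs->flags; }

/* New scheme: entry stores r11 (i.e. the user rflags) straight into
 * pt_regs->flags, and the exit path reloads %r11 from pt_regs->flags
 * right before SYSRET64 - nothing to copy in either direction, and
 * ptrace/iopl edits to ->flags take effect with no fixup. */

int main(void)
{
	struct pt_regs regs = { .r11 = 0x246 }; /* user rflags as saved by SYSCALL */
	fixup_old(&regs);                       /* old: sync r11 -> flags */
	regs.flags |= 0x3000;                   /* e.g. iopl raising the IOPL bits */
	restore_old(&regs);                     /* old: sync flags -> r11 */
	return regs.r11 == regs.flags ? 0 : 1;
}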

Changes since v1: better comments and commit message

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/include/asm/calling.h | 20 ++++++++++++++------
arch/x86/kernel/entry_64.S | 37 +++++++++++--------------------------
arch/x86/syscalls/syscall_64.tbl | 2 +-
arch/x86/um/sys_call_table_64.c | 2 +-
4 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index f1a962f..4b5f7bf 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -95,9 +95,11 @@ For 32-bit we have the following conventions - kernel is built with
CFI_ADJUST_CFA_OFFSET 15*8+\addskip
.endm

- .macro SAVE_C_REGS_HELPER offset=0 rax=1 rcx=1 r8plus=1
- .if \r8plus
+ .macro SAVE_C_REGS_HELPER offset=0 rax=1 rcx=1 r8910=1 r11=1
+ .if \r11
movq_cfi r11, 6*8+\offset
+ .endif
+ .if \r8910
movq_cfi r10, 7*8+\offset
movq_cfi r9, 8*8+\offset
movq_cfi r8, 9*8+\offset
@@ -113,16 +115,19 @@ For 32-bit we have the following conventions - kernel is built with
movq_cfi rdi, 14*8+\offset
.endm
.macro SAVE_C_REGS offset=0
- SAVE_C_REGS_HELPER \offset, 1, 1, 1
+ SAVE_C_REGS_HELPER \offset, 1, 1, 1, 1
.endm
.macro SAVE_C_REGS_EXCEPT_RAX_RCX offset=0
- SAVE_C_REGS_HELPER \offset, 0, 0, 1
+ SAVE_C_REGS_HELPER \offset, 0, 0, 1, 1
.endm
.macro SAVE_C_REGS_EXCEPT_R891011
- SAVE_C_REGS_HELPER 0, 1, 1, 0
+ SAVE_C_REGS_HELPER 0, 1, 1, 0, 0
.endm
.macro SAVE_C_REGS_EXCEPT_RCX_R891011
- SAVE_C_REGS_HELPER 0, 1, 0, 0
+ SAVE_C_REGS_HELPER 0, 1, 0, 0, 0
+ .endm
+ .macro SAVE_C_REGS_EXCEPT_RAX_RCX_R11
+ SAVE_C_REGS_HELPER 0, 0, 0, 1, 0
.endm

.macro SAVE_EXTRA_REGS offset=0
@@ -179,6 +184,9 @@ For 32-bit we have the following conventions - kernel is built with
.macro RESTORE_C_REGS_EXCEPT_R11
RESTORE_C_REGS_HELPER 1,1,0,1,1
.endm
+ .macro RESTORE_C_REGS_EXCEPT_RCX_R11
+ RESTORE_C_REGS_HELPER 1,0,0,1,1
+ .endm
.macro RESTORE_RSI_RDI
RESTORE_C_REGS_HELPER 0,0,0,0,0
.endm
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 54737e5..2091e2e 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -121,14 +121,12 @@ ENDPROC(native_usergs_sysret64)
#endif

/*
- * C code is not supposed to know about undefined top of stack. Every time
- * a C function with an pt_regs argument is called from the SYSCALL based
- * fast path FIXUP_TOP_OF_STACK is needed.
+ * C code is not supposed to know that the iret frame is not populated.
+ * Every time a C function with an pt_regs argument is called from
+ * the SYSCALL based fast path FIXUP_TOP_OF_STACK is needed.
* RESTORE_TOP_OF_STACK syncs the syscall state after any possible ptregs
* manipulation.
*/
-
- /* %rsp:at FRAMEEND */
.macro FIXUP_TOP_OF_STACK tmp offset=0
movq PER_CPU_VAR(old_rsp),\tmp
movq \tmp,RSP+\offset(%rsp)
@@ -136,15 +134,13 @@ ENDPROC(native_usergs_sysret64)
movq $__USER_CS,CS+\offset(%rsp)
movq RIP+\offset(%rsp),\tmp /* get rip */
movq \tmp,RCX+\offset(%rsp) /* copy it to rcx as sysret would do */
- movq R11+\offset(%rsp),\tmp /* get eflags */
- movq \tmp,EFLAGS+\offset(%rsp)
+ movq EFLAGS+\offset(%rsp),\tmp /* ditto for rflags->r11 */
+ movq \tmp,R11+\offset(%rsp)
.endm

.macro RESTORE_TOP_OF_STACK tmp offset=0
movq RSP+\offset(%rsp),\tmp
movq \tmp,PER_CPU_VAR(old_rsp)
- movq EFLAGS+\offset(%rsp),\tmp
- movq \tmp,R11+\offset(%rsp)
.endm

/*
@@ -257,9 +253,10 @@ GLOBAL(system_call_after_swapgs)
*/
ENABLE_INTERRUPTS(CLBR_NONE)
ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
- SAVE_C_REGS_EXCEPT_RAX_RCX
+ SAVE_C_REGS_EXCEPT_RAX_RCX_R11
movq $-ENOSYS,RAX(%rsp)
movq_cfi rax,ORIG_RAX
+ movq %r11,EFLAGS(%rsp)
movq %rcx,RIP(%rsp)
CFI_REL_OFFSET rip,RIP
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
@@ -277,7 +274,7 @@ system_call_fastpath:
movq %rax,RAX(%rsp)
/*
* Syscall return path ending with SYSRET (fast path)
- * Has incomplete stack frame and undefined top of stack.
+ * Has incompletely filled pt_regs, iret frame is also incomplete.
*/
ret_from_sys_call:
testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
@@ -291,9 +288,10 @@ ret_from_sys_call:
* sysretq will re-enable interrupts:
*/
TRACE_IRQS_ON
- RESTORE_C_REGS_EXCEPT_RCX
- movq RIP(%rsp),%rcx
+ RESTORE_C_REGS_EXCEPT_RCX_R11
+ movq RIP(%rsp),%rcx
CFI_REGISTER rip,rcx
+ movq EFLAGS(%rsp),%r11
/*CFI_REGISTER rflags,r11*/
movq PER_CPU_VAR(old_rsp), %rsp
/*
@@ -422,22 +420,9 @@ ENTRY(stub_\func)
END(stub_\func)
.endm

- .macro FIXED_FRAME label,func
-ENTRY(\label)
- CFI_STARTPROC
- DEFAULT_FRAME 0, 8 /* offset 8: return address */
- FIXUP_TOP_OF_STACK %r11, 8
- call \func
- RESTORE_TOP_OF_STACK %r11, 8
- ret
- CFI_ENDPROC
-END(\label)
- .endm
-
FORK_LIKE clone
FORK_LIKE fork
FORK_LIKE vfork
- FIXED_FRAME stub_iopl, sys_iopl

ENTRY(stub_execve)
CFI_STARTPROC
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 8d656fb..9ef32d5 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -178,7 +178,7 @@
169 common reboot sys_reboot
170 common sethostname sys_sethostname
171 common setdomainname sys_setdomainname
-172 common iopl stub_iopl
+172 common iopl sys_iopl
173 common ioperm sys_ioperm
174 64 create_module
175 common init_module sys_init_module
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index 5cdfa9d..a75d8700 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -16,7 +16,7 @@
*/

/* Not going to be implemented by UML, since we have no hardware. */
-#define stub_iopl sys_ni_syscall
+#define sys_iopl sys_ni_syscall
#define sys_ioperm sys_ni_syscall

/*
--
1.8.1.4

2015-02-25 00:02:09

by Denys Vlasenko

Subject: [PATCH 7/7 v2] x86: save user %rsp in pt_regs->sp

PER_CPU(old_rsp) usage is simplified - it is now used only as
temporary storage, and the userspace stack pointer is stored in
pt_regs->sp immediately on syscall entry, instead of being fetched
from it later, on syscall exit. This allows us to get rid of
thread_struct::usersp.

The lazy store of pt_regs->cs and pt_regs->ss is retained -
tests have shown that these stores do cost ~2 cycles on the fast path.
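
A hedged sketch of the resulting data flow (plain C standing in for
the asm; names and values are illustrative): old_rsp is demoted to
write-then-read-once scratch inside the entry sequence, and everything
else - ptrace, perf, KSTK_ESP - reads pt_regs->sp:

#include <stdio.h>

static unsigned long old_rsp;          /* per-CPU scratch in the real code */
struct pt_regs { unsigned long sp; };  /* only the field used here */

/* Entry: SYSCALL left the user rsp in old_rsp (the real code stashes
 * it there because no GP register is free at that point); it is now
 * copied into pt_regs->sp immediately, not at syscall exit. */
static void syscall_entry(struct pt_regs *regs, unsigned long user_rsp)
{
	old_rsp = user_rsp;
	regs->sp = old_rsp;   /* the only consumer of old_rsp */
}

int main(void)
{
	struct pt_regs regs;
	syscall_entry(&regs, 0x7fffffffe000UL); /* made-up user stack pointer */
	/* Everyone else now reads the authoritative copy: */
	printf("user rsp = %#lx\n", regs.sp);
	return 0;
}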

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Alexei Starovoitov <[email protected]>
CC: Will Drewry <[email protected]>
CC: Kees Cook <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/include/asm/compat.h | 2 +-
arch/x86/include/asm/processor.h | 6 ------
arch/x86/include/asm/ptrace.h | 8 ++------
arch/x86/kernel/entry_64.S | 22 +++++++++-------------
arch/x86/kernel/perf_regs.c | 2 +-
arch/x86/kernel/process_64.c | 8 +-------
6 files changed, 14 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 59c6c40..acdee09 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -301,7 +301,7 @@ static inline void __user *arch_compat_alloc_user_space(long len)
sp = task_pt_regs(current)->sp;
} else {
/* -128 for the x32 ABI redzone */
- sp = this_cpu_read(old_rsp) - 128;
+ sp = task_pt_regs(current)->sp - 128;
}

return (void __user *)round_down(sp - len, 16);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a092a0c..ce4aadf 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -474,7 +474,6 @@ struct thread_struct {
#ifdef CONFIG_X86_32
unsigned long sysenter_cs;
#else
- unsigned long usersp; /* Copy from PDA */
unsigned short es;
unsigned short ds;
unsigned short fsindex;
@@ -935,11 +934,6 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
extern unsigned long KSTK_ESP(struct task_struct *task);

-/*
- * User space RSP while inside the SYSCALL fast path
- */
-DECLARE_PER_CPU(unsigned long, old_rsp);
-
#endif /* CONFIG_X86_64 */

extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 4077d96..74bb2e0 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -145,12 +145,8 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
#endif
}

-#define current_user_stack_pointer() this_cpu_read(old_rsp)
-/* ia32 vs. x32 difference */
-#define compat_user_stack_pointer() \
- (test_thread_flag(TIF_IA32) \
- ? current_pt_regs()->sp \
- : this_cpu_read(old_rsp))
+#define current_user_stack_pointer() current_pt_regs()->sp
+#define compat_user_stack_pointer() current_pt_regs()->sp
#endif

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 2091e2e..c0352914 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -128,8 +128,6 @@ ENDPROC(native_usergs_sysret64)
* manipulation.
*/
.macro FIXUP_TOP_OF_STACK tmp offset=0
- movq PER_CPU_VAR(old_rsp),\tmp
- movq \tmp,RSP+\offset(%rsp)
movq $__USER_DS,SS+\offset(%rsp)
movq $__USER_CS,CS+\offset(%rsp)
movq RIP+\offset(%rsp),\tmp /* get rip */
@@ -139,8 +137,7 @@ ENDPROC(native_usergs_sysret64)
.endm

.macro RESTORE_TOP_OF_STACK tmp offset=0
- movq RSP+\offset(%rsp),\tmp
- movq \tmp,PER_CPU_VAR(old_rsp)
+ /* nothing to do */
.endm

/*
@@ -222,9 +219,6 @@ ENDPROC(native_usergs_sysret64)
* Interrupts are off on entry.
* Only called from user space.
*
- * XXX if we had a free scratch register we could save the RSP into the stack frame
- * and report it properly in ps. Unfortunately we haven't.
- *
* When user can change the frames always force IRET. That is because
* it deals with uncanonical addresses better. SYSRET has trouble
* with them due to bugs in both AMD and Intel CPUs.
@@ -253,11 +247,13 @@ GLOBAL(system_call_after_swapgs)
*/
ENABLE_INTERRUPTS(CLBR_NONE)
ALLOC_PT_GPREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
+ movq %rcx,RIP(%rsp)
+ movq %r11,EFLAGS(%rsp)
+ movq PER_CPU_VAR(old_rsp),%rcx
+ movq %rcx,RSP(%rsp)
+ movq_cfi rax,ORIG_RAX
SAVE_C_REGS_EXCEPT_RAX_RCX_R11
movq $-ENOSYS,RAX(%rsp)
- movq_cfi rax,ORIG_RAX
- movq %r11,EFLAGS(%rsp)
- movq %rcx,RIP(%rsp)
CFI_REL_OFFSET rip,RIP
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz tracesys
@@ -293,7 +289,7 @@ ret_from_sys_call:
CFI_REGISTER rip,rcx
movq EFLAGS(%rsp),%r11
/*CFI_REGISTER rflags,r11*/
- movq PER_CPU_VAR(old_rsp), %rsp
+ movq RSP(%rsp),%rsp
/*
* 64bit SYSRET restores rip from rcx,
* rflags from r11 (but RF and VM bits are forced to 0),
@@ -307,7 +303,7 @@ int_ret_from_sys_call_fixup:
FIXUP_TOP_OF_STACK %r11
jmp int_ret_from_sys_call

- /* Do syscall tracing */
+ /* Do syscall entry tracing */
tracesys:
movq %rsp, %rdi
movl $AUDIT_ARCH_X86_64, %esi
@@ -346,7 +342,7 @@ tracesys_phase2:

/*
* Syscall return path ending with IRET.
- * Has correct top of stack, but partial stack frame.
+ * Has correct iret frame.
*/
GLOBAL(int_ret_from_sys_call)
DISABLE_INTERRUPTS(CLBR_NONE)
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index 781861c..02a8720 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -177,7 +177,7 @@ void perf_get_regs_user(struct perf_regs *regs_user,
* than just blindly copying user_regs.
*/
regs_user->abi = PERF_SAMPLE_REGS_ABI_64;
- regs_user_copy->sp = this_cpu_read(old_rsp);
+ regs_user_copy->sp = user_regs->sp;
regs_user_copy->cs = __USER_CS;
regs_user_copy->ss = __USER_DS;
regs_user_copy->cx = -1; /* usually contains garbage */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 975d342..ab79139 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -161,7 +161,6 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
childregs = task_pt_regs(p);
p->thread.sp = (unsigned long) childregs;
- p->thread.usersp = me->thread.usersp;
set_tsk_thread_flag(p, TIF_FORK);
p->thread.io_bitmap_ptr = NULL;

@@ -235,10 +234,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
loadsegment(es, _ds);
loadsegment(ds, _ds);
load_gs_index(0);
- current->thread.usersp = new_sp;
regs->ip = new_ip;
regs->sp = new_sp;
- this_cpu_write(old_rsp, new_sp);
regs->cs = _cs;
regs->ss = _ss;
regs->flags = X86_EFLAGS_IF;
@@ -401,8 +398,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
/*
* Switch the PDA and FPU contexts.
*/
- prev->usersp = this_cpu_read(old_rsp);
- this_cpu_write(old_rsp, next->usersp);
this_cpu_write(current_task, next_p);

/*
@@ -601,6 +596,5 @@ long sys_arch_prctl(int code, unsigned long addr)

unsigned long KSTK_ESP(struct task_struct *task)
{
- return (test_tsk_thread_flag(task, TIF_IA32)) ?
- (task_pt_regs(task)->sp) : ((task)->thread.usersp);
+ return task_pt_regs(task)->sp;
}
--
1.8.1.4

2015-02-25 00:03:38

by Andy Lutomirski

Subject: Re: [PATCH 1/7] x86: ia32entry.S: use more understandable constant

On Tue, Feb 24, 2015 at 4:00 PM, Denys Vlasenko <[email protected]> wrote:
> The last instance of the "mysterious" SS+8 constant is replaced by SIZEOF_PTREGS.

Applied and checked by reading the macro definition :)




--
Andy Lutomirski
AMA Capital Management, LLC

2015-02-25 00:05:19

by Andy Lutomirski

Subject: Re: [PATCH 2/7] x86: entry.S: simplify optimistic SYSRET

On Tue, Feb 24, 2015 at 4:00 PM, Denys Vlasenko <[email protected]> wrote:
> Avoid a redundant load of %r11 (it is already loaded a few instructions
> earlier). Do not needlessly increment %rsp - we are about to return to
> userspace via SYSRET, and that insn does not use the stack for its return.
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 4b3f3c1..e5e0a55 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -756,9 +756,8 @@ retint_swapgs: /* return to user-space */
> */
> irq_return_via_sysret:
> CFI_REMEMBER_STATE
> - RESTORE_C_REGS
> - REMOVE_PT_GPREGS_FROM_STACK 8
> - movq (RSP-RIP)(%rsp),%rsp
> + RESTORE_C_REGS_EXCEPT_R11
> + movq RSP(%rsp),%rsp

I think this needs a comment.

--Andy




--
Andy Lutomirski
AMA Capital Management, LLC

2015-02-25 00:32:17

by Denys Vlasenko

Subject: Re: [PATCH 2/7] x86: entry.S: simplify optimistic SYSRET

On Wed, Feb 25, 2015 at 1:04 AM, Andy Lutomirski <[email protected]> wrote:
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -756,9 +756,8 @@ retint_swapgs: /* return to user-space */
>> */
>> irq_return_via_sysret:
>> CFI_REMEMBER_STATE
>> - RESTORE_C_REGS
>> - REMOVE_PT_GPREGS_FROM_STACK 8
>> - movq (RSP-RIP)(%rsp),%rsp
>> + RESTORE_C_REGS_EXCEPT_R11
>> + movq RSP(%rsp),%rsp
>
> I think this needs a comment.

I assume you mean "a comment explaining why r11 doesn't need restoring".
I sent patch v2.

2015-02-25 01:11:46

by Andy Lutomirski

Subject: Re: [PATCH 5/7 v2] x86: get rid of KERNEL_STACK_OFFSET

On Tue, Feb 24, 2015 at 4:00 PM, Denys Vlasenko <[email protected]> wrote:
> Changes since v1: BUG_ON fix in traps.c

For future reference: This should be below the ---

--Andy

2015-02-25 23:05:54

by Borislav Petkov

Subject: Re: [PATCH 4/7] x86: entry.S: use JZ mnemonic after TEST, not JE

On Wed, Feb 25, 2015 at 01:00:16AM +0100, Denys Vlasenko wrote:
> After a TEST insn, JE actually performs "jump if zero",
> so let's use the JZ mnemonic instead.
>
> No code changes, just less confusion.
>

Acked-by: Borislav Petkov <[email protected]>

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-02-26 14:13:50

by Denys Vlasenko

Subject: Re: [PATCH 4/7] x86: entry.S: use JZ mnemonic after TEST, not JE

On Thu, Feb 26, 2015 at 12:04 AM, Borislav Petkov <[email protected]> wrote:
> On Wed, Feb 25, 2015 at 01:00:16AM +0100, Denys Vlasenko wrote:
>> After a TEST insn, JE actually performs "jump if zero",
>> so let's use the JZ mnemonic instead.
>>
>> No code changes, just less confusion.
>>
>
> Acked-by: Borislav Petkov <[email protected]>

Andy, please don't take this patch yet, there are about a dozen
more instances of TEST which need similar treatment.

2015-02-26 15:16:52

by Andy Lutomirski

Subject: Re: [PATCH 4/7] x86: entry.S: use JZ mnemonic after TEST, not JE

On Thu, Feb 26, 2015 at 6:13 AM, Denys Vlasenko
<[email protected]> wrote:
> On Thu, Feb 26, 2015 at 12:04 AM, Borislav Petkov <[email protected]> wrote:
>> On Wed, Feb 25, 2015 at 01:00:16AM +0100, Denys Vlasenko wrote:
>>> After a TEST insn, JE actually performs "jump if zero",
>>> so let's use the JZ mnemonic instead.
>>>
>>> No code changes, just less confusion.
>>>
>>
>> Acked-by: Borislav Petkov <[email protected]>
>
> Andy, please don't take this patch yet, there are about a dozen
> more instances of TEST which need similar treatment.

I'm declaring a temporary moratorium on new development here until the
already-queued stuff is in -tip, looks okay, and we're in an
appropriate part of the cycle. The series is already huge.

IOW I'm not applying this or the other pending patches yet.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC

2015-02-26 16:00:11

by Borislav Petkov

Subject: Re: [PATCH 4/7] x86: entry.S: use JZ mnemonic after TEST, not JE

On Thu, Feb 26, 2015 at 07:16:28AM -0800, Andy Lutomirski wrote:
> I'm declaring a temporary moratorium on new development here until
> the already-queued stuff is in -tip, looks okay, and we're in an
> appropriate part of the cycle. The series is already huge.
>
> IOW I'm not applying this or the other pending patches yet.

Nicely conservative, good.

If you have a branch ready which you'd want tested, let me know.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-02-26 16:04:04

by Andy Lutomirski

Subject: Re: [PATCH 4/7] x86: entry.S: use JZ mnemonic after TEST, not JE

On Thu, Feb 26, 2015 at 7:59 AM, Borislav Petkov <[email protected]> wrote:
> On Thu, Feb 26, 2015 at 07:16:28AM -0800, Andy Lutomirski wrote:
>> I'm declaring a temporary moratorium on new development here until
>> the already-queued stuff is in -tip, looks okay, and we're in an
>> appropriate part of the cycle. The series is already huge.
>>
>> IOW I'm not applying this or the other pending patches yet.
>
> Nicely conservative, good.
>
> If you have a branch ready which you'd want tested, let me know.

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/entry

It was in -next, but that version was broken. I'll put it back in -next
and email out the series today, unless something changes.

>
> --
> Regards/Gruss,
> Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --



--
Andy Lutomirski
AMA Capital Management, LLC

2015-02-26 19:22:33

by Borislav Petkov

Subject: Re: [PATCH 4/7] x86: entry.S: use JZ mnemonic after TEST, not JE

On Thu, Feb 26, 2015 at 08:03:38AM -0800, Andy Lutomirski wrote:
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/entry
>
> It was in -next, but that version was broken. I'll put it back -next
> and email out the series today, unless something changs.

Two boxes from the two x86 vendors booted fine with it, nothing strange
in dmesg, all good.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--