2017-09-19 18:45:37

by Josh Poimboeuf

Subject: [PATCH 0/2] x86: Fix inline asm call constraints for clang

Patch 1 is a bug fix for an objtool issue which was uncovered by patch 2.

Patch 2 is the last fix needed for clang to be able to compile and boot
the kernel.


Josh Poimboeuf (2):
objtool: Handle another GCC stack pointer adjustment bug
x86/asm: Fix inline asm call constraints for clang

 arch/x86/include/asm/alternative.h               |  3 +-
 arch/x86/include/asm/asm.h                       |  9 +++++
 arch/x86/include/asm/mshyperv.h                  | 27 ++++++---------
 arch/x86/include/asm/paravirt_types.h            | 14 ++++----
 arch/x86/include/asm/preempt.h                   | 15 +++------
 arch/x86/include/asm/processor.h                 |  6 ++--
 arch/x86/include/asm/rwsem.h                     |  6 ++--
 arch/x86/include/asm/uaccess.h                   |  5 ++-
 arch/x86/include/asm/xen/hypercall.h             |  5 ++-
 arch/x86/kvm/emulate.c                           |  3 +-
 arch/x86/kvm/vmx.c                               |  4 +--
 arch/x86/mm/fault.c                              |  3 +-
 tools/objtool/Documentation/stack-validation.txt | 20 ++++++++---
 tools/objtool/arch/x86/decode.c                  |  6 ++--
 tools/objtool/check.c                            | 43 ++++++++++++++++--------
 15 files changed, 92 insertions(+), 77 deletions(-)

--
2.13.5


2017-09-19 18:45:40

by Josh Poimboeuf

Subject: [PATCH 1/2] objtool: Handle another GCC stack pointer adjustment bug

The kbuild bot reported the following warning with GCC 4.4 and a
randconfig:

net/socket.o: warning: objtool: compat_sock_ioctl()+0x1083: stack state mismatch: cfa1=7+160 cfa2=-1+0

This is caused by another GCC non-optimization, where it backs up and
restores the stack pointer for no apparent reason:

  2f91:  48 89 e0                mov    %rsp,%rax
  2f94:  4c 89 e7                mov    %r12,%rdi
  2f97:  4c 89 f6                mov    %r14,%rsi
  2f9a:  ba 20 00 00 00          mov    $0x20,%edx
  2f9f:  48 89 c4                mov    %rax,%rsp

This issue would have been happily ignored before the following commit:

dd88a0a0c861 ("objtool: Handle GCC stack pointer adjustment bug")

But now that objtool is paying attention to such stack pointer writes
to/from a register, it needs to understand them properly. In this case
that means recognizing that the "mov %rsp, %rax" instruction is
potentially a backup of the stack pointer.

Reported-by: kbuild test robot <[email protected]>
Fixes: dd88a0a0c861 ("objtool: Handle GCC stack pointer adjustment bug")
Signed-off-by: Josh Poimboeuf <[email protected]>
---
 tools/objtool/arch/x86/decode.c |  6 +++---
 tools/objtool/check.c           | 43 +++++++++++++++++++++++++++--------------
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 0e8c8ec4fd4e..0f22768c0d4d 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -208,14 +208,14 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
break;

case 0x89:
- if (rex == 0x48 && modrm == 0xe5) {
+ if (rex_w && !rex_r && modrm_mod == 3 && modrm_reg == 4) {

- /* mov %rsp, %rbp */
+ /* mov %rsp, reg */
*type = INSN_STACK;
op->src.type = OP_SRC_REG;
op->src.reg = CFI_SP;
op->dest.type = OP_DEST_REG;
- op->dest.reg = CFI_BP;
+ op->dest.reg = op_to_cfi_reg[modrm_rm][rex_b];
break;
}

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index f744617c9946..a0c518ecf085 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1203,24 +1203,39 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
switch (op->src.type) {

case OP_SRC_REG:
- if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP) {
+ if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP &&
+ cfa->base == CFI_SP &&
+ regs[CFI_BP].base == CFI_CFA &&
+ regs[CFI_BP].offset == -cfa->offset) {
+
+ /* mov %rsp, %rbp */
+ cfa->base = op->dest.reg;
+ state->bp_scratch = false;
+ }

- if (cfa->base == CFI_SP &&
- regs[CFI_BP].base == CFI_CFA &&
- regs[CFI_BP].offset == -cfa->offset) {
+ else if (op->src.reg == CFI_SP &&
+ op->dest.reg == CFI_BP && state->drap) {

- /* mov %rsp, %rbp */
- cfa->base = op->dest.reg;
- state->bp_scratch = false;
- }
+ /* drap: mov %rsp, %rbp */
+ regs[CFI_BP].base = CFI_BP;
+ regs[CFI_BP].offset = -state->stack_size;
+ state->bp_scratch = false;
+ }

- else if (state->drap) {
+ else if (op->src.reg == CFI_SP && cfa->base == CFI_SP) {

- /* drap: mov %rsp, %rbp */
- regs[CFI_BP].base = CFI_BP;
- regs[CFI_BP].offset = -state->stack_size;
- state->bp_scratch = false;
- }
+ /*
+ * mov %rsp, %reg
+ *
+ * This is needed for the rare case where GCC
+ * does:
+ *
+ * mov %rsp, %rax
+ * ...
+ * mov %rax, %rsp
+ */
+ state->vals[op->dest.reg].base = CFI_CFA;
+ state->vals[op->dest.reg].offset = -state->stack_size;
}

else if (op->dest.reg == cfa->base) {
--
2.13.5

2017-09-19 18:45:49

by Josh Poimboeuf

Subject: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

For inline asm statements which have a CALL instruction, we list the
stack pointer as a constraint to convince GCC to ensure the frame
pointer is set up first:

  static inline void foo()
  {
          register void *__sp asm(_ASM_SP);
          asm("call bar" : "+r" (__sp));
  }

Unfortunately, that pattern causes clang to corrupt the stack pointer.

There's actually an easier way to achieve the same goal in GCC, without
causing trouble for clang. If we declare the stack pointer register
variable as a global variable, and remove the constraint altogether,
that convinces GCC to always set up the frame pointer before inserting
*any* inline asm.

It basically acts as if *every* inline asm statement has a CALL
instruction. It's a bit overkill, but the performance impact should be
negligible.

Here are the vmlinux .text size differences with the following configs
on GCC:

- defconfig
- defconfig without frame pointers
- Fedora distro config
- Fedora distro config without frame pointers

            defconfig    defconfig-nofp    distro     distro-nofp
  before    9796300      9466764           9076191    8789745
  after     9796941      9462859           9076381    8785325

With frame pointers, the text size increases slightly. Without frame
pointers, the text size decreases, and by a somewhat larger margin.

Reported-by: Matthias Kaehlcke <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
---
 arch/x86/include/asm/alternative.h               |  3 +--
 arch/x86/include/asm/asm.h                       |  9 ++++++++
 arch/x86/include/asm/mshyperv.h                  | 27 ++++++++++--------------
 arch/x86/include/asm/paravirt_types.h            | 14 ++++++------
 arch/x86/include/asm/preempt.h                   | 15 +++++--------
 arch/x86/include/asm/processor.h                 |  6 ++----
 arch/x86/include/asm/rwsem.h                     |  6 +++---
 arch/x86/include/asm/uaccess.h                   |  5 ++---
 arch/x86/include/asm/xen/hypercall.h             |  5 ++---
 arch/x86/kvm/emulate.c                           |  3 +--
 arch/x86/kvm/vmx.c                               |  4 +---
 arch/x86/mm/fault.c                              |  3 +--
 tools/objtool/Documentation/stack-validation.txt | 20 +++++++++++++-----
 13 files changed, 60 insertions(+), 60 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 1b020381ab38..7aeb1cef4204 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
#define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, \
output, input...) \
{ \
- register void *__sp asm(_ASM_SP); \
asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
"call %P[new2]", feature2) \
- : output, "+r" (__sp) \
+ : output \
: [old] "i" (oldfunc), [new1] "i" (newfunc1), \
[new2] "i" (newfunc2), ## input); \
}
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 676ee5807d86..ff8921d4615b 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -132,4 +132,13 @@
/* For C file, we already have NOKPROBE_SYMBOL macro */
#endif

+#ifndef __ASSEMBLY__
+/*
+ * This variable declaration has the side effect of forcing GCC to always set
+ * up the frame pointer before inserting any inline asm. This is necessary
+ * because some inline asm statements have CALL instructions.
+ */
+register unsigned int __sp_unused asm("esp");
+#endif
+
#endif /* _ASM_X86_ASM_H */
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 63cc96f064dc..1c913ae27f99 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -179,16 +179,15 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
u64 input_address = input ? virt_to_phys(input) : 0;
u64 output_address = output ? virt_to_phys(output) : 0;
u64 hv_status;
- register void *__sp asm(_ASM_SP);

#ifdef CONFIG_X86_64
if (!hv_hypercall_pg)
return U64_MAX;

- __asm__ __volatile__("mov %4, %%r8\n"
- "call *%5"
- : "=a" (hv_status), "+r" (__sp),
- "+c" (control), "+d" (input_address)
+ __asm__ __volatile__("mov %3, %%r8\n"
+ "call *%4"
+ : "=a" (hv_status), "+c" (control),
+ "+d" (input_address)
: "r" (output_address), "m" (hv_hypercall_pg)
: "cc", "memory", "r8", "r9", "r10", "r11");
#else
@@ -200,9 +199,8 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
if (!hv_hypercall_pg)
return U64_MAX;

- __asm__ __volatile__("call *%7"
- : "=A" (hv_status),
- "+c" (input_address_lo), "+r" (__sp)
+ __asm__ __volatile__("call *%6"
+ : "=A" (hv_status), "+c" (input_address_lo)
: "A" (control),
"b" (input_address_hi),
"D"(output_address_hi), "S"(output_address_lo),
@@ -224,13 +222,12 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
{
u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
- register void *__sp asm(_ASM_SP);

#ifdef CONFIG_X86_64
{
- __asm__ __volatile__("call *%4"
- : "=a" (hv_status), "+r" (__sp),
- "+c" (control), "+d" (input1)
+ __asm__ __volatile__("call *%3"
+ : "=a" (hv_status), "+c" (control),
+ "+d" (input1)
: "m" (hv_hypercall_pg)
: "cc", "r8", "r9", "r10", "r11");
}
@@ -239,10 +236,8 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
u32 input1_hi = upper_32_bits(input1);
u32 input1_lo = lower_32_bits(input1);

- __asm__ __volatile__ ("call *%5"
- : "=A"(hv_status),
- "+c"(input1_lo),
- "+r"(__sp)
+ __asm__ __volatile__ ("call *%4"
+ : "=A"(hv_status), "+c"(input1_lo)
: "A" (control),
"b" (input1_hi),
"m" (hv_hypercall_pg)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 42873edd9f9d..b8966ed71ba5 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -459,8 +459,8 @@ int paravirt_disable_iospace(void);
*/
#ifdef CONFIG_X86_32
#define PVOP_VCALL_ARGS \
- unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx; \
- register void *__sp asm("esp")
+ unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
+
#define PVOP_CALL_ARGS PVOP_VCALL_ARGS

#define PVOP_CALL_ARG1(x) "a" ((unsigned long)(x))
@@ -480,8 +480,8 @@ int paravirt_disable_iospace(void);
/* [re]ax isn't an arg, but the return val */
#define PVOP_VCALL_ARGS \
unsigned long __edi = __edi, __esi = __esi, \
- __edx = __edx, __ecx = __ecx, __eax = __eax; \
- register void *__sp asm("rsp")
+ __edx = __edx, __ecx = __ecx, __eax = __eax;
+
#define PVOP_CALL_ARGS PVOP_VCALL_ARGS

#define PVOP_CALL_ARG1(x) "D" ((unsigned long)(x))
@@ -532,7 +532,7 @@ int paravirt_disable_iospace(void);
asm volatile(pre \
paravirt_alt(PARAVIRT_CALL) \
post \
- : call_clbr, "+r" (__sp) \
+ : call_clbr \
: paravirt_type(op), \
paravirt_clobber(clbr), \
##__VA_ARGS__ \
@@ -542,7 +542,7 @@ int paravirt_disable_iospace(void);
asm volatile(pre \
paravirt_alt(PARAVIRT_CALL) \
post \
- : call_clbr, "+r" (__sp) \
+ : call_clbr \
: paravirt_type(op), \
paravirt_clobber(clbr), \
##__VA_ARGS__ \
@@ -569,7 +569,7 @@ int paravirt_disable_iospace(void);
asm volatile(pre \
paravirt_alt(PARAVIRT_CALL) \
post \
- : call_clbr, "+r" (__sp) \
+ : call_clbr \
: paravirt_type(op), \
paravirt_clobber(clbr), \
##__VA_ARGS__ \
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index ec1f3c651150..f66b189a1142 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -100,19 +100,14 @@ static __always_inline bool should_resched(int preempt_offset)

#ifdef CONFIG_PREEMPT
extern asmlinkage void ___preempt_schedule(void);
-# define __preempt_schedule() \
-({ \
- register void *__sp asm(_ASM_SP); \
- asm volatile ("call ___preempt_schedule" : "+r"(__sp)); \
-})
+# define __preempt_schedule() \
+ asm volatile ("call ___preempt_schedule")

extern asmlinkage void preempt_schedule(void);
extern asmlinkage void ___preempt_schedule_notrace(void);
-# define __preempt_schedule_notrace() \
-({ \
- register void *__sp asm(_ASM_SP); \
- asm volatile ("call ___preempt_schedule_notrace" : "+r"(__sp)); \
-})
+# define __preempt_schedule_notrace() \
+ asm volatile ("call ___preempt_schedule_notrace")
+
extern asmlinkage void preempt_schedule_notrace(void);
#endif

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3fa26a61eabc..7a7969211527 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -677,8 +677,6 @@ static inline void sync_core(void)
* Like all of Linux's memory ordering operations, this is a
* compiler barrier as well.
*/
- register void *__sp asm(_ASM_SP);
-
#ifdef CONFIG_X86_32
asm volatile (
"pushfl\n\t"
@@ -686,7 +684,7 @@ static inline void sync_core(void)
"pushl $1f\n\t"
"iret\n\t"
"1:"
- : "+r" (__sp) : : "memory");
+ : : : "memory");
#else
unsigned int tmp;

@@ -703,7 +701,7 @@ static inline void sync_core(void)
"iretq\n\t"
UNWIND_HINT_RESTORE
"1:"
- : "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
+ : "=&r" (tmp) : : "cc", "memory");
#endif
}

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index a34e0d4b957d..f8f127a2a300 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -103,10 +103,9 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
({ \
long tmp; \
struct rw_semaphore* ret; \
- register void *__sp asm(_ASM_SP); \
\
asm volatile("# beginning down_write\n\t" \
- LOCK_PREFIX " xadd %1,(%4)\n\t" \
+ LOCK_PREFIX " xadd %1,(%3)\n\t" \
/* adds 0xffff0001, returns the old value */ \
" test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
/* was the active mask 0 before? */\
@@ -114,7 +113,8 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
" call " slow_path "\n" \
"1:\n" \
"# ending down_write" \
- : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \
+ : "+m" (sem->count), "=d" (tmp), \
+ "=a" (ret) \
: "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
: "memory", "cc"); \
ret; \
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 184eb9894dae..113ddb8ce0d8 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -166,11 +166,10 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
({ \
int __ret_gu; \
register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
- register void *__sp asm(_ASM_SP); \
__chk_user_ptr(ptr); \
might_fault(); \
- asm volatile("call __get_user_%P4" \
- : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
+ asm volatile("call __get_user_%P3" \
+ : "=a" (__ret_gu), "=r" (__val_gu) \
: "0" (ptr), "i" (sizeof(*(ptr)))); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
__builtin_expect(__ret_gu, 0); \
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 9606688caa4b..6264dba01896 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -113,10 +113,9 @@ extern struct { char _entry[32]; } hypercall_page[];
register unsigned long __arg2 asm(__HYPERCALL_ARG2REG) = __arg2; \
register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \
register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
- register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \
- register void *__sp asm(_ASM_SP);
+ register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5;

-#define __HYPERCALL_0PARAM "=r" (__res), "+r" (__sp)
+#define __HYPERCALL_0PARAM "=r" (__res)
#define __HYPERCALL_1PARAM __HYPERCALL_0PARAM, "+r" (__arg1)
#define __HYPERCALL_2PARAM __HYPERCALL_1PARAM, "+r" (__arg2)
#define __HYPERCALL_3PARAM __HYPERCALL_2PARAM, "+r" (__arg3)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 16bf6655aa85..5e82e0821c1f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5296,7 +5296,6 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,

static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
{
- register void *__sp asm(_ASM_SP);
ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;

if (!(ctxt->d & ByteOp))
@@ -5304,7 +5303,7 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))

asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
: "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
- [fastop]"+S"(fop), "+r"(__sp)
+ [fastop]"+S"(fop)
: "c"(ctxt->src2.val));

ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 06c0c6d0541e..a693362f41af 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9036,7 +9036,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
{
u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
- register void *__sp asm(_ASM_SP);

if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
== (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
@@ -9063,9 +9062,8 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
"call *%[entry]\n\t"
:
#ifdef CONFIG_X86_64
- [sp]"=&r"(tmp),
+ [sp]"=&r"(tmp)
#endif
- "+r"(__sp)
:
[entry]"r"(entry),
[ss]"i"(__KERNEL_DS),
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b836a7274e12..4457e41378e4 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -806,7 +806,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
if (is_vmalloc_addr((void *)address) &&
(((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
- register void *__sp asm("rsp");
unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
/*
* We're likely to be running with very little stack space
@@ -821,7 +820,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
asm volatile ("movq %[stack], %%rsp\n\t"
"call handle_stack_overflow\n\t"
"1: jmp 1b"
- : "+r" (__sp)
+ :
: "D" ("kernel stack overflow (page fault)"),
"S" (regs), "d" (address),
[stack] "rm" (stack));
diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 6a1af43862df..18af4aa7488d 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -192,12 +192,22 @@ they mean, and suggestions for how to fix them.
use the manual unwind hint macros in asm/unwind_hints.h.

If it's a GCC-compiled .c file, the error may be because the function
- uses an inline asm() statement which has a "call" instruction. An
- asm() statement with a call instruction must declare the use of the
- stack pointer in its output operand. For example, on x86_64:
+ uses an inline asm() statement which has a "call" instruction. To
+ fix this, either:

- register void *__sp asm("rsp");
- asm volatile("call func" : "+r" (__sp));
+ a) list the stack pointer as an input/output constraint;
+
+ register void *__sp asm("rsp");
+ asm volatile("call func" : "+r" (__sp));
+
+ or
+
+ b) declare a global register variable for the stack pointer.
+
+ arch/x86/include/asm/asm.h:register unsigned int __sp_unused asm("esp");
+
+ This is the strategy used on x86_64. It ensures that GCC always
+ sets up the frame pointer before inserting *any* inline asm.

Otherwise the stack frame may not get created before the call.

--
2.13.5

2017-09-19 21:55:31

by Alexander Potapenko

Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf <[email protected]> wrote:
> [...]
> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
> index 676ee5807d86..ff8921d4615b 100644
> --- a/arch/x86/include/asm/asm.h
> +++ b/arch/x86/include/asm/asm.h
> @@ -132,4 +132,13 @@
> /* For C file, we already have NOKPROBE_SYMBOL macro */
> #endif
>
> +#ifndef __ASSEMBLY__
> +/*
> + * This variable declaration has the side effect of forcing GCC to always set
> + * up the frame pointer before inserting any inline asm. This is necessary
> + * because some inline asm statements have CALL instructions.
> + */
> +register unsigned int __sp_unused asm("esp");
Shouldn't this be "asm(_ASM_SP)"?
> +#endif
> +
> #endif /* _ASM_X86_ASM_H */
> [...]
> if (is_vmalloc_addr((void *)address) &&
> (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
> address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
> - register void *__sp asm("rsp");
> unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
> /*
> * We're likely to be running with very little stack space
> @@ -821,7 +820,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> asm volatile ("movq %[stack], %%rsp\n\t"
> "call handle_stack_overflow\n\t"
> "1: jmp 1b"
> - : "+r" (__sp)
> + :
> : "D" ("kernel stack overflow (page fault)"),
> "S" (regs), "d" (address),
> [stack] "rm" (stack));
> diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
> index 6a1af43862df..18af4aa7488d 100644
> --- a/tools/objtool/Documentation/stack-validation.txt
> +++ b/tools/objtool/Documentation/stack-validation.txt
> @@ -192,12 +192,22 @@ they mean, and suggestions for how to fix them.
> use the manual unwind hint macros in asm/unwind_hints.h.
>
> If it's a GCC-compiled .c file, the error may be because the function
> - uses an inline asm() statement which has a "call" instruction. An
> - asm() statement with a call instruction must declare the use of the
> - stack pointer in its output operand. For example, on x86_64:
> + uses an inline asm() statement which has a "call" instruction. To
> + fix this, either:
>
> - register void *__sp asm("rsp");
> - asm volatile("call func" : "+r" (__sp));
> + a) list the stack pointer as an input/output constraint;
> +
> + register void *__sp asm("rsp");
> + asm volatile("call func" : "+r" (__sp));
> +
> + or
> +
> + b) declare a global register variable for the stack pointer.
> +
> + arch/x86/include/asm/asm.h:register unsigned int __sp_unused asm("esp");
> +
> + This is the strategy used on x86_64. It ensures that GCC always
> + sets up the frame pointer before inserting *any* inline asm.
>
> Otherwise the stack frame may not get created before the call.
>
> --
> 2.13.5
>



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2017-09-19 22:26:03

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko <[email protected]> wrote:
> On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf <[email protected]> wrote:
>> For inline asm statements which have a CALL instruction, we list the
>> stack pointer as a constraint to convince GCC to ensure the frame
>> pointer is set up first:
>>
>> static inline void foo()
>> {
>> register void *__sp asm(_ASM_SP);
>> asm("call bar" : "+r" (__sp))
>> }
>>
>> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>>
>> There's actually an easier way to achieve the same goal in GCC, without
>> causing trouble for clang. If we declare the stack pointer register
>> variable as a global variable, and remove the constraint altogether,
>> that convinces GCC to always set up the frame pointer before inserting
>> *any* inline asm.
>>
>> It basically acts as if *every* inline asm statement has a CALL
>> instruction. It's a bit overkill, but the performance impact should be
>> negligible.
>>
>> Here are the vmlinux .text size differences with the following configs
>> on GCC:
>>
>> - defconfig
>> - defconfig without frame pointers
>> - Fedora distro config
>> - Fedora distro config without frame pointers
>>
>> defconfig defconfig-nofp distro distro-nofp
>> before 9796300 9466764 9076191 8789745
>> after 9796941 9462859 9076381 8785325
>>
>> With frame pointers, the text size increases slightly. Without frame
>> pointers, the text size decreases, and a little more significantly.
>>
>> Reported-by: Matthias Kaehlcke <[email protected]>
>> Signed-off-by: Josh Poimboeuf <[email protected]>
>> ---
>> arch/x86/include/asm/alternative.h | 3 +--
>> arch/x86/include/asm/asm.h | 9 ++++++++
>> arch/x86/include/asm/mshyperv.h | 27 ++++++++++--------------
>> arch/x86/include/asm/paravirt_types.h | 14 ++++++------
>> arch/x86/include/asm/preempt.h | 15 +++++--------
>> arch/x86/include/asm/processor.h | 6 ++----
>> arch/x86/include/asm/rwsem.h | 6 +++---
>> arch/x86/include/asm/uaccess.h | 5 ++---
>> arch/x86/include/asm/xen/hypercall.h | 5 ++---
>> arch/x86/kvm/emulate.c | 3 +--
>> arch/x86/kvm/vmx.c | 4 +---
>> arch/x86/mm/fault.c | 3 +--
>> tools/objtool/Documentation/stack-validation.txt | 20 +++++++++++++-----
>> 13 files changed, 60 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
>> index 1b020381ab38..7aeb1cef4204 100644
>> --- a/arch/x86/include/asm/alternative.h
>> +++ b/arch/x86/include/asm/alternative.h
>> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
>> #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, \
>> output, input...) \
>> { \
>> - register void *__sp asm(_ASM_SP); \
>> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
>> "call %P[new2]", feature2) \
>> - : output, "+r" (__sp) \
>> + : output \
>> : [old] "i" (oldfunc), [new1] "i" (newfunc1), \
>> [new2] "i" (newfunc2), ## input); \
>> }
>> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
>> index 676ee5807d86..ff8921d4615b 100644
>> --- a/arch/x86/include/asm/asm.h
>> +++ b/arch/x86/include/asm/asm.h
>> @@ -132,4 +132,13 @@
>> /* For C file, we already have NOKPROBE_SYMBOL macro */
>> #endif
>>
>> +#ifndef __ASSEMBLY__
>> +/*
>> + * This variable declaration has the side effect of forcing GCC to always set
>> + * up the frame pointer before inserting any inline asm. This is necessary
>> + * because some inline asm statements have CALL instructions.
>> + */
>> +register unsigned int __sp_unused asm("esp");
> Shouldn't this be "asm(_ASM_SP)"?
Answering my own question, this can't be _ASM_SP because of the
realmode code, which is built with -m16 whereas _ASM_SP is "rsp".
The patch works fine with "esp" though - most certainly declaring an
ESP variable is enough to reserve the whole RSP in an x86_64 build.
>> +#endif
>> +
>> #endif /* _ASM_X86_ASM_H */
>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index 63cc96f064dc..1c913ae27f99 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -179,16 +179,15 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>> u64 input_address = input ? virt_to_phys(input) : 0;
>> u64 output_address = output ? virt_to_phys(output) : 0;
>> u64 hv_status;
>> - register void *__sp asm(_ASM_SP);
>>
>> #ifdef CONFIG_X86_64
>> if (!hv_hypercall_pg)
>> return U64_MAX;
>>
>> - __asm__ __volatile__("mov %4, %%r8\n"
>> - "call *%5"
>> - : "=a" (hv_status), "+r" (__sp),
>> - "+c" (control), "+d" (input_address)
>> + __asm__ __volatile__("mov %3, %%r8\n"
>> + "call *%4"
>> + : "=a" (hv_status), "+c" (control),
>> + "+d" (input_address)
>> : "r" (output_address), "m" (hv_hypercall_pg)
>> : "cc", "memory", "r8", "r9", "r10", "r11");
>> #else
>> @@ -200,9 +199,8 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>> if (!hv_hypercall_pg)
>> return U64_MAX;
>>
>> - __asm__ __volatile__("call *%7"
>> - : "=A" (hv_status),
>> - "+c" (input_address_lo), "+r" (__sp)
>> + __asm__ __volatile__("call *%6"
>> + : "=A" (hv_status), "+c" (input_address_lo)
>> : "A" (control),
>> "b" (input_address_hi),
>> "D"(output_address_hi), "S"(output_address_lo),
>> @@ -224,13 +222,12 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>> static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
>> {
>> u64 hv_status, control = (u64)code | HV_HYPERCALL_FAST_BIT;
>> - register void *__sp asm(_ASM_SP);
>>
>> #ifdef CONFIG_X86_64
>> {
>> - __asm__ __volatile__("call *%4"
>> - : "=a" (hv_status), "+r" (__sp),
>> - "+c" (control), "+d" (input1)
>> + __asm__ __volatile__("call *%3"
>> + : "=a" (hv_status), "+c" (control),
>> + "+d" (input1)
>> : "m" (hv_hypercall_pg)
>> : "cc", "r8", "r9", "r10", "r11");
>> }
>> @@ -239,10 +236,8 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
>> u32 input1_hi = upper_32_bits(input1);
>> u32 input1_lo = lower_32_bits(input1);
>>
>> - __asm__ __volatile__ ("call *%5"
>> - : "=A"(hv_status),
>> - "+c"(input1_lo),
>> - "+r"(__sp)
>> + __asm__ __volatile__ ("call *%4"
>> + : "=A"(hv_status), "+c"(input1_lo)
>> : "A" (control),
>> "b" (input1_hi),
>> "m" (hv_hypercall_pg)
>> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
>> index 42873edd9f9d..b8966ed71ba5 100644
>> --- a/arch/x86/include/asm/paravirt_types.h
>> +++ b/arch/x86/include/asm/paravirt_types.h
>> @@ -459,8 +459,8 @@ int paravirt_disable_iospace(void);
>> */
>> #ifdef CONFIG_X86_32
>> #define PVOP_VCALL_ARGS \
>> - unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx; \
>> - register void *__sp asm("esp")
>> + unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
>> +
>> #define PVOP_CALL_ARGS PVOP_VCALL_ARGS
>>
>> #define PVOP_CALL_ARG1(x) "a" ((unsigned long)(x))
>> @@ -480,8 +480,8 @@ int paravirt_disable_iospace(void);
>> /* [re]ax isn't an arg, but the return val */
>> #define PVOP_VCALL_ARGS \
>> unsigned long __edi = __edi, __esi = __esi, \
>> - __edx = __edx, __ecx = __ecx, __eax = __eax; \
>> - register void *__sp asm("rsp")
>> + __edx = __edx, __ecx = __ecx, __eax = __eax;
>> +
>> #define PVOP_CALL_ARGS PVOP_VCALL_ARGS
>>
>> #define PVOP_CALL_ARG1(x) "D" ((unsigned long)(x))
>> @@ -532,7 +532,7 @@ int paravirt_disable_iospace(void);
>> asm volatile(pre \
>> paravirt_alt(PARAVIRT_CALL) \
>> post \
>> - : call_clbr, "+r" (__sp) \
>> + : call_clbr \
>> : paravirt_type(op), \
>> paravirt_clobber(clbr), \
>> ##__VA_ARGS__ \
>> @@ -542,7 +542,7 @@ int paravirt_disable_iospace(void);
>> asm volatile(pre \
>> paravirt_alt(PARAVIRT_CALL) \
>> post \
>> - : call_clbr, "+r" (__sp) \
>> + : call_clbr \
>> : paravirt_type(op), \
>> paravirt_clobber(clbr), \
>> ##__VA_ARGS__ \
>> @@ -569,7 +569,7 @@ int paravirt_disable_iospace(void);
>> asm volatile(pre \
>> paravirt_alt(PARAVIRT_CALL) \
>> post \
>> - : call_clbr, "+r" (__sp) \
>> + : call_clbr \
>> : paravirt_type(op), \
>> paravirt_clobber(clbr), \
>> ##__VA_ARGS__ \
>> diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
>> index ec1f3c651150..f66b189a1142 100644
>> --- a/arch/x86/include/asm/preempt.h
>> +++ b/arch/x86/include/asm/preempt.h
>> @@ -100,19 +100,14 @@ static __always_inline bool should_resched(int preempt_offset)
>>
>> #ifdef CONFIG_PREEMPT
>> extern asmlinkage void ___preempt_schedule(void);
>> -# define __preempt_schedule() \
>> -({ \
>> - register void *__sp asm(_ASM_SP); \
>> - asm volatile ("call ___preempt_schedule" : "+r"(__sp)); \
>> -})
>> +# define __preempt_schedule() \
>> + asm volatile ("call ___preempt_schedule")
>>
>> extern asmlinkage void preempt_schedule(void);
>> extern asmlinkage void ___preempt_schedule_notrace(void);
>> -# define __preempt_schedule_notrace() \
>> -({ \
>> - register void *__sp asm(_ASM_SP); \
>> - asm volatile ("call ___preempt_schedule_notrace" : "+r"(__sp)); \
>> -})
>> +# define __preempt_schedule_notrace() \
>> + asm volatile ("call ___preempt_schedule_notrace")
>> +
>> extern asmlinkage void preempt_schedule_notrace(void);
>> #endif
>>
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 3fa26a61eabc..7a7969211527 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -677,8 +677,6 @@ static inline void sync_core(void)
>> * Like all of Linux's memory ordering operations, this is a
>> * compiler barrier as well.
>> */
>> - register void *__sp asm(_ASM_SP);
>> -
>> #ifdef CONFIG_X86_32
>> asm volatile (
>> "pushfl\n\t"
>> @@ -686,7 +684,7 @@ static inline void sync_core(void)
>> "pushl $1f\n\t"
>> "iret\n\t"
>> "1:"
>> - : "+r" (__sp) : : "memory");
>> + : : : "memory");
>> #else
>> unsigned int tmp;
>>
>> @@ -703,7 +701,7 @@ static inline void sync_core(void)
>> "iretq\n\t"
>> UNWIND_HINT_RESTORE
>> "1:"
>> - : "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
>> + : "=&r" (tmp) : : "cc", "memory");
>> #endif
>> }
>>
>> diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
>> index a34e0d4b957d..f8f127a2a300 100644
>> --- a/arch/x86/include/asm/rwsem.h
>> +++ b/arch/x86/include/asm/rwsem.h
>> @@ -103,10 +103,9 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
>> ({ \
>> long tmp; \
>> struct rw_semaphore* ret; \
>> - register void *__sp asm(_ASM_SP); \
>> \
>> asm volatile("# beginning down_write\n\t" \
>> - LOCK_PREFIX " xadd %1,(%4)\n\t" \
>> + LOCK_PREFIX " xadd %1,(%3)\n\t" \
>> /* adds 0xffff0001, returns the old value */ \
>> " test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
>> /* was the active mask 0 before? */\
>> @@ -114,7 +113,8 @@ static inline bool __down_read_trylock(struct rw_semaphore *sem)
>> " call " slow_path "\n" \
>> "1:\n" \
>> "# ending down_write" \
>> - : "+m" (sem->count), "=d" (tmp), "=a" (ret), "+r" (__sp) \
>> + : "+m" (sem->count), "=d" (tmp), \
>> + "=a" (ret) \
>> : "a" (sem), "1" (RWSEM_ACTIVE_WRITE_BIAS) \
>> : "memory", "cc"); \
>> ret; \
>> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
>> index 184eb9894dae..113ddb8ce0d8 100644
>> --- a/arch/x86/include/asm/uaccess.h
>> +++ b/arch/x86/include/asm/uaccess.h
>> @@ -166,11 +166,10 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
>> ({ \
>> int __ret_gu; \
>> register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
>> - register void *__sp asm(_ASM_SP); \
>> __chk_user_ptr(ptr); \
>> might_fault(); \
>> - asm volatile("call __get_user_%P4" \
>> - : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp) \
>> + asm volatile("call __get_user_%P3" \
>> + : "=a" (__ret_gu), "=r" (__val_gu) \
>> : "0" (ptr), "i" (sizeof(*(ptr)))); \
>> (x) = (__force __typeof__(*(ptr))) __val_gu; \
>> __builtin_expect(__ret_gu, 0); \
>> diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
>> index 9606688caa4b..6264dba01896 100644
>> --- a/arch/x86/include/asm/xen/hypercall.h
>> +++ b/arch/x86/include/asm/xen/hypercall.h
>> @@ -113,10 +113,9 @@ extern struct { char _entry[32]; } hypercall_page[];
>> register unsigned long __arg2 asm(__HYPERCALL_ARG2REG) = __arg2; \
>> register unsigned long __arg3 asm(__HYPERCALL_ARG3REG) = __arg3; \
>> register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
>> - register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5; \
>> - register void *__sp asm(_ASM_SP);
>> + register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5;
>>
>> -#define __HYPERCALL_0PARAM "=r" (__res), "+r" (__sp)
>> +#define __HYPERCALL_0PARAM "=r" (__res)
>> #define __HYPERCALL_1PARAM __HYPERCALL_0PARAM, "+r" (__arg1)
>> #define __HYPERCALL_2PARAM __HYPERCALL_1PARAM, "+r" (__arg2)
>> #define __HYPERCALL_3PARAM __HYPERCALL_2PARAM, "+r" (__arg3)
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 16bf6655aa85..5e82e0821c1f 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -5296,7 +5296,6 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,
>>
>> static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
>> {
>> - register void *__sp asm(_ASM_SP);
>> ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
>>
>> if (!(ctxt->d & ByteOp))
>> @@ -5304,7 +5303,7 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
>>
>> asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
>> : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
>> - [fastop]"+S"(fop), "+r"(__sp)
>> + [fastop]"+S"(fop)
>> : "c"(ctxt->src2.val));
>>
>> ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 06c0c6d0541e..a693362f41af 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -9036,7 +9036,6 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
>> static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>> {
>> u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
>> - register void *__sp asm(_ASM_SP);
>>
>> if ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK))
>> == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) {
>> @@ -9063,9 +9062,8 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
>> "call *%[entry]\n\t"
>> :
>> #ifdef CONFIG_X86_64
>> - [sp]"=&r"(tmp),
>> + [sp]"=&r"(tmp)
>> #endif
>> - "+r"(__sp)
>> :
>> [entry]"r"(entry),
>> [ss]"i"(__KERNEL_DS),
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index b836a7274e12..4457e41378e4 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -806,7 +806,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>> if (is_vmalloc_addr((void *)address) &&
>> (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
>> address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
>> - register void *__sp asm("rsp");
>> unsigned long stack = this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) - sizeof(void *);
>> /*
>> * We're likely to be running with very little stack space
>> @@ -821,7 +820,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>> asm volatile ("movq %[stack], %%rsp\n\t"
>> "call handle_stack_overflow\n\t"
>> "1: jmp 1b"
>> - : "+r" (__sp)
>> + :
>> : "D" ("kernel stack overflow (page fault)"),
>> "S" (regs), "d" (address),
>> [stack] "rm" (stack));
>> diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
>> index 6a1af43862df..18af4aa7488d 100644
>> --- a/tools/objtool/Documentation/stack-validation.txt
>> +++ b/tools/objtool/Documentation/stack-validation.txt
>> @@ -192,12 +192,22 @@ they mean, and suggestions for how to fix them.
>> use the manual unwind hint macros in asm/unwind_hints.h.
>>
>> If it's a GCC-compiled .c file, the error may be because the function
>> - uses an inline asm() statement which has a "call" instruction. An
>> - asm() statement with a call instruction must declare the use of the
>> - stack pointer in its output operand. For example, on x86_64:
>> + uses an inline asm() statement which has a "call" instruction. To
>> + fix this, either:
>>
>> - register void *__sp asm("rsp");
>> - asm volatile("call func" : "+r" (__sp));
>> + a) list the stack pointer as an input/output constraint;
>> +
>> + register void *__sp asm("rsp");
>> + asm volatile("call func" : "+r" (__sp));
>> +
>> + or
>> +
>> + b) declare a global register variable for the stack pointer.
>> +
>> + arch/x86/include/asm/asm.h:register unsigned int __sp_unused asm("esp");
>> +
>> + This is the strategy used on x86_64. It ensures that GCC always
>> + sets up the frame pointer before inserting *any* inline asm.
>>
>> Otherwise the stack frame may not get created before the call.
Tested-by: Alexander Potapenko <[email protected]>
>> --
>> 2.13.5
>>




2017-09-20 01:18:26

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote:
> For inline asm statements which have a CALL instruction, we list the
> stack pointer as a constraint to convince GCC to ensure the frame
> pointer is set up first:
>
> static inline void foo()
> {
> register void *__sp asm(_ASM_SP);
> asm("call bar" : "+r" (__sp))
> }
>
> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>
> There's actually an easier way to achieve the same goal in GCC, without
> causing trouble for clang. If we declare the stack pointer register
> variable as a global variable, and remove the constraint altogether,
> that convinces GCC to always set up the frame pointer before inserting
> *any* inline asm.
>
> It basically acts as if *every* inline asm statement has a CALL
> instruction. It's a bit overkill, but the performance impact should be
> negligible.
>
> Here are the vmlinux .text size differences with the following configs
> on GCC:
>
> - defconfig
> - defconfig without frame pointers
> - Fedora distro config
> - Fedora distro config without frame pointers
>
> defconfig defconfig-nofp distro distro-nofp
> before 9796300 9466764 9076191 8789745
> after 9796941 9462859 9076381 8785325
>
> With frame pointers, the text size increases slightly. Without frame
> pointers, the text size decreases, and a little more significantly.
>
> Reported-by: Matthias Kaehlcke <[email protected]>
> Signed-off-by: Josh Poimboeuf <[email protected]>

NAK - kbuild bot is reporting some cases where this patch doesn't force
the frame pointer setup.

--
Josh

2017-09-20 01:47:15

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Tue, Sep 19, 2017 at 03:25:59PM -0700, Alexander Potapenko wrote:
> On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko <[email protected]> wrote:
> > On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf <[email protected]> wrote:
> >> For inline asm statements which have a CALL instruction, we list the
> >> stack pointer as a constraint to convince GCC to ensure the frame
> >> pointer is set up first:
> >>
> >> static inline void foo()
> >> {
> >> register void *__sp asm(_ASM_SP);
> >> asm("call bar" : "+r" (__sp))
> >> }
> >>
> >> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> >>
> >> There's actually an easier way to achieve the same goal in GCC, without
> >> causing trouble for clang. If we declare the stack pointer register
> >> variable as a global variable, and remove the constraint altogether,
> >> that convinces GCC to always set up the frame pointer before inserting
> >> *any* inline asm.
> >>
> >> It basically acts as if *every* inline asm statement has a CALL
> >> instruction. It's a bit overkill, but the performance impact should be
> >> negligible.
> >>
> >> Here are the vmlinux .text size differences with the following configs
> >> on GCC:
> >>
> >> - defconfig
> >> - defconfig without frame pointers
> >> - Fedora distro config
> >> - Fedora distro config without frame pointers
> >>
> >> defconfig defconfig-nofp distro distro-nofp
> >> before 9796300 9466764 9076191 8789745
> >> after 9796941 9462859 9076381 8785325
> >>
> >> With frame pointers, the text size increases slightly. Without frame
> >> pointers, the text size decreases, and a little more significantly.
> >>
> >> Reported-by: Matthias Kaehlcke <[email protected]>
> >> Signed-off-by: Josh Poimboeuf <[email protected]>
> >> ---
> >> arch/x86/include/asm/alternative.h | 3 +--
> >> arch/x86/include/asm/asm.h | 9 ++++++++
> >> arch/x86/include/asm/mshyperv.h | 27 ++++++++++--------------
> >> arch/x86/include/asm/paravirt_types.h | 14 ++++++------
> >> arch/x86/include/asm/preempt.h | 15 +++++--------
> >> arch/x86/include/asm/processor.h | 6 ++----
> >> arch/x86/include/asm/rwsem.h | 6 +++---
> >> arch/x86/include/asm/uaccess.h | 5 ++---
> >> arch/x86/include/asm/xen/hypercall.h | 5 ++---
> >> arch/x86/kvm/emulate.c | 3 +--
> >> arch/x86/kvm/vmx.c | 4 +---
> >> arch/x86/mm/fault.c | 3 +--
> >> tools/objtool/Documentation/stack-validation.txt | 20 +++++++++++++-----
> >> 13 files changed, 60 insertions(+), 60 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> >> index 1b020381ab38..7aeb1cef4204 100644
> >> --- a/arch/x86/include/asm/alternative.h
> >> +++ b/arch/x86/include/asm/alternative.h
> >> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
> >> #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, \
> >> output, input...) \
> >> { \
> >> - register void *__sp asm(_ASM_SP); \
> >> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
> >> "call %P[new2]", feature2) \
> >> - : output, "+r" (__sp) \
> >> + : output \
> >> : [old] "i" (oldfunc), [new1] "i" (newfunc1), \
> >> [new2] "i" (newfunc2), ## input); \
> >> }
> >> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
> >> index 676ee5807d86..ff8921d4615b 100644
> >> --- a/arch/x86/include/asm/asm.h
> >> +++ b/arch/x86/include/asm/asm.h
> >> @@ -132,4 +132,13 @@
> >> /* For C file, we already have NOKPROBE_SYMBOL macro */
> >> #endif
> >>
> >> +#ifndef __ASSEMBLY__
> >> +/*
> >> + * This variable declaration has the side effect of forcing GCC to always set
> >> + * up the frame pointer before inserting any inline asm. This is necessary
> >> + * because some inline asm statements have CALL instructions.
> >> + */
> >> +register unsigned int __sp_unused asm("esp");

> > Shouldn't this be "asm(_ASM_SP)"?

> Answering my own question, this can't be _ASM_SP because of the
> realmode code, which is built with -m16 whereas _ASM_SP is "rsp".
> The patch works fine with "esp" though - most certainly declaring a
> ESP variable is enough to reserve the whole RSP in an x86_64 build.

Right, clang failing to build the realmode code was exactly why I did
that.

--
Josh

2017-09-20 17:38:33

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Wed, Sep 20, 2017 at 7:32 PM, H. Peter Anvin <[email protected]> wrote:
> On 09/19/17 11:45, Josh Poimboeuf wrote:
>> For inline asm statements which have a CALL instruction, we list the
>> stack pointer as a constraint to convince GCC to ensure the frame
>> pointer is set up first:
>>
>> static inline void foo()
>> {
>> register void *__sp asm(_ASM_SP);
>> asm("call bar" : "+r" (__sp))
>> }
>>
>> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>>
>> There's actually an easier way to achieve the same goal in GCC, without
>> causing trouble for clang. If we declare the stack pointer register
>> variable as a global variable, and remove the constraint altogether,
>> that convinces GCC to always set up the frame pointer before inserting
>> *any* inline asm.
>>
>> It basically acts as if *every* inline asm statement has a CALL
>> instruction. It's a bit overkill, but the performance impact should be
>> negligible.
>>
>
> Again, probably negligible, but why do we need a frame pointer just
> because we have a call assembly instruction?

I think we need just the frame itself and RSP pointing below this
frame. If we don't have a frame, the CALL instruction will smash
whatever RSP happens to point to. The compiler doesn't have to set up
RSP to point below the used part of the stack in leaf functions.

2017-09-20 17:44:59

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On 09/19/17 11:45, Josh Poimboeuf wrote:
> For inline asm statements which have a CALL instruction, we list the
> stack pointer as a constraint to convince GCC to ensure the frame
> pointer is set up first:
>
> static inline void foo()
> {
> register void *__sp asm(_ASM_SP);
> asm("call bar" : "+r" (__sp))
> }
>
> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>
> There's actually an easier way to achieve the same goal in GCC, without
> causing trouble for clang. If we declare the stack pointer register
> variable as a global variable, and remove the constraint altogether,
> that convinces GCC to always set up the frame pointer before inserting
> *any* inline asm.
>
> It basically acts as if *every* inline asm statement has a CALL
> instruction. It's a bit overkill, but the performance impact should be
> negligible.
>

Again, probably negligible, but why do we need a frame pointer just
because we have a call assembly instruction?

-hpa

2017-09-20 17:51:29

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> On 09/19/17 11:45, Josh Poimboeuf wrote:
> > For inline asm statements which have a CALL instruction, we list the
> > stack pointer as a constraint to convince GCC to ensure the frame
> > pointer is set up first:
> >
> > static inline void foo()
> > {
> > register void *__sp asm(_ASM_SP);
> > asm("call bar" : "+r" (__sp))
> > }
> >
> > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> >
> > There's actually an easier way to achieve the same goal in GCC, without
> > causing trouble for clang. If we declare the stack pointer register
> > variable as a global variable, and remove the constraint altogether,
> > that convinces GCC to always set up the frame pointer before inserting
> > *any* inline asm.
> >
> > It basically acts as if *every* inline asm statement has a CALL
> > instruction. It's a bit overkill, but the performance impact should be
> > negligible.
> >
>
> Again, probably negligible, but why do we need a frame pointer just
> because we have a call assembly instruction?

It's the frame pointer convention. Without it, when dumping the stack
from the called function, a function will get skipped in the stack
trace.

--
Josh

2017-09-20 17:54:09

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Tue, Sep 19, 2017 at 08:18:23PM -0500, Josh Poimboeuf wrote:
> On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote:
> > For inline asm statements which have a CALL instruction, we list the
> > stack pointer as a constraint to convince GCC to ensure the frame
> > pointer is set up first:
> >
> > static inline void foo()
> > {
> > register void *__sp asm(_ASM_SP);
> > asm("call bar" : "+r" (__sp))
> > }
> >
> > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> >
> > There's actually an easier way to achieve the same goal in GCC, without
> > causing trouble for clang. If we declare the stack pointer register
> > variable as a global variable, and remove the constraint altogether,
> > that convinces GCC to always set up the frame pointer before inserting
> > *any* inline asm.
> >
> > It basically acts as if *every* inline asm statement has a CALL
> > instruction. It's a bit overkill, but the performance impact should be
> > negligible.
> >
> > Here are the vmlinux .text size differences with the following configs
> > on GCC:
> >
> > - defconfig
> > - defconfig without frame pointers
> > - Fedora distro config
> > - Fedora distro config without frame pointers
> >
> > defconfig defconfig-nofp distro distro-nofp
> > before 9796300 9466764 9076191 8789745
> > after 9796941 9462859 9076381 8785325
> >
> > With frame pointers, the text size increases slightly. Without frame
> > pointers, the text size decreases, and a little more significantly.
> >
> > Reported-by: Matthias Kaehlcke <[email protected]>
> > Signed-off-by: Josh Poimboeuf <[email protected]>
>
> NAK - kbuild bot is reporting some cases where this patch doesn't force
> the frame pointer setup.

So it turns out that for GCC 7, it works as described above: the global
register variable results in the frame pointer getting set up before
*all* inline asm.

But for GCC 6, it doesn't work that way. The global register variable
has no such effect.

So we need both the global register variable *and* the output constraint
after all. Will post another patch after some more testing.

--
Josh

2017-09-20 17:57:57

by H. Peter Anvin

Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On 09/20/17 10:38, Dmitry Vyukov wrote:
>
> I think we need just the frame itself and RSP pointing below this
> frame. If we don't have a frame, CALL instruction will smash whatever
> RSP happens to point to. Compiler doesn't have to setup RSP to point
> below used part of stack in leaf functions.
>

In the kernel it does. Redzoning is not allowed in the kernel, because
interrupts or exceptions would also smash the redzone.

-hpa

2017-09-20 18:01:25

by Dmitry Vyukov

Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin <[email protected]> wrote:
> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>
>> I think we need just the frame itself and RSP pointing below this
>> frame. If we don't have a frame, CALL instruction will smash whatever
>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>> below used part of stack in leaf functions.
>>
>
> In the kernel it does. Redzoning is not allowed in the kernel, because
> interrupts or exceptions would also smash the redzone.

I see... But it's the same for user-space signals; the first thing a
signal handler should do is skip the redzone. I guess interrupt
handlers should switch to an interrupt stack, which avoids smashing
the redzone altogether. Do you mean nested interrupts/exceptions in
interrupts? In my experience, frames in leaf functions can have a
pretty large performance penalty. Wonder if we have considered
changing interrupt/exception handlers to avoid smashing redzones and
disabling leaf frames?

2017-09-20 21:07:36

by Josh Poimboeuf

Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin <[email protected]> wrote:
> > On 09/20/17 10:38, Dmitry Vyukov wrote:
> >>
> >> I think we need just the frame itself and RSP pointing below this
> >> frame. If we don't have a frame, CALL instruction will smash whatever
> >> RSP happens to point to. Compiler doesn't have to setup RSP to point
> >> below used part of stack in leaf functions.
> >>
> >
> > In the kernel it does. Redzoning is not allowed in the kernel, because
> > interrupts or exceptions would also smash the redzone.
>
> I see... But it's the same for user-space signals, the first thing a
> signal should do is to skip the redzone. I guess interrupt handlers
> should switch to interrupt stack which avoids smashing redzone
> altogether. Do you mean nested interrupts/exceptions in interrupts?
> In my experience frames in leaf functions can have pretty large
> performance penalty. Wonder if we have we considered changing
> interrupt/exception handlers to avoid smashing redzones and disable
> leaf frames?

Currently, on x86-64, I believe all exceptions have their own dedicated
stacks in the kernel, but IRQs still come in on the task's kernel stack.

Andy, do you know if there's a reason why IRQs don't use a dedicated IST
stack?

--
Josh

2017-09-20 21:19:11

by Andy Lutomirski

Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang



> On Sep 20, 2017, at 2:07 PM, Josh Poimboeuf <[email protected]> wrote:
>
>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
>>> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin <[email protected]> wrote:
>>>> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>>>
>>>> I think we need just the frame itself and RSP pointing below this
>>>> frame. If we don't have a frame, CALL instruction will smash whatever
>>>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>>>> below used part of stack in leaf functions.
>>>>
>>>
>>> In the kernel it does. Redzoning is not allowed in the kernel, because
>>> interrupts or exceptions would also smash the redzone.
>>
>> I see... But it's the same for user-space signals, the first thing a
>> signal should do is to skip the redzone. I guess interrupt handlers
>> should switch to interrupt stack which avoids smashing redzone
>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>> In my experience frames in leaf functions can have pretty large
>> performance penalty. Wonder if we have we considered changing
>> interrupt/exception handlers to avoid smashing redzones and disable
>> leaf frames?
>
> Currently, on x86-64, I believe all exceptions have their own dedicated
> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>
> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
> stack?
>

Because IST is awful due to recursion issues. We immediately switch to an IRQ stack, though.

If the kernel wanted a redzone, it would have to use IST for everything, which would entail a bunch of unpleasant hackery.

> --
> Josh

2017-09-21 08:12:52

by Dmitry Vyukov

Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski <[email protected]> wrote:
>>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
>>>> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin <[email protected]> wrote:
>>>>> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>>>>
>>>>> I think we need just the frame itself and RSP pointing below this
>>>>> frame. If we don't have a frame, CALL instruction will smash whatever
>>>>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>>>>> below used part of stack in leaf functions.
>>>>>
>>>>
>>>> In the kernel it does. Redzoning is not allowed in the kernel, because
>>>> interrupts or exceptions would also smash the redzone.
>>>
>>> I see... But it's the same for user-space signals, the first thing a
>>> signal should do is to skip the redzone. I guess interrupt handlers
>>> should switch to interrupt stack which avoids smashing redzone
>>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>>> In my experience frames in leaf functions can have pretty large
>>> performance penalty. Wonder if we have we considered changing
>>> interrupt/exception handlers to avoid smashing redzones and disable
>>> leaf frames?
>>
>> Currently, on x86-64, I believe all exceptions have their own dedicated
>> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>>
>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
>> stack?
>>
>
> Because IST is awful due to recursion issues. We immediately switch to an IRQ stack, though.
>
> If the kernel wanted a redzone, it would have to use IST for everything, which would entail a bunch of unpleasant hackery.

Thanks.

I guess it must be finite recursion, because we could not handle
infinite recursion with a finite stack. The thing that solves it is
simply:

sub $256, %rsp
... do stuff ...
add $256, %rsp

Don't know if it's applicable to interrupts or not.

2017-09-21 11:52:22

by Brian Gerst

Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Thu, Sep 21, 2017 at 4:12 AM, Dmitry Vyukov <[email protected]> wrote:
> On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski <[email protected]> wrote:
>>>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
>>>>> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin <[email protected]> wrote:
>>>>>> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>>>>>
>>>>>> I think we need just the frame itself and RSP pointing below this
>>>>>> frame. If we don't have a frame, CALL instruction will smash whatever
>>>>>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>>>>>> below used part of stack in leaf functions.
>>>>>>
>>>>>
>>>>> In the kernel it does. Redzoning is not allowed in the kernel, because
>>>>> interrupts or exceptions would also smash the redzone.
>>>>
>>>> I see... But it's the same for user-space signals, the first thing a
>>>> signal should do is to skip the redzone. I guess interrupt handlers
>>>> should switch to interrupt stack which avoids smashing redzone
>>>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>>>> In my experience frames in leaf functions can have pretty large
>>>> performance penalty. Wonder if we have we considered changing
>>>> interrupt/exception handlers to avoid smashing redzones and disable
>>>> leaf frames?
>>>
>>> Currently, on x86-64, I believe all exceptions have their own dedicated
>>> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>>>
>>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
>>> stack?
>>>
>>
>> Because IST is awful due to recursion issues. We immediately switch to an IRQ stack, though.
>>
>> If the kernel wanted a redzone, it would have to use IST for everything, which would entail a bunch of unpleasant hackery.
>
> Thanks.
>
> I guess it must be finite recursion, because we could not handle
> infinite with finite stack. I thing that solves it is simply:
>
> sub $256, %rsp
> ... do stuff ...
> add $256, %rsp
>
> Don't know if it's applicable to interrupts or not.

No, it is not. The processor pushes 5 or 6 words of data on the stack
(the IRET frame plus an error code for certain exceptions) before the
interrupt handler gets control. So without using the IST for stack
switching on every interrupt, the redzone cannot be used in the kernel
as it will get smashed by the IRET frame. In addition, since the
kernel's stack is limited in size, skipping 128 bytes on every
interrupt would overrun the stack faster. The small gain from using
the redzone in the kernel is outweighed by these limitations.

--
Brian Gerst

2017-09-21 12:18:38

by Dmitry Vyukov

Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Thu, Sep 21, 2017 at 1:52 PM, Brian Gerst <[email protected]> wrote:
>>>>>>> I think we need just the frame itself and RSP pointing below this
>>>>>>> frame. If we don't have a frame, CALL instruction will smash whatever
>>>>>>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>>>>>>> below used part of stack in leaf functions.
>>>>>>>
>>>>>>
>>>>>> In the kernel it does. Redzoning is not allowed in the kernel, because
>>>>>> interrupts or exceptions would also smash the redzone.
>>>>>
>>>>> I see... But it's the same for user-space signals, the first thing a
>>>>> signal should do is to skip the redzone. I guess interrupt handlers
>>>>> should switch to interrupt stack which avoids smashing redzone
>>>>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>>>>> In my experience frames in leaf functions can have pretty large
>>>>> performance penalty. Wonder if we have we considered changing
>>>>> interrupt/exception handlers to avoid smashing redzones and disable
>>>>> leaf frames?
>>>>
>>>> Currently, on x86-64, I believe all exceptions have their own dedicated
>>>> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>>>>
>>>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
>>>> stack?
>>>>
>>>
>>> Because IST is awful due to recursion issues. We immediately switch to an IRQ stack, though.
>>>
>>> If the kernel wanted a redzone, it would have to use IST for everything, which would entail a bunch of unpleasant hackery.
>>
>> Thanks.
>>
>> I guess it must be finite recursion, because we could not handle
>> infinite with finite stack. I thing that solves it is simply:
>>
>> sub $256, %rsp
>> ... do stuff ...
>> add $256, %rsp
>>
>> Don't know if it's applicable to interrupts or not.
>
> No, it is not. The processor pushes 5 or 6 words of data on the stack
> (the IRET frame plus an error code for certain exceptions) before the
> interrupt handler gets control. So without using the IST for stack
> switching on every interrupt, the redzone cannot be used in the kernel
> as it will get smashed by the IRET frame. In addition, since the
> kernel's stack is limited in size, skipping 128 bytes on every
> interrupt would overrun the stack faster. The small gain from using
> the redzone in the kernel is outweighed by these limitations.

I see, thanks for educating.

2017-09-21 15:35:17

by Ingo Molnar

Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang


* Josh Poimboeuf <[email protected]> wrote:

> On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> > On 09/19/17 11:45, Josh Poimboeuf wrote:
> > > For inline asm statements which have a CALL instruction, we list the
> > > stack pointer as a constraint to convince GCC to ensure the frame
> > > pointer is set up first:
> > >
> > > static inline void foo()
> > > {
> > > register void *__sp asm(_ASM_SP);
> > > asm("call bar" : "+r" (__sp))
> > > }
> > >
> > > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > >
> > > There's actually an easier way to achieve the same goal in GCC, without
> > > causing trouble for clang. If we declare the stack pointer register
> > > variable as a global variable, and remove the constraint altogether,
> > > that convinces GCC to always set up the frame pointer before inserting
> > > *any* inline asm.
> > >
> > > It basically acts as if *every* inline asm statement has a CALL
> > > instruction. It's a bit overkill, but the performance impact should be
> > > negligible.
> > >
> >
> > Again, probably negligible, but why do we need a frame pointer just
> > because we have a call assembly instruction?
>
> It's frame pointer convention. Without it, if dumping the stack from
> the called function, a function will get skipped in the stack trace.

BTW., could we perhaps relax this and simply phase out the frame pointer on x86,
and simplify all our assembly in a cycle or two? ORC unwinder is working out very
well so far. Live kernel patching can use ORC data just fine, and nothing else
actually relies on frame pointers, right?

That would give one more register to assembly code.

I realize that we just rewrote a whole bunch of assembly code... but that was the
price for ORC, in a way.

Thanks,

Ingo

2017-09-21 16:18:16

by Josh Poimboeuf

Subject: Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

On Thu, Sep 21, 2017 at 05:35:11PM +0200, Ingo Molnar wrote:
>
> * Josh Poimboeuf <[email protected]> wrote:
>
> > On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> > > On 09/19/17 11:45, Josh Poimboeuf wrote:
> > > > For inline asm statements which have a CALL instruction, we list the
> > > > stack pointer as a constraint to convince GCC to ensure the frame
> > > > pointer is set up first:
> > > >
> > > > static inline void foo()
> > > > {
> > > > register void *__sp asm(_ASM_SP);
> > > > asm("call bar" : "+r" (__sp))
> > > > }
> > > >
> > > > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > > >
> > > > There's actually an easier way to achieve the same goal in GCC, without
> > > > causing trouble for clang. If we declare the stack pointer register
> > > > variable as a global variable, and remove the constraint altogether,
> > > > that convinces GCC to always set up the frame pointer before inserting
> > > > *any* inline asm.
> > > >
> > > > It basically acts as if *every* inline asm statement has a CALL
> > > > instruction. It's a bit overkill, but the performance impact should be
> > > > negligible.
> > > >
> > >
> > > Again, probably negligible, but why do we need a frame pointer just
> > > because we have a call assembly instruction?
> >
> > It's frame pointer convention. Without it, if dumping the stack from
> > the called function, a function will get skipped in the stack trace.
>
> BTW., could we perhaps relax this and simply phase out the frame pointer on x86,
> and simplify all our assembly in a cycle or two? ORC unwinder is working out very
> well so far. Live kernel patching can use ORC data just fine, and nothing else
> actually relies on frame pointers, right?
>
> That would give one more register to assembly code.
>
> I realize that we just rewrote a whole bunch of assembly code... but that was the
> price for ORC, in a way.

I think ORC isn't *quite* ready for livepatch yet, though it's close.
We could probably make it ready in a cycle or two.

However, I'm not sure whether we would want to remove livepatch support
for frame pointers yet:

- There might be some embedded livepatch users who can't deal with the
extra 3MB of RAM needed by ORC. (Though this is purely theoretical, I
don't actually know anybody with this problem. I suppose we could
float the idea on the livepatch/kpatch mailing lists and see if
anybody objects.)

- Objtool's ORC generation is working fine now, but objtool maintenance
is currently a little heavy-handed, and I'm the only person
who knows how to do it. I've got an idea brewing for how to improve
its maintainability with the help of compiler plugins, but until then,
if I get hit by a bus tomorrow, who's going to pick it up? It's nice
to have frame pointers as a backup solution for live patching.

But also, is this a problem that's even worth fixing? Do we have any
assembly code that would be noticeably better off with that extra
register? Most of the changes were:

- adding FRAME_BEGIN/FRAME_END to some asm functions;
- juggling register usage in a few crypto functions; and
- adding the stack pointer constraint to 15 or so inline asm functions.

I think those changes weren't all that disruptive to start with.

--
Josh