2020-04-07 07:28:45

by Alexandre Chartre

[permalink] [raw]
Subject: [PATCH V2 0/9] objtool changes to remove all ANNOTATE_NOSPEC_ALTERNATIVE

Hi,

This is version v2 of this patchset based on the different comments
received so far. It now uses and includes PeterZ patch to add
UNWIND_HINT_RET_OFFSET. Other changes are described below.

Code like retpoline or RSB stuffing, which is used to mitigate some of
the speculative execution issues, is currently ignored by objtool with
the ANNOTATE_NOSPEC_ALTERNATIVE directive. This series adds support
for intra-function calls to objtool so that it can handle such a code.
With these changes, we can remove all ANNOTATE_NOSPEC_ALTERNATIVE
directives.

Changes:
- replace RETPOLINE_RET with PeterZ UNWIND_HINT_RET_OFFSET
- make objtool intra-function call action architecture dependent
- objtool now automatically detects and validates all intra-function
calls but it issues a warning if the call was not explicitly tagged
- change __FILL_RETURN_BUFFER to work with objtool
- add generic ANNOTATE_INTRA_FUNCTION_CALL macro
- remove all ANNOTATE_SPEC_ALTERNATIVE (even for __FILL_RETURN_BUFFER)

Thanks,

alex.

-----

Alexandre Chartre (8):
objtool: UNWIND_HINT_RET_OFFSET should not check registers
objtool: is_fentry_call() crashes if call has no destination
objtool: Allow branches within the same alternative.
objtool: Add support for intra-function calls
x86/speculation: Change __FILL_RETURN_BUFFER to work with objtool
x86/speculation: Annotate intra-function calls
x86/speculation: Add unwind hint to trampoline return
x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

Peter Zijlstra (Intel) (1):
objtool: Introduce HINT_RET_OFFSET

arch/x86/include/asm/nospec-branch.h | 32 ++--
arch/x86/include/asm/orc_types.h | 1 +
arch/x86/include/asm/unwind_hints.h | 10 ++
include/linux/frame.h | 11 ++
tools/arch/x86/include/asm/orc_types.h | 1 +
.../Documentation/stack-validation.txt | 8 +
tools/objtool/arch.h | 2 +
tools/objtool/arch/x86/decode.c | 12 ++
tools/objtool/check.c | 152 ++++++++++++++----
tools/objtool/check.h | 6 +-
10 files changed, 192 insertions(+), 43 deletions(-)

--
2.18.2


2020-04-07 07:28:52

by Alexandre Chartre

[permalink] [raw]
Subject: [PATCH V2 3/9] objtool: is_fentry_call() crashes if call has no destination

Fix is_fentry_call() so that it works if a call has no destination
set (call_dest). This needs to be done in order to support intra-
function calls.

Signed-off-by: Alexandre Chartre <[email protected]>
---
tools/objtool/check.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index c7fcaddfaa8a..c0da033a40c5 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1367,7 +1367,7 @@ static int decode_sections(struct objtool_file *file)

static bool is_fentry_call(struct instruction *insn)
{
- if (insn->type == INSN_CALL &&
+ if (insn->type == INSN_CALL && insn->call_dest &&
insn->call_dest->type == STT_NOTYPE &&
!strcmp(insn->call_dest->name, "__fentry__"))
return true;
--
2.18.2

2020-04-07 07:28:55

by Alexandre Chartre

[permalink] [raw]
Subject: [PATCH V2 6/9] x86/speculation: Change __FILL_RETURN_BUFFER to work with objtool

Change __FILL_RETURN_BUFFER so that the stack state is deterministically
defined for each iteration and that objtool can have an accurate view
of the stack.

Signed-off-by: Alexandre Chartre <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 5c24a7b35166..9a946fd5e824 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -60,8 +60,8 @@
jmp 775b; \
774: \
dec reg; \
- jnz 771b; \
- add $(BITS_PER_LONG/8) * nr, sp;
+ add $(BITS_PER_LONG/8) * 2, sp; \
+ jnz 771b;

#ifdef __ASSEMBLY__

--
2.18.2

2020-04-07 07:28:58

by Alexandre Chartre

[permalink] [raw]
Subject: [PATCH V2 7/9] x86/speculation: Annotate intra-function calls

Some speculative execution mitigations (like retpoline) use intra-
function calls. Provide a macro to annotate such intra-function calls
so they can be properly handled by objtool, and use this macro to
annotate intra-function calls.

Signed-off-by: Alexandre Chartre <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 9a946fd5e824..a345c6fa0541 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -3,6 +3,7 @@
#ifndef _ASM_X86_NOSPEC_BRANCH_H_
#define _ASM_X86_NOSPEC_BRANCH_H_

+#include <linux/frame.h>
#include <linux/static_key.h>

#include <asm/alternative.h>
@@ -19,6 +20,15 @@
#define ANNOTATE_NOSPEC_ALTERNATIVE \
ANNOTATE_IGNORE_ALTERNATIVE

+/*
+ * Intra-function call instruction. This should be used as a substitute
+ * for the call instruction when doing an intra-function call. It is
+ * similar to the call instruction but it tells objtool that this is
+ * an intra-function call.
+ */
+#define INTRA_FUNCTION_CALL \
+ ANNOTATE_INTRA_FUNCTION_CALL call
+
/*
* Fill the CPU return stack buffer.
*
@@ -47,13 +57,13 @@
#define __FILL_RETURN_BUFFER(reg, nr, sp) \
mov $(nr/2), reg; \
771: \
- call 772f; \
+ INTRA_FUNCTION_CALL 772f; \
773: /* speculation trap */ \
pause; \
lfence; \
jmp 773b; \
772: \
- call 774f; \
+ INTRA_FUNCTION_CALL 774f; \
775: /* speculation trap */ \
pause; \
lfence; \
@@ -83,7 +93,7 @@
* invocation below less ugly.
*/
.macro RETPOLINE_JMP reg:req
- call .Ldo_rop_\@
+ INTRA_FUNCTION_CALL .Ldo_rop_\@
.Lspec_trap_\@:
pause
lfence
@@ -102,7 +112,7 @@
.Ldo_retpoline_jmp_\@:
RETPOLINE_JMP \reg
.Ldo_call_\@:
- call .Ldo_retpoline_jmp_\@
+ INTRA_FUNCTION_CALL .Ldo_retpoline_jmp_\@
.endm

/*
--
2.18.2

2020-04-07 07:29:09

by Alexandre Chartre

[permalink] [raw]
Subject: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

Now that intra-function calls have been annotated and are supported
by objtool, that retpoline return instructions have been annotated,
and that __FILL_RETURN_BUFFER code is compatible with objtool, then
all ANNOTATE_NOSPEC_ALTERNATIVE directives can be removed.

Signed-off-by: Alexandre Chartre <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 5ce2a40a26da..6480db4304a0 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -124,7 +124,6 @@
*/
.macro JMP_NOSPEC reg:req
#ifdef CONFIG_RETPOLINE
- ANNOTATE_NOSPEC_ALTERNATIVE
ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *\reg), \
__stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
@@ -135,7 +134,6 @@

.macro CALL_NOSPEC reg:req
#ifdef CONFIG_RETPOLINE
- ANNOTATE_NOSPEC_ALTERNATIVE
ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *\reg), \
__stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\
__stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *\reg), X86_FEATURE_RETPOLINE_AMD
@@ -150,7 +148,6 @@
*/
.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
#ifdef CONFIG_RETPOLINE
- ANNOTATE_NOSPEC_ALTERNATIVE
ALTERNATIVE "jmp .Lskip_rsb_\@", \
__stringify(__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)) \
\ftr
@@ -174,7 +171,6 @@
* which is ensured when CONFIG_RETPOLINE is defined.
*/
# define CALL_NOSPEC \
- ANNOTATE_NOSPEC_ALTERNATIVE \
ALTERNATIVE_2( \
ANNOTATE_RETPOLINE_SAFE \
"call *%[thunk_target]\n", \
@@ -193,7 +189,6 @@
* here, anyway.
*/
# define CALL_NOSPEC \
- ANNOTATE_NOSPEC_ALTERNATIVE \
ALTERNATIVE_2( \
ANNOTATE_RETPOLINE_SAFE \
"call *%[thunk_target]\n", \
@@ -261,8 +256,7 @@ static inline void vmexit_fill_RSB(void)
#ifdef CONFIG_RETPOLINE
unsigned long loops;

- asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
- ALTERNATIVE("jmp 910f",
+ asm volatile (ALTERNATIVE("jmp 910f",
__stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
X86_FEATURE_RETPOLINE)
"910:"
--
2.18.2

2020-04-07 07:29:35

by Alexandre Chartre

[permalink] [raw]
Subject: [PATCH V2 2/9] objtool: UNWIND_HINT_RET_OFFSET should not check registers

UNWIND_HINT_RET_OFFSET will adjust a modified stack. However if a
callee-saved register was pushed on the stack then the stack frame
will still appear modified. So stop checking registers when
UNWIND_HINT_RET_OFFSET is used.

Signed-off-by: Alexandre Chartre <[email protected]>
---
tools/objtool/check.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index bbee26de92ec..c7fcaddfaa8a 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1390,6 +1390,14 @@ static bool has_modified_stack_frame(struct instruction *insn,
if (state->stack_size != initial_func_cfi.cfa.offset + ret_offset)
return true;

+ /*
+ * If there is a ret offset hint then don't check registers
+ * because a callee-saved register might have been pushed on
+ * the stack.
+ */
+ if (ret_offset)
+ return false;
+
for (i = 0; i < CFI_NUM_REGS; i++) {
if (state->regs[i].base != initial_func_cfi.regs[i].base ||
state->regs[i].offset != initial_func_cfi.regs[i].offset)
--
2.18.2

2020-04-07 07:29:36

by Alexandre Chartre

[permalink] [raw]
Subject: [PATCH V2 8/9] x86/speculation: Add unwind hint to trampoline return

With retpoline, the address to branch to is pushed on the stack and
the return instruction (ret) is used to jump to that address. Use the
UNWIND_HINT_RET_OFFSET directive to inform objtool that the stack
should be adjusted.

Signed-off-by: Alexandre Chartre <[email protected]>

replace RETPOLINE_RET with UNWIND_HINT_RET_OFFSET
---
arch/x86/include/asm/nospec-branch.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index a345c6fa0541..5ce2a40a26da 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -10,6 +10,7 @@
#include <asm/alternative-asm.h>
#include <asm/cpufeatures.h>
#include <asm/msr-index.h>
+#include <asm/unwind_hints.h>

/*
* This should be used immediately before a retpoline alternative. It tells
@@ -100,6 +101,7 @@
jmp .Lspec_trap_\@
.Ldo_rop_\@:
mov \reg, (%_ASM_SP)
+ UNWIND_HINT_RET_OFFSET
ret
.endm

--
2.18.2

2020-04-07 07:29:39

by Alexandre Chartre

[permalink] [raw]
Subject: [PATCH V2 4/9] objtool: Allow branches within the same alternative.

Currently objtool prevents any branch to an alternative. While preventing
branching from the outside to the middle of an alternative makes perfect
sense, branching within the same alternative should be allowed. To do so,
identify each alternative and check that a branch to an alternative comes
from the same alternative.

Signed-off-by: Alexandre Chartre <[email protected]>
---
tools/objtool/check.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index c0da033a40c5..ab06e9bdd396 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -712,7 +712,9 @@ static int handle_group_alt(struct objtool_file *file,
struct instruction *orig_insn,
struct instruction **new_insn)
{
+ static unsigned int alt_group_next_index = 1;
struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump = NULL;
+ unsigned int alt_group = alt_group_next_index++;
unsigned long dest_off;

last_orig_insn = NULL;
@@ -721,7 +723,7 @@ static int handle_group_alt(struct objtool_file *file,
if (insn->offset >= special_alt->orig_off + special_alt->orig_len)
break;

- insn->alt_group = true;
+ insn->alt_group = alt_group;
last_orig_insn = insn;
}

@@ -1960,6 +1962,7 @@ static int validate_sibling_call(struct instruction *insn, struct insn_state *st
* tools/objtool/Documentation/stack-validation.txt.
*/
static int validate_branch(struct objtool_file *file, struct symbol *func,
+ struct instruction *from,
struct instruction *first, struct insn_state state)
{
struct alternative *alt;
@@ -1971,7 +1974,9 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
insn = first;
sec = insn->sec;

- if (insn->alt_group && list_empty(&insn->alts)) {
+ if (insn->alt_group &&
+ (!from || from->alt_group != insn->alt_group) &&
+ list_empty(&insn->alts)) {
WARN_FUNC("don't know how to handle branch to middle of alternative instruction group",
sec, insn->offset);
return 1;
@@ -2053,7 +2058,8 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
if (alt->skip_orig)
skip_orig = true;

- ret = validate_branch(file, func, alt->insn, state);
+ ret = validate_branch(file, func,
+ NULL, alt->insn, state);
if (ret) {
if (backtrace)
BT_FUNC("(alt)", insn);
@@ -2123,7 +2129,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
return ret;

} else if (insn->jump_dest) {
- ret = validate_branch(file, func,
+ ret = validate_branch(file, func, insn,
insn->jump_dest, state);
if (ret) {
if (backtrace)
@@ -2254,7 +2260,8 @@ static int validate_unwind_hints(struct objtool_file *file)

for_each_insn(file, insn) {
if (insn->hint && !insn->visited) {
- ret = validate_branch(file, insn->func, insn, state);
+ ret = validate_branch(file, insn->func,
+ NULL, insn, state);
if (ret && backtrace)
BT_FUNC("<=== (hint)", insn);
warnings += ret;
@@ -2395,7 +2402,7 @@ static int validate_functions(struct objtool_file *file)

state.uaccess = func->uaccess_safe;

- ret = validate_branch(file, func, insn, state);
+ ret = validate_branch(file, func, NULL, insn, state);
if (ret && backtrace)
BT_FUNC("<=== (func)", insn);
warnings += ret;
--
2.18.2

2020-04-07 07:30:39

by Alexandre Chartre

[permalink] [raw]
Subject: [PATCH V2 5/9] objtool: Add support for intra-function calls

Change objtool to support intra-function calls. On x86, an intra-function
call is represented in objtool as a push onto the stack (of the return
address), and a jump to the destination address. That way the stack
information is correctly updated and the call flow is still accurate.

Signed-off-by: Alexandre Chartre <[email protected]>
---
include/linux/frame.h | 11 +++
.../Documentation/stack-validation.txt | 8 ++
tools/objtool/arch.h | 2 +
tools/objtool/arch/x86/decode.c | 12 +++
tools/objtool/check.c | 97 ++++++++++++++++---
tools/objtool/check.h | 1 +
6 files changed, 117 insertions(+), 14 deletions(-)

diff --git a/include/linux/frame.h b/include/linux/frame.h
index 02d3ca2d9598..c7178e6c9d48 100644
--- a/include/linux/frame.h
+++ b/include/linux/frame.h
@@ -15,9 +15,20 @@
static void __used __section(.discard.func_stack_frame_non_standard) \
*__func_stack_frame_non_standard_##func = func

+/*
+ * This macro indicates that the following intra-function call is valid.
+ * Any non-annotated intra-function call will cause objtool to issue warning.
+ */
+#define ANNOTATE_INTRA_FUNCTION_CALL \
+ 999: \
+ .pushsection .discard.intra_function_call; \
+ .long 999b; \
+ .popsection;
+
#else /* !CONFIG_STACK_VALIDATION */

#define STACK_FRAME_NON_STANDARD(func)
+#define ANNOTATE_INTRA_FUNCTION_CALL

#endif /* CONFIG_STACK_VALIDATION */

diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index de094670050b..09f863fd32d2 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -290,6 +290,14 @@ they mean, and suggestions for how to fix them.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646


+9. file.o: warning: unsupported intra-function call
+
+ This warning means that a direct call is done to a destination which
+ is not at the beginning of a function. If this is a legit call, you
+ can remove this warning by putting the ANNOTATE_INTRA_FUNCTION_CALL
+ directive right before the call.
+
+
If the error doesn't seem to make sense, it could be a bug in objtool.
Feel free to ask the objtool maintainer for help.

diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
index ced3765c4f44..4d42c12d9957 100644
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -75,4 +75,6 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,

bool arch_callee_saved_reg(unsigned char reg);

+void arch_configure_intra_function_call(struct stack_op *op);
+
#endif /* _ARCH_H */
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index a62e032863a8..7ee1561bf7ad 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -497,3 +497,15 @@ void arch_initial_func_cfi_state(struct cfi_state *state)
state->regs[16].base = CFI_CFA;
state->regs[16].offset = -8;
}
+
+
+void arch_configure_intra_function_call(struct stack_op *op)
+{
+ /*
+ * For the impact on the stack, make an intra-function
+ * call behaves like a push of an immediate value (the
+ * return address).
+ */
+ op->src.type = OP_SRC_CONST;
+ op->dest.type = OP_DEST_PUSH;
+}
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index ab06e9bdd396..bf1e4e94d686 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -644,34 +644,50 @@ static int add_jump_destinations(struct objtool_file *file)
return 0;
}

+static int configure_call(struct objtool_file *file, struct instruction *insn)
+{
+ unsigned long dest_off;
+
+ dest_off = insn->offset + insn->len + insn->immediate;
+ insn->call_dest = find_symbol_by_offset(insn->sec, dest_off);
+ if (insn->call_dest) {
+ /* regular call */
+ return 0;
+ }
+
+ /* intra-function call */
+ if (!insn->intra_function_call)
+ WARN_FUNC("intra-function call", insn->sec, insn->offset);
+
+ dest_off = insn->offset + insn->len + insn->immediate;
+ insn->jump_dest = find_insn(file, insn->sec, dest_off);
+ if (!insn->jump_dest) {
+ WARN_FUNC("can't find call dest at %s+0x%lx",
+ insn->sec, insn->offset,
+ insn->sec->name, dest_off);
+ return -1;
+ }
+
+ return 0;
+}
+
/*
* Find the destination instructions for all calls.
*/
static int add_call_destinations(struct objtool_file *file)
{
struct instruction *insn;
- unsigned long dest_off;
struct rela *rela;

for_each_insn(file, insn) {
- if (insn->type != INSN_CALL)
+ if (insn->type != INSN_CALL || insn->ignore)
continue;

rela = find_rela_by_dest_range(insn->sec, insn->offset,
insn->len);
if (!rela) {
- dest_off = insn->offset + insn->len + insn->immediate;
- insn->call_dest = find_symbol_by_offset(insn->sec,
- dest_off);
-
- if (!insn->call_dest && !insn->ignore) {
- WARN_FUNC("unsupported intra-function call",
- insn->sec, insn->offset);
- if (retpoline)
- WARN("If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.");
+ if (configure_call(file, insn))
return -1;
- }
-
} else if (rela->sym->type == STT_SECTION) {
insn->call_dest = find_symbol_by_offset(rela->sym->sec,
rela->addend+4);
@@ -1293,6 +1309,43 @@ static int read_retpoline_hints(struct objtool_file *file)
return 0;
}

+static int read_intra_function_call(struct objtool_file *file)
+{
+ struct section *sec;
+ struct instruction *insn;
+ struct rela *rela;
+
+ sec = find_section_by_name(file->elf,
+ ".rela.discard.intra_function_call");
+ if (!sec)
+ return 0;
+
+ list_for_each_entry(rela, &sec->rela_list, list) {
+ if (rela->sym->type != STT_SECTION) {
+ WARN("unexpected relocation symbol type in %s",
+ sec->name);
+ return -1;
+ }
+
+ insn = find_insn(file, rela->sym->sec, rela->addend);
+ if (!insn) {
+ WARN("bad .discard.intra_function_call entry");
+ return -1;
+ }
+
+ if (insn->type != INSN_CALL) {
+ WARN_FUNC("intra_function_call not a direct call",
+ insn->sec, insn->offset);
+ return -1;
+ }
+
+ arch_configure_intra_function_call(&insn->stack_op);
+ insn->intra_function_call = true;
+ }
+
+ return 0;
+}
+
static void mark_rodata(struct objtool_file *file)
{
struct section *sec;
@@ -1348,6 +1401,10 @@ static int decode_sections(struct objtool_file *file)
if (ret)
return ret;

+ ret = read_intra_function_call(file);
+ if (ret)
+ return ret;
+
ret = add_call_destinations(file);
if (ret)
return ret;
@@ -2110,7 +2167,8 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
return ret;

if (!no_fp && func && !is_fentry_call(insn) &&
- !has_valid_stack_frame(&state)) {
+ !has_valid_stack_frame(&state) &&
+ !insn->intra_function_call) {
WARN_FUNC("call without frame pointer save/setup",
sec, insn->offset);
return 1;
@@ -2119,6 +2177,17 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
if (dead_end_function(file, insn->call_dest))
return 0;

+ if (insn->intra_function_call) {
+ update_insn_state(insn, &state);
+ ret = validate_branch(file, func, insn,
+ insn->jump_dest, state);
+ if (ret) {
+ if (backtrace)
+ BT_FUNC("(intra-call)", insn);
+ return ret;
+ }
+ }
+
break;

case INSN_JUMP_CONDITIONAL:
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index 7a91497fee7e..7c7ec3d8fb00 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -36,6 +36,7 @@ struct instruction {
unsigned int alt_group;
bool dead_end, ignore, ignore_alts;
bool hint, save, restore;
+ bool intra_function_call;
bool retpoline_safe;
u8 visited;
u8 ret_offset;
--
2.18.2

2020-04-07 13:09:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 5/9] objtool: Add support for intra-function calls

On Tue, Apr 07, 2020 at 09:31:38AM +0200, Alexandre Chartre wrote:

> index a62e032863a8..7ee1561bf7ad 100644
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -497,3 +497,15 @@ void arch_initial_func_cfi_state(struct cfi_state *state)
> state->regs[16].base = CFI_CFA;
> state->regs[16].offset = -8;
> }
> +
> +
> +void arch_configure_intra_function_call(struct stack_op *op)
> +{
> + /*
> + * For the impact on the stack, make an intra-function
> + * call behaves like a push of an immediate value (the
> + * return address).
> + */
> + op->src.type = OP_SRC_CONST;
> + op->dest.type = OP_DEST_PUSH;
> +}

An alternative is to always set up stack ops for CALL/RET on decode, but
conditionally run update_insn_state() for them.

Not sure that makes more logical sense, but the patch would be simpler I
think.

2020-04-07 13:25:10

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [PATCH V2 5/9] objtool: Add support for intra-function calls


On 4/7/20 3:07 PM, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 09:31:38AM +0200, Alexandre Chartre wrote:
>
>> index a62e032863a8..7ee1561bf7ad 100644
>> --- a/tools/objtool/arch/x86/decode.c
>> +++ b/tools/objtool/arch/x86/decode.c
>> @@ -497,3 +497,15 @@ void arch_initial_func_cfi_state(struct cfi_state *state)
>> state->regs[16].base = CFI_CFA;
>> state->regs[16].offset = -8;
>> }
>> +
>> +
>> +void arch_configure_intra_function_call(struct stack_op *op)
>> +{
>> + /*
>> + * For the impact on the stack, make an intra-function
>> + * call behaves like a push of an immediate value (the
>> + * return address).
>> + */
>> + op->src.type = OP_SRC_CONST;
>> + op->dest.type = OP_DEST_PUSH;
>> +}
>
> An alternative is to always set up stack ops for CALL/RET on decode, but
> conditionally run update_insn_state() for them.
>
> Not sure that makes more logical sense, but the patch would be simpler I
> think.

Right, this would avoid adding a new arch dependent function and the patch
will be simpler. This probably makes sense as the stack impact is the same
for all calls (but objtool will use it only for intra-function calls).

Thanks,

alex.

2020-04-07 13:28:43

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH V2 6/9] x86/speculation: Change __FILL_RETURN_BUFFER to work with objtool

On Tue, Apr 07, 2020 at 09:31:39AM +0200, Alexandre Chartre wrote:
> Change __FILL_RETURN_BUFFER so that the stack state is deterministically
> defined for each iteration and that objtool can have an accurate view
> of the stack.
>
> Signed-off-by: Alexandre Chartre <[email protected]>
> ---
> arch/x86/include/asm/nospec-branch.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 5c24a7b35166..9a946fd5e824 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -60,8 +60,8 @@
> jmp 775b; \
> 774: \
> dec reg; \
> - jnz 771b; \
> - add $(BITS_PER_LONG/8) * nr, sp;
> + add $(BITS_PER_LONG/8) * 2, sp; \
> + jnz 771b;
>
> #ifdef __ASSEMBLY__

This still isn't a complete fix because the macro is used in an
alternative. So in the !X86_FEATURE_RSB_CTXSW case, this code isn't
patched in and the ORC data which refers to it is wrong.

As I said before I think the easiest fix would be to convert RSB and
retpolines to use static branches instead of alternatives.

--
Josh

2020-04-07 13:31:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Tue, Apr 07, 2020 at 09:31:42AM +0200, Alexandre Chartre wrote:
> Now that intra-function calls have been annotated and are supported
> by objtool, that retpoline return instructions have been annotated,
> and that __FILL_RETURN_BUFFER code is compatible with objtool, then
> all ANNOTATE_NOSPEC_ALTERNATIVE directives can be removed.

Like Josh said in the previous thread, this isn't going to work right.

> - ANNOTATE_NOSPEC_ALTERNATIVE
> ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *\reg), \
> __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
> __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *\reg), X86_FEATURE_RETPOLINE_AMD

The problem is that while objtool can now understand the code flow and
the corresponding stack layout, we only have a single ORC table, one
that must be valid for all alternatives.

Effectively this means there should not be any orc entries in an
alternative range.

In practise it _might_ work when the instruction of the various
alternatives have unique offsets in the range. But I'm not entirely sure
of that.

Josh, we should probably have objtool verify it doesn't emit ORC entries
in alternative ranges.

2020-04-07 13:36:10

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Tue, Apr 07, 2020 at 03:28:37PM +0200, Peter Zijlstra wrote:
> Josh, we should probably have objtool verify it doesn't emit ORC entries
> in alternative ranges.

Agreed, it might be as simple as checking for insn->alt_group in the
INSN_STACK check or in update_insn_state().

--
Josh

2020-04-07 13:36:48

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH V2 0/9] objtool changes to remove all ANNOTATE_NOSPEC_ALTERNATIVE

On Tue, Apr 07, 2020 at 09:31:33AM +0200, Alexandre Chartre wrote:
> Hi,
>
> This is version v2 of this patchset based on the different comments
> received so far. It now uses and includes PeterZ patch to add
> UNWIND_HINT_RET_OFFSET. Other changes are described below.
>
> Code like retpoline or RSB stuffing, which is used to mitigate some of
> the speculative execution issues, is currently ignored by objtool with
> the ANNOTATE_NOSPEC_ALTERNATIVE directive. This series adds support
> for intra-function calls to objtool so that it can handle such a code.
> With these changes, we can remove all ANNOTATE_NOSPEC_ALTERNATIVE
> directives.
>
> Changes:
> - replace RETPOLINE_RET with PeterZ UNWIND_HINT_RET_OFFSET
> - make objtool intra-function call action architecture dependent
> - objtool now automatically detects and validates all intra-function
> calls but it issues a warning if the call was not explicitly tagged
> - change __FILL_RETURN_BUFFER to work with objtool
> - add generic ANNOTATE_INTRA_FUNCTION_CALL macro
> - remove all ANNOTATE_SPEC_ALTERNATIVE (even for __FILL_RETURN_BUFFER)

I had trouble applying the patches. What branch are they based on? In
general the latest tip/master is good.

--
Josh

2020-04-07 13:53:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Tue, Apr 07, 2020 at 09:31:42AM +0200, Alexandre Chartre wrote:

> - ANNOTATE_NOSPEC_ALTERNATIVE
> ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *\reg), \
> __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
> __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *\reg), X86_FEATURE_RETPOLINE_AMD

Possibly we can write this like:

ALTERNATIVE("", "lfence", X86_FEATURE_RETPOLINE_AMD);
ALTERNATIVE("jmp *\reg", "jmp __x86_retpoline_jmp_\reg", X86_FEATURE_RETPOLINE);

With an out-of-line copy of the retpoline, just like the THUNKs the
compiler uses, except of course, it can't be those, because we actually
want to use the alternative to implement those.

By moving the retpoline magic out-of-line we ensure it has a unique
address and the ORC stuff should work.

I'm just not sure what to do about the RETPOLINE_CALL variant.

2020-04-07 13:59:22

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [PATCH V2 0/9] objtool changes to remove all ANNOTATE_NOSPEC_ALTERNATIVE


On 4/7/20 3:35 PM, Josh Poimboeuf wrote:
> On Tue, Apr 07, 2020 at 09:31:33AM +0200, Alexandre Chartre wrote:
>> Hi,
>>
>> This is version v2 of this patchset based on the different comments
>> received so far. It now uses and includes PeterZ patch to add
>> UNWIND_HINT_RET_OFFSET. Other changes are described below.
>>
>> Code like retpoline or RSB stuffing, which is used to mitigate some of
>> the speculative execution issues, is currently ignored by objtool with
>> the ANNOTATE_NOSPEC_ALTERNATIVE directive. This series adds support
>> for intra-function calls to objtool so that it can handle such a code.
>> With these changes, we can remove all ANNOTATE_NOSPEC_ALTERNATIVE
>> directives.
>>
>> Changes:
>> - replace RETPOLINE_RET with PeterZ UNWIND_HINT_RET_OFFSET
>> - make objtool intra-function call action architecture dependent
>> - objtool now automatically detects and validates all intra-function
>> calls but it issues a warning if the call was not explicitly tagged
>> - change __FILL_RETURN_BUFFER to work with objtool
>> - add generic ANNOTATE_INTRA_FUNCTION_CALL macro
>> - remove all ANNOTATE_SPEC_ALTERNATIVE (even for __FILL_RETURN_BUFFER)
>
> I had trouble applying the patches. What branch are they based on? In
> general the latest tip/master is good.

Oups, this is on based on 5.5. I didn't realize I was that late, I though
I recently rebased but it looks like I didn't. Sorry about that, I will
make sure the next version is more recent.

alex.

2020-04-07 14:01:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Tue, Apr 07, 2020 at 03:52:11PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 09:31:42AM +0200, Alexandre Chartre wrote:
>
> > - ANNOTATE_NOSPEC_ALTERNATIVE
> > ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *\reg), \
> > __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
> > __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
>
> Possibly we can write this like:

.macro OOL_RETPOLINE_JMP reg:req
SYM_FUNC_START(__x86_retpoline_jmp_\reg)
CFI_STARTPROC
RETPOLINE_JMP \reg
CFI_ENDPROC
SYM_FUNC_END(__x86_retpoline_jmp_\reg)
.endm

> ALTERNATIVE("", "lfence", X86_FEATURE_RETPOLINE_AMD);
> ALTERNATIVE("jmp *\reg", "jmp __x86_retpoline_jmp_\reg", X86_FEATURE_RETPOLINE);
>
> With an out-of-line copy of the retpoline, just like the THUNKs the
> compiler uses, except of course, it can't be those, because we actually
> want to use the alternative to implement those.
>
> By moving the retpoline magic out-of-line we ensure it has a unique
> address and the ORC stuff should work.
>
> I'm just not sure what to do about the RETPOLINE_CALL variant.

Duh, something like so:

ALTERNATIVE("", "lfence", X86_FEATURE_RETPOLINE_AMD);
ALTERNATIVE("call *\reg", "call __x86_retpoline_jmp_\reg", X86_FEATURE_RETPOLINE);


2020-04-07 14:29:20

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives


On 4/7/20 3:34 PM, Josh Poimboeuf wrote:
> On Tue, Apr 07, 2020 at 03:28:37PM +0200, Peter Zijlstra wrote:
>> Josh, we should probably have objtool verify it doesn't emit ORC entries
>> in alternative ranges.
>
> Agreed, it might be as simple as checking for insn->alt_group in the
> INSN_STACK check or in update_insn_state().
>

We could do that only for the "objtool orc generate" command. That way
"objtool check" would still check the alternative, but "objtool orc generate"
will just use the first half of the alternative (like it does today with
ANNOTATE_NOSPEC_ALTERNATIVE). We can even keep all ANNOTATE_NOSPEC_ALTERNATIVE
but only use them for "objtool orc generate".

alex.

2020-04-07 16:15:44

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives


On 4/7/20 4:32 PM, Alexandre Chartre wrote:
>
> On 4/7/20 3:34 PM, Josh Poimboeuf wrote:
>> On Tue, Apr 07, 2020 at 03:28:37PM +0200, Peter Zijlstra wrote:
>>> Josh, we should probably have objtool verify it doesn't emit ORC entries
>>> in alternative ranges.
>>
>> Agreed, it might be as simple as checking for insn->alt_group in the
>> INSN_STACK check or in update_insn_state().
>>
>
> We could do that only for the "objtool orc generate" command. That way
> "objtool check" would still check the alternative, but "objtool orc generate"
> will just use the first half of the alternative (like it does today with
> ANNOTATE_NOSPEC_ALTERNATIVE). We can even keep all ANNOTATE_NOSPEC_ALTERNATIVE
> but only use them for "objtool orc generate".
>

I have checked and objtool doesn't emit ORC entries for alternative:
decode_instructions() doesn't mark such section with sec->text = true
so create_orc_sections() doesn't emit corresponding ORC entries.

So I think we can remove the ANNOTATE_NOSPEC_ALTERNATIVE directives,
this will allow objtool to check the instructions but it still won't
emit ORC entries (same behavior as today). In the future, if ORC
eventually supports alternative we will be ready to have objtool emit
ORC entries.

alex.

2020-04-07 16:29:35

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Tue, Apr 07, 2020 at 06:18:51PM +0200, Alexandre Chartre wrote:
>
> On 4/7/20 4:32 PM, Alexandre Chartre wrote:
> >
> > On 4/7/20 3:34 PM, Josh Poimboeuf wrote:
> > > On Tue, Apr 07, 2020 at 03:28:37PM +0200, Peter Zijlstra wrote:
> > > > Josh, we should probably have objtool verify it doesn't emit ORC entries
> > > > in alternative ranges.
> > >
> > > Agreed, it might be as simple as checking for insn->alt_group in the
> > > INSN_STACK check or in update_insn_state().
> > >
> >
> > We could do that only for the "objtool orc generate" command. That way
> > "objtool check" would still check the alternative, but "objtool orc generate"
> > will just use the first half of the alternative (like it does today with
> > ANNOTATE_NOSPEC_ALTERNATIVE). We can even keep all ANNOTATE_NOSPEC_ALTERNATIVE
> > but only use them for "objtool orc generate".
> >
>
> I have checked and objtool doesn't emit ORC entries for alternative:
> decode_instructions() doesn't mark such section with sec->text = true
> so create_orc_sections() doesn't emit corresponding ORC entries.
>
> So I think we can remove the ANNOTATE_NOSPEC_ALTERNATIVE directives,
> this will allow objtool to check the instructions but it still won't
> emit ORC entries (same behavior as today). In the future, if ORC
> eventually supports alternative we will be ready to have objtool emit
> ORC entries.

What's the benefit of removing ANNOTATE_NOSPEC_ALTERNATIVE if there's no
ORC support to go along with it?

Also I want to avoid adding "ORC alternatives". ORC is nice and simple
and we should keep it that way as much as possible.

Again, we should warn on stack changes inside alternatives, and then
look at converting RSB and retpolines to use static branches so they
have deterministic stacks.

--
Josh

2020-04-07 16:42:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Tue, Apr 07, 2020 at 06:18:51PM +0200, Alexandre Chartre wrote:
>
> On 4/7/20 4:32 PM, Alexandre Chartre wrote:
> >
> > On 4/7/20 3:34 PM, Josh Poimboeuf wrote:
> > > On Tue, Apr 07, 2020 at 03:28:37PM +0200, Peter Zijlstra wrote:
> > > > Josh, we should probably have objtool verify it doesn't emit ORC entries
> > > > in alternative ranges.
> > >
> > > Agreed, it might be as simple as checking for insn->alt_group in the
> > > INSN_STACK check or in update_insn_state().
> > >
> >
> > We could do that only for the "objtool orc generate" command. That way
> > "objtool check" would still check the alternative, but "objtool orc generate"
> > will just use the first half of the alternative (like it does today with
> > ANNOTATE_NOSPEC_ALTERNATIVE). We can even keep all ANNOTATE_NOSPEC_ALTERNATIVE
> > but only use them for "objtool orc generate".
> >
>
> I have checked and objtool doesn't emit ORC entries for alternative:
> decode_instructions() doesn't mark such section with sec->text = true
> so create_orc_sections() doesn't emit corresponding ORC entries.
>
> So I think we can remove the ANNOTATE_NOSPEC_ALTERNATIVE directives,
> this will allow objtool to check the instructions but it still won't
> emit ORC entries (same behavior as today). In the future, if ORC
> eventually supports alternative we will be ready to have objtool emit
> ORC entries.

I mean, we should make it warn for the case where you remove
ANNOTATE_NOSPEC and it would like to generate ORC.

Also, what's the point of having objtool grok this code and then not
doing anything with it?

2020-04-07 16:58:43

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives



On 4/7/20 6:28 PM, Josh Poimboeuf wrote:
> On Tue, Apr 07, 2020 at 06:18:51PM +0200, Alexandre Chartre wrote:
>>
>> On 4/7/20 4:32 PM, Alexandre Chartre wrote:
>>>
>>> On 4/7/20 3:34 PM, Josh Poimboeuf wrote:
>>>> On Tue, Apr 07, 2020 at 03:28:37PM +0200, Peter Zijlstra wrote:
>>>>> Josh, we should probably have objtool verify it doesn't emit ORC entries
>>>>> in alternative ranges.
>>>>
>>>> Agreed, it might be as simple as checking for insn->alt_group in the
>>>> INSN_STACK check or in update_insn_state().
>>>>
>>>
>>> We could do that only for the "objtool orc generate" command. That way
>>> "objtool check" would still check the alternative, but "objtool orc generate"
>>> will just use the first half of the alternative (like it does today with
>>> ANNOTATE_NOSPEC_ALTERNATIVE). We can even keep all ANNOTATE_NOSPEC_ALTERNATIVE
>>> but only use them for "objtool orc generate".
>>>
>>
>> I have checked and objtool doesn't emit ORC entries for alternative:
>> decode_instructions() doesn't mark such section with sec->text = true
>> so create_orc_sections() doesn't emit corresponding ORC entries.
>>
>> So I think we can remove the ANNOTATE_NOSPEC_ALTERNATIVE directives,
>> this will allow objtool to check the instructions but it still won't
>> emit ORC entries (same behavior as today). In the future, if ORC
>> eventually supports alternative we will be ready to have objtool emit
>> ORC entries.
>
> What's the benefit of removing ANNOTATE_NOSPEC_ALTERNATIVE if there's no
> ORC support to go along with it?

To have the code validated by objtool like any other alternative code
(which is not tagged with ANNOTATE_NOSPEC_ALTERNATIVE).

> Also I want to avoid adding "ORC alternatives". ORC is nice and simple
> and we should keep it that way as much as possible.
>
> Again, we should warn on stack changes inside alternatives, and then
> look at converting RSB and retpolines to use static branches so they
> have deterministic stacks.
>
objtool doesn't currently warn on stack changes inside alternatives.
The RSB/retpoline alternatives have warning because objtool doesn't
support retpoline ret and intra-function calls. If you have an alternative
doing stack changes that objtool understand (like push/pop, add/remove
to sp) then you won't have a warning.

I think that's the case with smap_save:

static __always_inline unsigned long smap_save(void)
{
unsigned long flags;

asm volatile (ALTERNATIVE("", "pushf; pop %0; " __ASM_CLAC,
X86_FEATURE_SMAP)
: "=rm" (flags) : : "memory", "cc");

return flags;
}

The alternative does change the stack but objtool won't complain
because it handles the pushf and pop instruction.

alex.

2020-04-07 17:02:50

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives


On 4/7/20 6:41 PM, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 06:18:51PM +0200, Alexandre Chartre wrote:
>>
>> On 4/7/20 4:32 PM, Alexandre Chartre wrote:
>>>
>>> On 4/7/20 3:34 PM, Josh Poimboeuf wrote:
>>>> On Tue, Apr 07, 2020 at 03:28:37PM +0200, Peter Zijlstra wrote:
>>>>> Josh, we should probably have objtool verify it doesn't emit ORC entries
>>>>> in alternative ranges.
>>>>
>>>> Agreed, it might be as simple as checking for insn->alt_group in the
>>>> INSN_STACK check or in update_insn_state().
>>>>
>>>
>>> We could do that only for the "objtool orc generate" command. That way
>>> "objtool check" would still check the alternative, but "objtool orc generate"
>>> will just use the first half of the alternative (like it does today with
>>> ANNOTATE_NOSPEC_ALTERNATIVE). We can even keep all ANNOTATE_NOSPEC_ALTERNATIVE
>>> but only use them for "objtool orc generate".
>>>
>>
>> I have checked and objtool doesn't emit ORC entries for alternative:
>> decode_instructions() doesn't mark such section with sec->text = true
>> so create_orc_sections() doesn't emit corresponding ORC entries.
>>
>> So I think we can remove the ANNOTATE_NOSPEC_ALTERNATIVE directives,
>> this will allow objtool to check the instructions but it still won't
>> emit ORC entries (same behavior as today). In the future, if ORC
>> eventually supports alternative we will be ready to have objtool emit
>> ORC entries.
>
> I mean, we should make it warn for the case where you remove
> ANNOTATE_NOSPEC and it would like to generate ORC.

Okay, so check if an alternative is changing the stack and warn if it is.

alex.

> Also, what's the point of having objtool grok this code and then not
> doing anything with it?
>

2020-04-07 17:28:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Tue, Apr 07, 2020 at 07:01:43PM +0200, Alexandre Chartre wrote:

> I think that's the case with smap_save:
>
> static __always_inline unsigned long smap_save(void)
> {
> unsigned long flags;
>
> asm volatile (ALTERNATIVE("", "pushf; pop %0; " __ASM_CLAC,
> X86_FEATURE_SMAP)
> : "=rm" (flags) : : "memory", "cc");
>
> return flags;
> }
>
> The alternative does change the stack but objtool won't complain
> because it handles the pushf and pop instruction.

Oh bugger; indeed! That'll wreck unwinding.

2020-04-07 17:28:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Tue, Apr 07, 2020 at 11:28:38AM -0500, Josh Poimboeuf wrote:
> Again, we should warn on stack changes inside alternatives, and then
> look at converting RSB and retpolines to use static branches so they
> have deterministic stacks.

I don't think we need static brancher, we should just out-of-line the
whole thing.

Let me sort this CFI error Thomas is getting and then I'll attempt a
patch along the lines I outlined in earlier emails.

2020-04-08 15:13:25

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [PATCH V2 5/9] objtool: Add support for intra-function calls



On 4/7/20 3:28 PM, Alexandre Chartre wrote:
>
> On 4/7/20 3:07 PM, Peter Zijlstra wrote:
>> On Tue, Apr 07, 2020 at 09:31:38AM +0200, Alexandre Chartre wrote:
>>
>>> index a62e032863a8..7ee1561bf7ad 100644
>>> --- a/tools/objtool/arch/x86/decode.c
>>> +++ b/tools/objtool/arch/x86/decode.c
>>> @@ -497,3 +497,15 @@ void arch_initial_func_cfi_state(struct cfi_state *state)
>>>       state->regs[16].base = CFI_CFA;
>>>       state->regs[16].offset = -8;
>>>   }
>>> +
>>> +
>>> +void arch_configure_intra_function_call(struct stack_op *op)
>>> +{
>>> +    /*
>>> +     * For the impact on the stack, make an intra-function
>>> +     * call behaves like a push of an immediate value (the
>>> +     * return address).
>>> +     */
>>> +    op->src.type = OP_SRC_CONST;
>>> +    op->dest.type = OP_DEST_PUSH;
>>> +}
>>
>> An alternative is to always set up stack ops for CALL/RET on decode, but
>> conditionally run update_insn_state() for them.
>>
>> Not sure that makes more logical sense, but the patch would be simpler I
>> think.
>
> Right, this would avoid adding a new arch dependent function and the patch
> will be simpler. This probably makes sense as the stack impact is the same
> for all calls (but objtool will use it only for intra-function calls).
>

Actually the processing of the ret instruction is more complicated than I
anticipated with intra-function calls, and so my implementation is not
complete at the moment.

The issue is to correctly handle how the ret is going to behave depending how
the stack (or register on arm) is modified before the ret. Adjusting the stack
offset makes the stack state correct, but objtool still needs to correctly
figure out where the ret is going to return and where the code flow continues.

alex.

2020-04-08 15:29:44

by Julien Thierry

[permalink] [raw]
Subject: Re: [PATCH V2 5/9] objtool: Add support for intra-function calls



On 4/8/20 3:06 PM, Alexandre Chartre wrote:
>
>
> On 4/7/20 3:28 PM, Alexandre Chartre wrote:
>>
>> On 4/7/20 3:07 PM, Peter Zijlstra wrote:
>>> On Tue, Apr 07, 2020 at 09:31:38AM +0200, Alexandre Chartre wrote:
>>>
>>>> index a62e032863a8..7ee1561bf7ad 100644
>>>> --- a/tools/objtool/arch/x86/decode.c
>>>> +++ b/tools/objtool/arch/x86/decode.c
>>>> @@ -497,3 +497,15 @@ void arch_initial_func_cfi_state(struct
>>>> cfi_state *state)
>>>>       state->regs[16].base = CFI_CFA;
>>>>       state->regs[16].offset = -8;
>>>>   }
>>>> +
>>>> +
>>>> +void arch_configure_intra_function_call(struct stack_op *op)
>>>> +{
>>>> +    /*
>>>> +     * For the impact on the stack, make an intra-function
>>>> +     * call behaves like a push of an immediate value (the
>>>> +     * return address).
>>>> +     */
>>>> +    op->src.type = OP_SRC_CONST;
>>>> +    op->dest.type = OP_DEST_PUSH;
>>>> +}
>>>
>>> An alternative is to always set up stack ops for CALL/RET on decode, but
>>> conditionally run update_insn_state() for them.
>>>
>>> Not sure that makes more logical sense, but the patch would be simpler I
>>> think.
>>
>> Right, this would avoid adding a new arch dependent function and the
>> patch
>> will be simpler. This probably makes sense as the stack impact is the
>> same
>> for all calls (but objtool will use it only for intra-function calls).
>>
>
> Actually the processing of the ret instruction is more complicated than I
> anticipated with intra-function calls, and so my implementation is not
> complete at the moment.
>
> The issue is to correctly handle how the ret is going to behave
> depending how
> the stack (or register on arm) is modified before the ret. Adjusting the
> stack
> offset makes the stack state correct, but objtool still needs to correctly
> figure out where the ret is going to return and where the code flow
> continues.
>

A hint indicating the target "jump" address could be useful. It could be
used to add the information on some call/jump dynamic that aren't
associated with jump tables. Currently when objtool finds a jump
dynamic, if no branches were added to it, it will just return.

Having such a hint could help make additional links (at least on arm64).
I don't know what Peter and Josh would think of that (if that helps in
your case of course).

--
Julien Thierry

2020-04-08 18:56:43

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [PATCH V2 5/9] objtool: Add support for intra-function calls



On 4/8/20 4:19 PM, Julien Thierry wrote:
>
>
> On 4/8/20 3:06 PM, Alexandre Chartre wrote:
>>
>>
>> On 4/7/20 3:28 PM, Alexandre Chartre wrote:
>>>
>>> On 4/7/20 3:07 PM, Peter Zijlstra wrote:
>>>> On Tue, Apr 07, 2020 at 09:31:38AM +0200, Alexandre Chartre wrote:
>>>>
>>>>> index a62e032863a8..7ee1561bf7ad 100644
>>>>> --- a/tools/objtool/arch/x86/decode.c
>>>>> +++ b/tools/objtool/arch/x86/decode.c
>>>>> @@ -497,3 +497,15 @@ void arch_initial_func_cfi_state(struct cfi_state *state)
>>>>>       state->regs[16].base = CFI_CFA;
>>>>>       state->regs[16].offset = -8;
>>>>>   }
>>>>> +
>>>>> +
>>>>> +void arch_configure_intra_function_call(struct stack_op *op)
>>>>> +{
>>>>> +    /*
>>>>> +     * For the impact on the stack, make an intra-function
>>>>> +     * call behaves like a push of an immediate value (the
>>>>> +     * return address).
>>>>> +     */
>>>>> +    op->src.type = OP_SRC_CONST;
>>>>> +    op->dest.type = OP_DEST_PUSH;
>>>>> +}
>>>>
>>>> An alternative is to always set up stack ops for CALL/RET on decode, but
>>>> conditionally run update_insn_state() for them.
>>>>
>>>> Not sure that makes more logical sense, but the patch would be simpler I
>>>> think.
>>>
>>> Right, this would avoid adding a new arch dependent function and the patch
>>> will be simpler. This probably makes sense as the stack impact is the same
>>> for all calls (but objtool will use it only for intra-function calls).
>>>
>>
>> Actually the processing of the ret instruction is more complicated than I
>> anticipated with intra-function calls, and so my implementation is not
>> complete at the moment.
>>
>> The issue is to correctly handle how the ret is going to behave depending how
>> the stack (or register on arm) is modified before the ret. Adjusting the stack
>> offset makes the stack state correct, but objtool still needs to correctly
>> figure out where the ret is going to return and where the code flow continues.
>>
>
> A hint indicating the target "jump" address could be useful. It could
> be used to add the information on some call/jump dynamic that aren't
> associated with jump tables. Currently when objtool finds a jump
> dynamic, if no branches were added to it, it will just return.
>
> Having such a hint could help make additional links (at least on
> arm64). I don't know what Peter and Josh would think of that (if that
> helps in your case of course).
>

Yes, I am thinking about tracking intra-function call return address,
and having hints to specify a return address changes. For example,
on x86, when we push the branch address on the stack we overwrite the
last return address (the return address of the last intra-function call).
Then the return instruction can figure out where to branch.

alex.

2020-04-08 18:58:15

by Julien Thierry

[permalink] [raw]
Subject: Re: [PATCH V2 5/9] objtool: Add support for intra-function calls



On 4/8/20 5:03 PM, Alexandre Chartre wrote:
>
>
> On 4/8/20 4:19 PM, Julien Thierry wrote:
>>
>>
>> On 4/8/20 3:06 PM, Alexandre Chartre wrote:
>>>
>>>
>>> On 4/7/20 3:28 PM, Alexandre Chartre wrote:
>>>>
>>>> On 4/7/20 3:07 PM, Peter Zijlstra wrote:
>>>>> On Tue, Apr 07, 2020 at 09:31:38AM +0200, Alexandre Chartre wrote:
>>>>>
>>>>>> index a62e032863a8..7ee1561bf7ad 100644
>>>>>> --- a/tools/objtool/arch/x86/decode.c
>>>>>> +++ b/tools/objtool/arch/x86/decode.c
>>>>>> @@ -497,3 +497,15 @@ void arch_initial_func_cfi_state(struct
>>>>>> cfi_state *state)
>>>>>>       state->regs[16].base = CFI_CFA;
>>>>>>       state->regs[16].offset = -8;
>>>>>>   }
>>>>>> +
>>>>>> +
>>>>>> +void arch_configure_intra_function_call(struct stack_op *op)
>>>>>> +{
>>>>>> +    /*
>>>>>> +     * For the impact on the stack, make an intra-function
>>>>>> +     * call behaves like a push of an immediate value (the
>>>>>> +     * return address).
>>>>>> +     */
>>>>>> +    op->src.type = OP_SRC_CONST;
>>>>>> +    op->dest.type = OP_DEST_PUSH;
>>>>>> +}
>>>>>
>>>>> An alternative is to always set up stack ops for CALL/RET on
>>>>> decode, but
>>>>> conditionally run update_insn_state() for them.
>>>>>
>>>>> Not sure that makes more logical sense, but the patch would be
>>>>> simpler I
>>>>> think.
>>>>
>>>> Right, this would avoid adding a new arch dependent function and the
>>>> patch
>>>> will be simpler. This probably makes sense as the stack impact is
>>>> the same
>>>> for all calls (but objtool will use it only for intra-function calls).
>>>>
>>>
>>> Actually the processing of the ret instruction is more complicated
>>> than I
>>> anticipated with intra-function calls, and so my implementation is not
>>> complete at the moment.
>>>
>>> The issue is to correctly handle how the ret is going to behave
>>> depending how
>>> the stack (or register on arm) is modified before the ret. Adjusting
>>> the stack
>>> offset makes the stack state correct, but objtool still needs to
>>> correctly
>>> figure out where the ret is going to return and where the code flow
>>> continues.
>>>
>>
>> A hint indicating the target "jump" address could be useful. It could
>> be used to add the information on some call/jump dynamic that aren't
>> associated with jump tables. Currently when objtool finds a jump
>> dynamic, if no branches were added to it, it will just return.
>>
>> Having such a hint could help make additional links (at least on
>> arm64). I don't know what Peter and Josh would think of that (if that
>> helps in your case of course).
>>
>
> Yes, I am thinking about tracking intra-function call return address,
> and having hints to specify a return address changes. For example,
> on x86, when we push the branch address on the stack we overwrite the
> last return address (the return address of the last intra-function call).
> Then the return instruction can figure out where to branch.

I see, I was thinking about a more generic hint, that would just
indicate "this instruction actually jumps here". So in your case it
would just point that a certain return instruction causes to branch
somewhere.

This way the hint could also be used for other instructions (e.g.
INSN_JUMP_DYNAMIC).



--
Julien Thierry

2020-04-08 19:43:23

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [PATCH V2 5/9] objtool: Add support for intra-function calls



On 4/8/20 6:04 PM, Julien Thierry wrote:
>
>
> On 4/8/20 5:03 PM, Alexandre Chartre wrote:
>>
>>
>> On 4/8/20 4:19 PM, Julien Thierry wrote:
>>>
>>>
>>> On 4/8/20 3:06 PM, Alexandre Chartre wrote:
>>>>
>>>>
>>>> On 4/7/20 3:28 PM, Alexandre Chartre wrote:
>>>>>
>>>>> On 4/7/20 3:07 PM, Peter Zijlstra wrote:
>>>>>> On Tue, Apr 07, 2020 at 09:31:38AM +0200, Alexandre Chartre wrote:
>>>>>>
>>>>>>> index a62e032863a8..7ee1561bf7ad 100644
>>>>>>> --- a/tools/objtool/arch/x86/decode.c
>>>>>>> +++ b/tools/objtool/arch/x86/decode.c
>>>>>>> @@ -497,3 +497,15 @@ void arch_initial_func_cfi_state(struct cfi_state *state)
>>>>>>>       state->regs[16].base = CFI_CFA;
>>>>>>>       state->regs[16].offset = -8;
>>>>>>>   }
>>>>>>> +
>>>>>>> +
>>>>>>> +void arch_configure_intra_function_call(struct stack_op *op)
>>>>>>> +{
>>>>>>> +    /*
>>>>>>> +     * For the impact on the stack, make an intra-function
>>>>>>> +     * call behaves like a push of an immediate value (the
>>>>>>> +     * return address).
>>>>>>> +     */
>>>>>>> +    op->src.type = OP_SRC_CONST;
>>>>>>> +    op->dest.type = OP_DEST_PUSH;
>>>>>>> +}
>>>>>>
>>>>>> An alternative is to always set up stack ops for CALL/RET on decode, but
>>>>>> conditionally run update_insn_state() for them.
>>>>>>
>>>>>> Not sure that makes more logical sense, but the patch would be simpler I
>>>>>> think.
>>>>>
>>>>> Right, this would avoid adding a new arch dependent function and the patch
>>>>> will be simpler. This probably makes sense as the stack impact is the same
>>>>> for all calls (but objtool will use it only for intra-function calls).
>>>>>
>>>>
>>>> Actually the processing of the ret instruction is more complicated than I
>>>> anticipated with intra-function calls, and so my implementation is not
>>>> complete at the moment.
>>>>
>>>> The issue is to correctly handle how the ret is going to behave depending how
>>>> the stack (or register on arm) is modified before the ret. Adjusting the stack
>>>> offset makes the stack state correct, but objtool still needs to correctly
>>>> figure out where the ret is going to return and where the code flow continues.
>>>>
>>>
>>> A hint indicating the target "jump" address could be useful. It could
>>> be used to add the information on some call/jump dynamic that aren't
>>> associated with jump tables. Currently when objtool finds a jump
>>> dynamic, if no branches were added to it, it will just return.
>>>
>>> Having such a hint could help make additional links (at least on
>>> arm64). I don't know what Peter and Josh would think of that (if that
>>> helps in your case of course).
>>>
>>
>> Yes, I am thinking about tracking intra-function call return address,
>> and having hints to specify a return address changes. For example,
>> on x86, when we push the branch address on the stack we overwrite the
>> last return address (the return address of the last intra-function call).
>> Then the return instruction can figure out where to branch.
>
> I see, I was thinking about a more generic hint, that would just
> indicate "this instruction actually jumps here". So in your case it
> would just point that a certain return instruction causes to branch
> somewhere.

I thought about doing that but the problem is that on x86 the same
retpoline code can branch differently depending on how it is used.
Basically we have a return instruction that will branch differently
based on what's on the stack. So we can just tell that this ret
instruction will branch/return there.

alex.

> This way the hint could also be used for other instructions (e.g.
> INSN_JUMP_DYNAMIC).
>
>
>

2020-04-08 19:43:41

by Julien Thierry

[permalink] [raw]
Subject: Re: [PATCH V2 5/9] objtool: Add support for intra-function calls



On 4/8/20 6:06 PM, Alexandre Chartre wrote:
>
>
> On 4/8/20 6:04 PM, Julien Thierry wrote:
>>
>>
>> On 4/8/20 5:03 PM, Alexandre Chartre wrote:
>>>
>>>
>>> On 4/8/20 4:19 PM, Julien Thierry wrote:
>>>>
>>>>
>>>> On 4/8/20 3:06 PM, Alexandre Chartre wrote:
>>>>>
>>>>>
>>>>> On 4/7/20 3:28 PM, Alexandre Chartre wrote:
>>>>>>
>>>>>> On 4/7/20 3:07 PM, Peter Zijlstra wrote:
>>>>>>> On Tue, Apr 07, 2020 at 09:31:38AM +0200, Alexandre Chartre wrote:
>>>>>>>
>>>>>>>> index a62e032863a8..7ee1561bf7ad 100644
>>>>>>>> --- a/tools/objtool/arch/x86/decode.c
>>>>>>>> +++ b/tools/objtool/arch/x86/decode.c
>>>>>>>> @@ -497,3 +497,15 @@ void arch_initial_func_cfi_state(struct
>>>>>>>> cfi_state *state)
>>>>>>>>       state->regs[16].base = CFI_CFA;
>>>>>>>>       state->regs[16].offset = -8;
>>>>>>>>   }
>>>>>>>> +
>>>>>>>> +
>>>>>>>> +void arch_configure_intra_function_call(struct stack_op *op)
>>>>>>>> +{
>>>>>>>> +    /*
>>>>>>>> +     * For the impact on the stack, make an intra-function
>>>>>>>> +     * call behaves like a push of an immediate value (the
>>>>>>>> +     * return address).
>>>>>>>> +     */
>>>>>>>> +    op->src.type = OP_SRC_CONST;
>>>>>>>> +    op->dest.type = OP_DEST_PUSH;
>>>>>>>> +}
>>>>>>>
>>>>>>> An alternative is to always set up stack ops for CALL/RET on
>>>>>>> decode, but
>>>>>>> conditionally run update_insn_state() for them.
>>>>>>>
>>>>>>> Not sure that makes more logical sense, but the patch would be
>>>>>>> simpler I
>>>>>>> think.
>>>>>>
>>>>>> Right, this would avoid adding a new arch dependent function and
>>>>>> the patch
>>>>>> will be simpler. This probably makes sense as the stack impact is
>>>>>> the same
>>>>>> for all calls (but objtool will use it only for intra-function
>>>>>> calls).
>>>>>>
>>>>>
>>>>> Actually the processing of the ret instruction is more complicated
>>>>> than I
>>>>> anticipated with intra-function calls, and so my implementation is not
>>>>> complete at the moment.
>>>>>
>>>>> The issue is to correctly handle how the ret is going to behave
>>>>> depending how
>>>>> the stack (or register on arm) is modified before the ret.
>>>>> Adjusting the stack
>>>>> offset makes the stack state correct, but objtool still needs to
>>>>> correctly
>>>>> figure out where the ret is going to return and where the code flow
>>>>> continues.
>>>>>
>>>>
>>>> A hint indicating the target "jump" address could be useful. It could
>>>> be used to add the information on some call/jump dynamic that aren't
>>>> associated with jump tables. Currently when objtool finds a jump
>>>> dynamic, if no branches were added to it, it will just return.
>>>>
>>>> Having such a hint could help make additional links (at least on
>>>> arm64). I don't know what Peter and Josh would think of that (if that
>>>> helps in your case of course).
>>>>
>>>
>>> Yes, I am thinking about tracking intra-function call return address,
>>> and having hints to specify a return address changes. For example,
>>> on x86, when we push the branch address on the stack we overwrite the
>>> last return address (the return address of the last intra-function
>>> call).
>>> Then the return instruction can figure out where to branch.
>>
>> I see, I was thinking about a more generic hint, that would just
>> indicate "this instruction actually jumps here". So in your case it
>> would just point that a certain return instruction causes to branch
>> somewhere.
>
> I thought about doing that but the problem is that on x86 the same
> retpoline code can branch differently depending on how it is used.
> Basically we have a return instruction that will branch differently
> based on what's on the stack. So we can just tell that this ret
> instruction will branch/return there.
>

Oh, I see. Nevermind then!

--
Julien Thierry

2020-04-08 22:01:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Tue, Apr 07, 2020 at 07:27:39PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 11:28:38AM -0500, Josh Poimboeuf wrote:
> > Again, we should warn on stack changes inside alternatives, and then
> > look at converting RSB and retpolines to use static branches so they
> > have deterministic stacks.
>
> I don't think we need static brancher, we should just out-of-line the
> whole thing.
>
> Let me sort this CFI error Thomas is getting and then I'll attempt a
> patch along the lines I outlined in earlier emails.

Something like so.. seems to build and boot.

---
From: Peter Zijlstra (Intel) <[email protected]>
Subject: x86: Out-of-line retpoline

Since GCC generated code already uses out-of-line retpolines and objtool
has trouble with retpolines in alternatives, out-of-line them entirely.

This will enable objtool (once it's been taught a few more tricks) to
generate valid ORC data for the out-of-line copies, which means we can
correctly and reliably unwind through a retpoline.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/crypto/aesni-intel_asm.S | 4 +--
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 2 +-
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 2 +-
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 26 ++++++++---------
arch/x86/entry/entry_32.S | 6 ++--
arch/x86/entry/entry_64.S | 2 +-
arch/x86/include/asm/asm-prototypes.h | 8 ++++--
arch/x86/include/asm/nospec-branch.h | 42 ++++------------------------
arch/x86/kernel/ftrace_32.S | 2 +-
arch/x86/kernel/ftrace_64.S | 4 +--
arch/x86/lib/checksum_32.S | 4 +--
arch/x86/lib/retpoline.S | 27 +++++++++++++++---
arch/x86/platform/efi/efi_stub_64.S | 2 +-
13 files changed, 62 insertions(+), 69 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index cad6e1bfa7d5..54e7d15dbd0d 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2758,7 +2758,7 @@ SYM_FUNC_START(aesni_xts_crypt8)
pxor INC, STATE4
movdqu IV, 0x30(OUTP)

- CALL_NOSPEC %r11
+ CALL_NOSPEC r11

movdqu 0x00(OUTP), INC
pxor INC, STATE1
@@ -2803,7 +2803,7 @@ SYM_FUNC_START(aesni_xts_crypt8)
_aesni_gf128mul_x_ble()
movups IV, (IVP)

- CALL_NOSPEC %r11
+ CALL_NOSPEC r11

movdqu 0x40(OUTP), INC
pxor INC, STATE1
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index d01ddd73de65..ecc0a9a905c4 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -1228,7 +1228,7 @@ SYM_FUNC_START_LOCAL(camellia_xts_crypt_16way)
vpxor 14 * 16(%rax), %xmm15, %xmm14;
vpxor 15 * 16(%rax), %xmm15, %xmm15;

- CALL_NOSPEC %r9;
+ CALL_NOSPEC r9;

addq $(16 * 16), %rsp;

diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index 563ef6e83cdd..0907243c501c 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -1339,7 +1339,7 @@ SYM_FUNC_START_LOCAL(camellia_xts_crypt_32way)
vpxor 14 * 32(%rax), %ymm15, %ymm14;
vpxor 15 * 32(%rax), %ymm15, %ymm15;

- CALL_NOSPEC %r9;
+ CALL_NOSPEC r9;

addq $(16 * 32), %rsp;

diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index 0e6690e3618c..8501ec4532f4 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -75,7 +75,7 @@

.text
SYM_FUNC_START(crc_pcl)
-#define bufp %rdi
+#define bufp rdi
#define bufp_dw %edi
#define bufp_w %di
#define bufp_b %dil
@@ -105,9 +105,9 @@ SYM_FUNC_START(crc_pcl)
## 1) ALIGN:
################################################################

- mov bufp, bufptmp # rdi = *buf
- neg bufp
- and $7, bufp # calculate the unalignment amount of
+ mov %bufp, bufptmp # rdi = *buf
+ neg %bufp
+ and $7, %bufp # calculate the unalignment amount of
# the address
je proc_block # Skip if aligned

@@ -123,13 +123,13 @@ SYM_FUNC_START(crc_pcl)
do_align:
#### Calculate CRC of unaligned bytes of the buffer (if any)
movq (bufptmp), tmp # load a quadward from the buffer
- add bufp, bufptmp # align buffer pointer for quadword
+ add %bufp, bufptmp # align buffer pointer for quadword
# processing
- sub bufp, len # update buffer length
+ sub %bufp, len # update buffer length
align_loop:
crc32b %bl, crc_init_dw # compute crc32 of 1-byte
shr $8, tmp # get next byte
- dec bufp
+ dec %bufp
jne align_loop

proc_block:
@@ -169,10 +169,10 @@ continue_block:
xor crc2, crc2

## branch into array
- lea jump_table(%rip), bufp
- movzxw (bufp, %rax, 2), len
- lea crc_array(%rip), bufp
- lea (bufp, len, 1), bufp
+ lea jump_table(%rip), %bufp
+ movzxw (%bufp, %rax, 2), len
+ lea crc_array(%rip), %bufp
+ lea (%bufp, len, 1), %bufp
JMP_NOSPEC bufp

################################################################
@@ -218,9 +218,9 @@ LABEL crc_ %i
## 4) Combine three results:
################################################################

- lea (K_table-8)(%rip), bufp # first entry is for idx 1
+ lea (K_table-8)(%rip), %bufp # first entry is for idx 1
shlq $3, %rax # rax *= 8
- pmovzxdq (bufp,%rax), %xmm0 # 2 consts: K1:K2
+ pmovzxdq (%bufp,%rax), %xmm0 # 2 consts: K1:K2
leal (%eax,%eax,2), %eax # rax *= 3 (total *24)
subq %rax, tmp # tmp -= rax*24

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index b67bae7091d7..7e7ffb7a5147 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -816,7 +816,7 @@ SYM_CODE_START(ret_from_fork)

/* kernel thread */
1: movl %edi, %eax
- CALL_NOSPEC %ebx
+ CALL_NOSPEC ebx
/*
* A kernel thread is allowed to return here after successfully
* calling do_execve(). Exit to userspace to complete the execve()
@@ -1501,7 +1501,7 @@ SYM_CODE_START_LOCAL_NOALIGN(common_exception_read_cr2)

TRACE_IRQS_OFF
movl %esp, %eax # pt_regs pointer
- CALL_NOSPEC %edi
+ CALL_NOSPEC edi
jmp ret_from_exception
SYM_CODE_END(common_exception_read_cr2)

@@ -1522,7 +1522,7 @@ SYM_CODE_START_LOCAL_NOALIGN(common_exception)

TRACE_IRQS_OFF
movl %esp, %eax # pt_regs pointer
- CALL_NOSPEC %edi
+ CALL_NOSPEC edi
jmp ret_from_exception
SYM_CODE_END(common_exception)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0e9504fabe52..168b798913bc 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -349,7 +349,7 @@ SYM_CODE_START(ret_from_fork)
/* kernel thread */
UNWIND_HINT_EMPTY
movq %r12, %rdi
- CALL_NOSPEC %rbx
+ CALL_NOSPEC rbx
/*
* A kernel thread is allowed to return here after successfully
* calling do_execve(). Exit to userspace to complete the execve()
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index ce92c4acc913..a75195f159cc 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -18,9 +18,13 @@ extern void cmpxchg8b_emu(void);

#ifdef CONFIG_RETPOLINE
#ifdef CONFIG_X86_32
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void);
+#define INDIRECT_THUNK(reg) \
+ extern asmlinkage void __x86_retpoline_e ## reg(void); \
+ extern asmlinkage void __x86_indirect_thunk_e ## reg(void);
#else
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void);
+#define INDIRECT_THUNK(reg) \
+ extern asmlinkage void __x86_retpoline_r ## reg(void); \
+ extern asmlinkage void __x86_indirect_thunk_r ## reg(void);
INDIRECT_THUNK(8)
INDIRECT_THUNK(9)
INDIRECT_THUNK(10)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 07e95dcb40ad..0a3ceab5e0ec 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -76,34 +76,6 @@
.popsection
.endm

-/*
- * These are the bare retpoline primitives for indirect jmp and call.
- * Do not use these directly; they only exist to make the ALTERNATIVE
- * invocation below less ugly.
- */
-.macro RETPOLINE_JMP reg:req
- call .Ldo_rop_\@
-.Lspec_trap_\@:
- pause
- lfence
- jmp .Lspec_trap_\@
-.Ldo_rop_\@:
- mov \reg, (%_ASM_SP)
- ret
-.endm
-
-/*
- * This is a wrapper around RETPOLINE_JMP so the called function in reg
- * returns to the instruction after the macro.
- */
-.macro RETPOLINE_CALL reg:req
- jmp .Ldo_call_\@
-.Ldo_retpoline_jmp_\@:
- RETPOLINE_JMP \reg
-.Ldo_call_\@:
- call .Ldo_retpoline_jmp_\@
-.endm
-
/*
* JMP_NOSPEC and CALL_NOSPEC macros can be used instead of a simple
* indirect jmp/call which may be susceptible to the Spectre variant 2
@@ -111,10 +83,9 @@
*/
.macro JMP_NOSPEC reg:req
#ifdef CONFIG_RETPOLINE
- ANNOTATE_NOSPEC_ALTERNATIVE
- ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *\reg), \
- __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
- __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
+ ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
+ __stringify(jmp __x86_retpoline_\()\reg), X86_FEATURE_RETPOLINE, \
+ __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
#else
jmp *\reg
#endif
@@ -122,10 +93,9 @@

.macro CALL_NOSPEC reg:req
#ifdef CONFIG_RETPOLINE
- ANNOTATE_NOSPEC_ALTERNATIVE
- ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *\reg), \
- __stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\
- __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *\reg), X86_FEATURE_RETPOLINE_AMD
+ ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg), \
+ __stringify(call __x86_retpoline_\()\reg), X86_FEATURE_RETPOLINE,\
+ __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *%\reg), X86_FEATURE_RETPOLINE_AMD
#else
call *\reg
#endif
diff --git a/arch/x86/kernel/ftrace_32.S b/arch/x86/kernel/ftrace_32.S
index e8a9f8370112..e405fe1a8bf4 100644
--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -189,5 +189,5 @@ return_to_handler:
movl %eax, %ecx
popl %edx
popl %eax
- JMP_NOSPEC %ecx
+ JMP_NOSPEC ecx
#endif
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 369e61faacfe..2f2b5702e6cf 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -303,7 +303,7 @@ trace:
* function tracing is enabled.
*/
movq ftrace_trace_function, %r8
- CALL_NOSPEC %r8
+ CALL_NOSPEC r8
restore_mcount_regs

jmp fgraph_trace
@@ -340,6 +340,6 @@ SYM_CODE_START(return_to_handler)
movq 8(%rsp), %rdx
movq (%rsp), %rax
addq $24, %rsp
- JMP_NOSPEC %rdi
+ JMP_NOSPEC rdi
SYM_CODE_END(return_to_handler)
#endif
diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 4742e8fa7ee7..d1d768912368 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -153,7 +153,7 @@ SYM_FUNC_START(csum_partial)
negl %ebx
lea 45f(%ebx,%ebx,2), %ebx
testl %esi, %esi
- JMP_NOSPEC %ebx
+ JMP_NOSPEC ebx

# Handle 2-byte-aligned regions
20: addw (%esi), %ax
@@ -436,7 +436,7 @@ SYM_FUNC_START(csum_partial_copy_generic)
andl $-32,%edx
lea 3f(%ebx,%ebx), %ebx
testl %esi, %esi
- JMP_NOSPEC %ebx
+ JMP_NOSPEC ebx
1: addl $64,%esi
addl $64,%edi
SRC(movb -32(%edx),%bl) ; SRC(movb (%edx),%bl)
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 363ec132df7e..d4aef0c856a6 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -8,14 +8,31 @@
#include <asm/export.h>
#include <asm/nospec-branch.h>

+/*
+ * This is the bare retpoline primitive.
+ */
+.macro RETPOLINE_JMP reg:req
+ call .Ldo_rop_\@
+.Lspec_trap_\@:
+ pause
+ lfence
+ jmp .Lspec_trap_\@
+.Ldo_rop_\@:
+ mov \reg, (%_ASM_SP)
+ ret
+.endm
+
.macro THUNK reg
.section .text.__x86.indirect_thunk

+SYM_FUNC_START(__x86_retpoline_\reg)
+ RETPOLINE_JMP %\reg
+SYM_FUNC_END(__x86_retpoline_\reg)
+
SYM_FUNC_START(__x86_indirect_thunk_\reg)
- CFI_STARTPROC
- JMP_NOSPEC %\reg
- CFI_ENDPROC
+ JMP_NOSPEC \reg
SYM_FUNC_END(__x86_indirect_thunk_\reg)
+
.endm

/*
@@ -26,7 +43,9 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
* the simple and nasty way...
*/
#define __EXPORT_THUNK(sym) _ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
-#define EXPORT_THUNK(reg) __EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
+#define EXPORT_THUNK(reg) \
+ __EXPORT_THUNK(__x86_retpoline_ ## reg); \
+ __EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)

GENERATE_THUNK(_ASM_AX)
diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
index 15da118f04f0..90380a17ab23 100644
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -21,7 +21,7 @@ SYM_FUNC_START(__efi_call)
mov %r8, %r9
mov %rcx, %r8
mov %rsi, %rcx
- CALL_NOSPEC %rdi
+ CALL_NOSPEC rdi
leave
ret
SYM_FUNC_END(__efi_call)

2020-04-09 08:18:28

by Alexandre Chartre

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives


On 4/8/20 11:35 PM, Peter Zijlstra wrote:
> On Tue, Apr 07, 2020 at 07:27:39PM +0200, Peter Zijlstra wrote:
>> On Tue, Apr 07, 2020 at 11:28:38AM -0500, Josh Poimboeuf wrote:
>>> Again, we should warn on stack changes inside alternatives, and then
>>> look at converting RSB and retpolines to use static branches so they
>>> have deterministic stacks.
>>
>> I don't think we need static brancher, we should just out-of-line the
>> whole thing.
>>
>> Let me sort this CFI error Thomas is getting and then I'll attempt a
>> patch along the lines I outlined in earlier emails.
>
> Something like so.. seems to build and boot.
>
> ---
> From: Peter Zijlstra (Intel) <[email protected]>
> Subject: x86: Out-of-line retpoline
>
> Since GCC generated code already uses out-of-line retpolines and objtool
> has trouble with retpolines in alternatives, out-of-line them entirely.
>
> This will enable objtool (once it's been taught a few more tricks) to
> generate valid ORC data for the out-of-line copies, which means we can
> correctly and reliably unwind through a retpoline.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> arch/x86/crypto/aesni-intel_asm.S | 4 +--
> arch/x86/crypto/camellia-aesni-avx-asm_64.S | 2 +-
> arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 2 +-
> arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 26 ++++++++---------
> arch/x86/entry/entry_32.S | 6 ++--
> arch/x86/entry/entry_64.S | 2 +-
> arch/x86/include/asm/asm-prototypes.h | 8 ++++--
> arch/x86/include/asm/nospec-branch.h | 42 ++++------------------------
> arch/x86/kernel/ftrace_32.S | 2 +-
> arch/x86/kernel/ftrace_64.S | 4 +--
> arch/x86/lib/checksum_32.S | 4 +--
> arch/x86/lib/retpoline.S | 27 +++++++++++++++---
> arch/x86/platform/efi/efi_stub_64.S | 2 +-
> 13 files changed, 62 insertions(+), 69 deletions(-)
>
...
> /*
> * JMP_NOSPEC and CALL_NOSPEC macros can be used instead of a simple
> * indirect jmp/call which may be susceptible to the Spectre variant 2
> @@ -111,10 +83,9 @@
> */
> .macro JMP_NOSPEC reg:req
> #ifdef CONFIG_RETPOLINE
> - ANNOTATE_NOSPEC_ALTERNATIVE
> - ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *\reg), \
> - __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
> - __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
> + ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
> + __stringify(jmp __x86_retpoline_\()\reg), X86_FEATURE_RETPOLINE, \
> + __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD
> #else
> jmp *\reg
> #endif
> @@ -122,10 +93,9 @@
>
> .macro CALL_NOSPEC reg:req
> #ifdef CONFIG_RETPOLINE
> - ANNOTATE_NOSPEC_ALTERNATIVE
> - ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *\reg), \
> - __stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\
> - __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *\reg), X86_FEATURE_RETPOLINE_AMD
> + ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg), \
> + __stringify(call __x86_retpoline_\()\reg), X86_FEATURE_RETPOLINE,\
> + __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *%\reg), X86_FEATURE_RETPOLINE_AMD

For X86_FEATURE_RETPOLINE_AMD, the call won't be aligned like the others,
it will be after the lfence instruction so ORC data won't be at the same
place. I am adding some code in objtool to check that alternatives don't
change the stack, but I should actually be checking if all alternatives
have the same unwind instructions at the same place.

Other than that, my only question would be any impact on performances.
Retpoline code was added with trying to limit performance impact.
Here, JMP_NOSPEC has now an additional (long) jump, and CALL_NOSPEC
is doing a long call instead of a near call. But I have no idea if this
has a visible impact.

alex.

2020-04-09 10:35:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Thu, Apr 09, 2020 at 10:18:56AM +0200, Alexandre Chartre wrote:

> > - ANNOTATE_NOSPEC_ALTERNATIVE
> > - ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *\reg), \
> > - __stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\
> > - __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *\reg), X86_FEATURE_RETPOLINE_AMD
> > + ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg), \
> > + __stringify(call __x86_retpoline_\()\reg), X86_FEATURE_RETPOLINE,\
> > + __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *%\reg), X86_FEATURE_RETPOLINE_AMD
>
> For X86_FEATURE_RETPOLINE_AMD, the call won't be aligned like the others,
> it will be after the lfence instruction so ORC data won't be at the same
> place. I am adding some code in objtool to check that alternatives don't
> change the stack, but I should actually be checking if all alternatives
> have the same unwind instructions at the same place.

Argh; earlier ([email protected]) I
used 2 alternatives but then, when I did the patch yesterday, I forgot
why I did that :/

> Other than that, my only question would be any impact on performances.

Yeah, that needs testing, I suspect it's a wash.

> Retpoline code was added with trying to limit performance impact.
> Here, JMP_NOSPEC has now an additional (long) jump, and CALL_NOSPEC
> is doing a long call instead of a near call. But I have no idea if this
> has a visible impact.

The thing is, all the compiler generated code already used the
out-of-line copies, and those will now have a near jump extra I think,
but that should be to the same $I line, given that __x86_retpoline_ and
__x86_indirect_thunk_ are next to one another.

We can also play alignment tricks, see below.

The only sites that can suffer are those few manual asm uses of
JMP_NOSPEC, and I'm not sure we care.


The below results in:

Disassembly of section .text.__x86.indirect_thunk:

0000000000000000 <__x86_indirect_thunk_rax>:
0: 90 nop
1: 90 nop
2: 90 nop
3: ff e0 jmpq *%rax
5: 90 nop
6: 90 nop
7: 90 nop

0000000000000008 <__x86_retpoline_rax>:
8: e8 07 00 00 00 callq 14 <__x86_retpoline_rax+0xc>
d: f3 90 pause
f: 0f ae e8 lfence
12: eb f9 jmp d <__x86_retpoline_rax+0x5>
14: 48 89 04 24 mov %rax,(%rsp)
18: c3 retq
19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)

0000000000000020 <__x86_indirect_thunk_rbx>:
....


I'm not sure why it thinks the "jmp __x86_retpoline_rax" needs 5 bytes
to encode, but those nops don't hurt nothing since we'll not fit in 16
bytes anyway.

---
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index cad6e1bfa7d5..54e7d15dbd0d 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2758,7 +2758,7 @@ SYM_FUNC_START(aesni_xts_crypt8)
pxor INC, STATE4
movdqu IV, 0x30(OUTP)

- CALL_NOSPEC %r11
+ CALL_NOSPEC r11

movdqu 0x00(OUTP), INC
pxor INC, STATE1
@@ -2803,7 +2803,7 @@ SYM_FUNC_START(aesni_xts_crypt8)
_aesni_gf128mul_x_ble()
movups IV, (IVP)

- CALL_NOSPEC %r11
+ CALL_NOSPEC r11

movdqu 0x40(OUTP), INC
pxor INC, STATE1
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index d01ddd73de65..ecc0a9a905c4 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -1228,7 +1228,7 @@ SYM_FUNC_START_LOCAL(camellia_xts_crypt_16way)
vpxor 14 * 16(%rax), %xmm15, %xmm14;
vpxor 15 * 16(%rax), %xmm15, %xmm15;

- CALL_NOSPEC %r9;
+ CALL_NOSPEC r9;

addq $(16 * 16), %rsp;

diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index 563ef6e83cdd..0907243c501c 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -1339,7 +1339,7 @@ SYM_FUNC_START_LOCAL(camellia_xts_crypt_32way)
vpxor 14 * 32(%rax), %ymm15, %ymm14;
vpxor 15 * 32(%rax), %ymm15, %ymm15;

- CALL_NOSPEC %r9;
+ CALL_NOSPEC r9;

addq $(16 * 32), %rsp;

diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index 0e6690e3618c..8501ec4532f4 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -75,7 +75,7 @@

.text
SYM_FUNC_START(crc_pcl)
-#define bufp %rdi
+#define bufp rdi
#define bufp_dw %edi
#define bufp_w %di
#define bufp_b %dil
@@ -105,9 +105,9 @@ SYM_FUNC_START(crc_pcl)
## 1) ALIGN:
################################################################

- mov bufp, bufptmp # rdi = *buf
- neg bufp
- and $7, bufp # calculate the unalignment amount of
+ mov %bufp, bufptmp # rdi = *buf
+ neg %bufp
+ and $7, %bufp # calculate the unalignment amount of
# the address
je proc_block # Skip if aligned

@@ -123,13 +123,13 @@ SYM_FUNC_START(crc_pcl)
do_align:
#### Calculate CRC of unaligned bytes of the buffer (if any)
movq (bufptmp), tmp # load a quadward from the buffer
- add bufp, bufptmp # align buffer pointer for quadword
+ add %bufp, bufptmp # align buffer pointer for quadword
# processing
- sub bufp, len # update buffer length
+ sub %bufp, len # update buffer length
align_loop:
crc32b %bl, crc_init_dw # compute crc32 of 1-byte
shr $8, tmp # get next byte
- dec bufp
+ dec %bufp
jne align_loop

proc_block:
@@ -169,10 +169,10 @@ continue_block:
xor crc2, crc2

## branch into array
- lea jump_table(%rip), bufp
- movzxw (bufp, %rax, 2), len
- lea crc_array(%rip), bufp
- lea (bufp, len, 1), bufp
+ lea jump_table(%rip), %bufp
+ movzxw (%bufp, %rax, 2), len
+ lea crc_array(%rip), %bufp
+ lea (%bufp, len, 1), %bufp
JMP_NOSPEC bufp

################################################################
@@ -218,9 +218,9 @@ LABEL crc_ %i
## 4) Combine three results:
################################################################

- lea (K_table-8)(%rip), bufp # first entry is for idx 1
+ lea (K_table-8)(%rip), %bufp # first entry is for idx 1
shlq $3, %rax # rax *= 8
- pmovzxdq (bufp,%rax), %xmm0 # 2 consts: K1:K2
+ pmovzxdq (%bufp,%rax), %xmm0 # 2 consts: K1:K2
leal (%eax,%eax,2), %eax # rax *= 3 (total *24)
subq %rax, tmp # tmp -= rax*24

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index b67bae7091d7..7e7ffb7a5147 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -816,7 +816,7 @@ SYM_CODE_START(ret_from_fork)

/* kernel thread */
1: movl %edi, %eax
- CALL_NOSPEC %ebx
+ CALL_NOSPEC ebx
/*
* A kernel thread is allowed to return here after successfully
* calling do_execve(). Exit to userspace to complete the execve()
@@ -1501,7 +1501,7 @@ SYM_CODE_START_LOCAL_NOALIGN(common_exception_read_cr2)

TRACE_IRQS_OFF
movl %esp, %eax # pt_regs pointer
- CALL_NOSPEC %edi
+ CALL_NOSPEC edi
jmp ret_from_exception
SYM_CODE_END(common_exception_read_cr2)

@@ -1522,7 +1522,7 @@ SYM_CODE_START_LOCAL_NOALIGN(common_exception)

TRACE_IRQS_OFF
movl %esp, %eax # pt_regs pointer
- CALL_NOSPEC %edi
+ CALL_NOSPEC edi
jmp ret_from_exception
SYM_CODE_END(common_exception)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0e9504fabe52..168b798913bc 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -349,7 +349,7 @@ SYM_CODE_START(ret_from_fork)
/* kernel thread */
UNWIND_HINT_EMPTY
movq %r12, %rdi
- CALL_NOSPEC %rbx
+ CALL_NOSPEC rbx
/*
* A kernel thread is allowed to return here after successfully
* calling do_execve(). Exit to userspace to complete the execve()
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index ce92c4acc913..a75195f159cc 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -18,9 +18,13 @@ extern void cmpxchg8b_emu(void);

#ifdef CONFIG_RETPOLINE
#ifdef CONFIG_X86_32
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void);
+#define INDIRECT_THUNK(reg) \
+ extern asmlinkage void __x86_retpoline_e ## reg(void); \
+ extern asmlinkage void __x86_indirect_thunk_e ## reg(void);
#else
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void);
+#define INDIRECT_THUNK(reg) \
+ extern asmlinkage void __x86_retpoline_r ## reg(void); \
+ extern asmlinkage void __x86_indirect_thunk_r ## reg(void);
INDIRECT_THUNK(8)
INDIRECT_THUNK(9)
INDIRECT_THUNK(10)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 07e95dcb40ad..a180b0fe2fed 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -76,34 +76,6 @@
.popsection
.endm

-/*
- * These are the bare retpoline primitives for indirect jmp and call.
- * Do not use these directly; they only exist to make the ALTERNATIVE
- * invocation below less ugly.
- */
-.macro RETPOLINE_JMP reg:req
- call .Ldo_rop_\@
-.Lspec_trap_\@:
- pause
- lfence
- jmp .Lspec_trap_\@
-.Ldo_rop_\@:
- mov \reg, (%_ASM_SP)
- ret
-.endm
-
-/*
- * This is a wrapper around RETPOLINE_JMP so the called function in reg
- * returns to the instruction after the macro.
- */
-.macro RETPOLINE_CALL reg:req
- jmp .Ldo_call_\@
-.Ldo_retpoline_jmp_\@:
- RETPOLINE_JMP \reg
-.Ldo_call_\@:
- call .Ldo_retpoline_jmp_\@
-.endm
-
/*
* JMP_NOSPEC and CALL_NOSPEC macros can be used instead of a simple
* indirect jmp/call which may be susceptible to the Spectre variant 2
@@ -111,23 +83,21 @@
*/
.macro JMP_NOSPEC reg:req
#ifdef CONFIG_RETPOLINE
- ANNOTATE_NOSPEC_ALTERNATIVE
- ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *\reg), \
- __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
- __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
+ ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE_AMD
+ ALTERNATIVE __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
+ __stringify(jmp __x86_retpoline_\reg), X86_FEATURE_RETPOLINE
#else
- jmp *\reg
+ jmp *%\reg
#endif
.endm

.macro CALL_NOSPEC reg:req
#ifdef CONFIG_RETPOLINE
- ANNOTATE_NOSPEC_ALTERNATIVE
- ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *\reg), \
- __stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\
- __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *\reg), X86_FEATURE_RETPOLINE_AMD
+ ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE_AMD
+ ALTERNATIVE __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg), \
+ __stringify(call __x86_retpoline_\reg), X86_FEATURE_RETPOLINE
#else
- call *\reg
+ call *%\reg
#endif
.endm

diff --git a/arch/x86/kernel/ftrace_32.S b/arch/x86/kernel/ftrace_32.S
index e8a9f8370112..e405fe1a8bf4 100644
--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -189,5 +189,5 @@ return_to_handler:
movl %eax, %ecx
popl %edx
popl %eax
- JMP_NOSPEC %ecx
+ JMP_NOSPEC ecx
#endif
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 369e61faacfe..2f2b5702e6cf 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -303,7 +303,7 @@ trace:
* function tracing is enabled.
*/
movq ftrace_trace_function, %r8
- CALL_NOSPEC %r8
+ CALL_NOSPEC r8
restore_mcount_regs

jmp fgraph_trace
@@ -340,6 +340,6 @@ SYM_CODE_START(return_to_handler)
movq 8(%rsp), %rdx
movq (%rsp), %rax
addq $24, %rsp
- JMP_NOSPEC %rdi
+ JMP_NOSPEC rdi
SYM_CODE_END(return_to_handler)
#endif
diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 4742e8fa7ee7..d1d768912368 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -153,7 +153,7 @@ SYM_FUNC_START(csum_partial)
negl %ebx
lea 45f(%ebx,%ebx,2), %ebx
testl %esi, %esi
- JMP_NOSPEC %ebx
+ JMP_NOSPEC ebx

# Handle 2-byte-aligned regions
20: addw (%esi), %ax
@@ -436,7 +436,7 @@ SYM_FUNC_START(csum_partial_copy_generic)
andl $-32,%edx
lea 3f(%ebx,%ebx), %ebx
testl %esi, %esi
- JMP_NOSPEC %ebx
+ JMP_NOSPEC ebx
1: addl $64,%esi
addl $64,%edi
SRC(movb -32(%edx),%bl) ; SRC(movb (%edx),%bl)
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 363ec132df7e..cc44702d0492 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -7,15 +7,35 @@
#include <asm/alternative-asm.h>
#include <asm/export.h>
#include <asm/nospec-branch.h>
+#include <asm/unwind_hints.h>
+
+/*
+ * This is the bare retpoline primitive.
+ */
+.macro RETPOLINE_JMP reg:req
+ call .Ldo_rop_\@
+.Lspec_trap_\@:
+ pause
+ lfence
+ jmp .Lspec_trap_\@
+.Ldo_rop_\@:
+ mov \reg, (%_ASM_SP)
+ ret
+.endm

.macro THUNK reg
.section .text.__x86.indirect_thunk

+ .align 32
SYM_FUNC_START(__x86_indirect_thunk_\reg)
- CFI_STARTPROC
- JMP_NOSPEC %\reg
- CFI_ENDPROC
+ JMP_NOSPEC \reg
SYM_FUNC_END(__x86_indirect_thunk_\reg)
+
+SYM_CODE_START_NOALIGN(__x86_retpoline_\reg)
+ UNWIND_HINT_EMPTY
+ RETPOLINE_JMP %\reg
+SYM_CODE_END(__x86_retpoline_\reg)
+
.endm

/*
@@ -26,7 +46,9 @@ SYM_FUNC_END(__x86_indirect_thunk_\reg)
* the simple and nasty way...
*/
#define __EXPORT_THUNK(sym) _ASM_NOKPROBE(sym); EXPORT_SYMBOL(sym)
-#define EXPORT_THUNK(reg) __EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
+#define EXPORT_THUNK(reg) \
+ __EXPORT_THUNK(__x86_retpoline_ ## reg); \
+ __EXPORT_THUNK(__x86_indirect_thunk_ ## reg)
#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)

GENERATE_THUNK(_ASM_AX)
diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
index 15da118f04f0..90380a17ab23 100644
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -21,7 +21,7 @@ SYM_FUNC_START(__efi_call)
mov %r8, %r9
mov %rcx, %r8
mov %rsi, %rcx
- CALL_NOSPEC %rdi
+ CALL_NOSPEC rdi
leave
ret
SYM_FUNC_END(__efi_call)

2020-04-09 10:41:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 9/9] x86/speculation: Remove all ANNOTATE_NOSPEC_ALTERNATIVE directives

On Thu, Apr 09, 2020 at 12:34:24PM +0200, Peter Zijlstra wrote:
> @@ -111,23 +83,21 @@
> */
> .macro JMP_NOSPEC reg:req
> #ifdef CONFIG_RETPOLINE
> - ANNOTATE_NOSPEC_ALTERNATIVE
> - ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *\reg), \
> - __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
> - __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
> + ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE_AMD
> + ALTERNATIVE __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
> + __stringify(jmp __x86_retpoline_\reg), X86_FEATURE_RETPOLINE

Still, I am being an idiot, only the call (below) needs this, this can
stay what it was:

+ ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
+ __stringify(jmp __x86_retpoline_\()\reg), X86_FEATURE_RETPOLINE, \
+ __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), X86_FEATURE_RETPOLINE_AMD

> #else
> - jmp *\reg
> + jmp *%\reg
> #endif
> .endm
>
> .macro CALL_NOSPEC reg:req
> #ifdef CONFIG_RETPOLINE
> - ANNOTATE_NOSPEC_ALTERNATIVE
> - ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; call *\reg), \
> - __stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\
> - __stringify(lfence; ANNOTATE_RETPOLINE_SAFE; call *\reg), X86_FEATURE_RETPOLINE_AMD

And then this needs a comment on why it is not ALTERNATIVE_2 like above.

> + ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE_AMD
> + ALTERNATIVE __stringify(ANNOTATE_RETPOLINE_SAFE; call *%\reg), \
> + __stringify(call __x86_retpoline_\reg), X86_FEATURE_RETPOLINE
> #else
> - call *\reg
> + call *%\reg
> #endif
> .endm

2020-05-01 18:26:05

by tip-bot2 for Haifeng Xu

[permalink] [raw]
Subject: [tip: objtool/core] objtool: UNWIND_HINT_RET_OFFSET should not check registers

The following commit has been merged into the objtool/core branch of tip:

Commit-ID: c721b3f80faebc7891211fa82de303eebadfed15
Gitweb: https://git.kernel.org/tip/c721b3f80faebc7891211fa82de303eebadfed15
Author: Alexandre Chartre <[email protected]>
AuthorDate: Tue, 07 Apr 2020 09:31:35 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 30 Apr 2020 20:14:32 +02:00

objtool: UNWIND_HINT_RET_OFFSET should not check registers

UNWIND_HINT_RET_OFFSET will adjust a modified stack. However if a
callee-saved register was pushed on the stack then the stack frame
will still appear modified. So stop checking registers when
UNWIND_HINT_RET_OFFSET is used.

Signed-off-by: Alexandre Chartre <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
tools/objtool/check.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 8af8de2..068897d 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1507,6 +1507,14 @@ static bool has_modified_stack_frame(struct instruction *insn, struct insn_state
if (cfi->stack_size != initial_func_cfi.cfa.offset + ret_offset)
return true;

+ /*
+ * If there is a ret offset hint then don't check registers
+ * because a callee-saved register might have been pushed on
+ * the stack.
+ */
+ if (ret_offset)
+ return false;
+
for (i = 0; i < CFI_NUM_REGS; i++) {
if (cfi->regs[i].base != initial_func_cfi.regs[i].base ||
cfi->regs[i].offset != initial_func_cfi.regs[i].offset)