From: Thomas Gleixner <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/lib/putuser.S | 62 ++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 49 insertions(+), 13 deletions(-)
--- a/arch/x86/lib/putuser.S
+++ b/arch/x86/lib/putuser.S
@@ -47,8 +47,6 @@ SYM_FUNC_START(__put_user_1)
LOAD_TASK_SIZE_MINUS_N(0)
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
-SYM_INNER_LABEL(__put_user_nocheck_1, SYM_L_GLOBAL)
- ENDBR
ASM_STAC
1: movb %al,(%_ASM_CX)
xor %ecx,%ecx
@@ -56,54 +54,87 @@ SYM_INNER_LABEL(__put_user_nocheck_1, SY
RET
SYM_FUNC_END(__put_user_1)
EXPORT_SYMBOL(__put_user_1)
+
+SYM_FUNC_START(__put_user_nocheck_1)
+ ENDBR
+ ASM_STAC
+2: movb %al,(%_ASM_CX)
+ xor %ecx,%ecx
+ ASM_CLAC
+ RET
+SYM_FUNC_END(__put_user_nocheck_1)
EXPORT_SYMBOL(__put_user_nocheck_1)
SYM_FUNC_START(__put_user_2)
LOAD_TASK_SIZE_MINUS_N(1)
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
-SYM_INNER_LABEL(__put_user_nocheck_2, SYM_L_GLOBAL)
- ENDBR
ASM_STAC
-2: movw %ax,(%_ASM_CX)
+3: movw %ax,(%_ASM_CX)
xor %ecx,%ecx
ASM_CLAC
RET
SYM_FUNC_END(__put_user_2)
EXPORT_SYMBOL(__put_user_2)
+
+SYM_FUNC_START(__put_user_nocheck_2)
+ ENDBR
+ ASM_STAC
+4: movw %ax,(%_ASM_CX)
+ xor %ecx,%ecx
+ ASM_CLAC
+ RET
+SYM_FUNC_END(__put_user_nocheck_2)
EXPORT_SYMBOL(__put_user_nocheck_2)
SYM_FUNC_START(__put_user_4)
LOAD_TASK_SIZE_MINUS_N(3)
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
-SYM_INNER_LABEL(__put_user_nocheck_4, SYM_L_GLOBAL)
- ENDBR
ASM_STAC
-3: movl %eax,(%_ASM_CX)
+5: movl %eax,(%_ASM_CX)
xor %ecx,%ecx
ASM_CLAC
RET
SYM_FUNC_END(__put_user_4)
EXPORT_SYMBOL(__put_user_4)
+
+SYM_FUNC_START(__put_user_nocheck_4)
+ ENDBR
+ ASM_STAC
+6: movl %eax,(%_ASM_CX)
+ xor %ecx,%ecx
+ ASM_CLAC
+ RET
+SYM_FUNC_END(__put_user_nocheck_4)
EXPORT_SYMBOL(__put_user_nocheck_4)
SYM_FUNC_START(__put_user_8)
LOAD_TASK_SIZE_MINUS_N(7)
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
-SYM_INNER_LABEL(__put_user_nocheck_8, SYM_L_GLOBAL)
- ENDBR
ASM_STAC
-4: mov %_ASM_AX,(%_ASM_CX)
+7: mov %_ASM_AX,(%_ASM_CX)
#ifdef CONFIG_X86_32
-5: movl %edx,4(%_ASM_CX)
+8: movl %edx,4(%_ASM_CX)
#endif
xor %ecx,%ecx
ASM_CLAC
RET
SYM_FUNC_END(__put_user_8)
EXPORT_SYMBOL(__put_user_8)
+
+SYM_FUNC_START(__put_user_nocheck_8)
+ ENDBR
+ ASM_STAC
+9: mov %_ASM_AX,(%_ASM_CX)
+#ifdef CONFIG_X86_32
+10: movl %edx,4(%_ASM_CX)
+#endif
+ xor %ecx,%ecx
+ ASM_CLAC
+ RET
+SYM_FUNC_END(__put_user_nocheck_8)
EXPORT_SYMBOL(__put_user_nocheck_8)
SYM_CODE_START_LOCAL(.Lbad_put_user_clac)
@@ -117,6 +148,11 @@ SYM_CODE_END(.Lbad_put_user_clac)
_ASM_EXTABLE_UA(2b, .Lbad_put_user_clac)
_ASM_EXTABLE_UA(3b, .Lbad_put_user_clac)
_ASM_EXTABLE_UA(4b, .Lbad_put_user_clac)
-#ifdef CONFIG_X86_32
_ASM_EXTABLE_UA(5b, .Lbad_put_user_clac)
+ _ASM_EXTABLE_UA(6b, .Lbad_put_user_clac)
+ _ASM_EXTABLE_UA(7b, .Lbad_put_user_clac)
+ _ASM_EXTABLE_UA(9b, .Lbad_put_user_clac)
+#ifdef CONFIG_X86_32
+ _ASM_EXTABLE_UA(8b, .Lbad_put_user_clac)
+ _ASM_EXTABLE_UA(10b, .Lbad_put_user_clac)
#endif
So I don't hate this patch and it's probably good for consistency, but
I really think that the retbleed tracking could perhaps be improved to
let this be all unnecessary.
The whole return stack depth counting is already not 100% exact, and I
think we could just make the rule be that we don't track leaf
functions.
Why? It's just an off-by-one in the already not exact tracking. And -
perhaps equally importantly - leaf functions are very very common
dynamically, and I suspect it's trivial to see them.
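For reference, the accounting in question is roughly the below (a
sketch with approximate symbol names, not the exact thunk code):

	# call thunk, lives in the padding in front of each function
	sarq	$5, PER_CPU_VAR(__x86_call_depth)	# one level deeper

	# return thunk: one level back up, stuff the RSB when the
	# tracked depth runs out
	shlq	$5, PER_CPU_VAR(__x86_call_depth)
	jz	.Lstuff_rsb
	ret

Not counting a leaf just leaves the counter at most one level behind
the real return stack, which is noise on top of a scheme that is
approximate to begin with.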
Yes, yes, you could make objtool even smarter and actually do some
kind of function flow graph thing (and I think some people were
talking about that with the whole ret counting long long ago), but the
leaf function thing is the really simple low-hanging fruit case of
that.
Linus
On Fri, Sep 02, 2022 at 09:43:45AM -0700, Linus Torvalds wrote:
> So I don't hate this patch and it's probably good for consistency, but
> I really think that the retbleed tracking could perhaps be improved to
> let this be all unnecessary.
>
> The whole return stack depth counting is already not 100% exact, and I
> think we could just make the rule be that we don't track leaf
> functions.
>
> Why? It's just an off-by-one in the already not exact tracking. And -
> perhaps equally importantly - leaf functions are very very common
> dynamically, and I suspect it's trivial to see them.
>
> Yes, yes, you could make objtool even smarter and actually do some
> kind of function flow graph thing (and I think some people were
> talking about that with the whole ret counting long long ago), but the
> leaf function thing is the really simple low-hanging fruit case of
> that.
So I did the leaf thing a few weeks ago, and at the time the perf gains
were not worth the complexity.
I can try again :-)
On Fri, Sep 02, 2022 at 07:03:33PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 02, 2022 at 09:43:45AM -0700, Linus Torvalds wrote:
> > So I don't hate this patch and it's probably good for consistency, but
> > I really think that the retbleed tracking could perhaps be improved to
> > let this be all unnecessary.
> >
> > The whole return stack depth counting is already not 100% exact, and I
> > think we could just make the rule be that we don't track leaf
> > functions.
> >
> > Why? It's just an off-by-one in the already not exact tracking. And -
> > perhaps equally importantly - leaf functions are very very common
> > dynamically, and I suspect it's trivial to see them.
> >
> > Yes, yes, you could make objtool even smarter and actually do some
> > kind of function flow graph thing (and I think some people were
> > talking about that with the whole ret counting long long ago), but the
> > leaf function thing is the really simple low-hanging fruit case of
> > that.
>
> So I did the leaf thing a few weeks ago, and at the time the perf gains
> were not worth the complexity.
>
> I can try again :-)
The below (mashup of a handful of patches) is the best I could come up
with in a hurry.
Specifically, the approach taken is that instead of the 10 byte sarq for
accounting, a leaf has a 10 byte NOP in front of it. When this 'leaf'-nop
is found, the return thunk is also not used and a regular 'ret'
instruction is patched in.
Direct calls to leaf functions are unmodified; they still go to +0.
However, indirect calls will unconditionally jump to -10. These will
then either hit the sarq or the fancy nop.
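Layout-wise that means roughly the following (a sketch; the per-cpu
symbol name is approximate, the nop bytes are the ones from the
callthunks.c hunk below):

	# non-leaf: the 10 bytes of padding hold the accounting
	sarq	$5, PER_CPU_VAR(__x86_call_depth)
func:					# direct calls land here, at +0
	...

	# leaf: the padding holds the magic nop, which falls through
	.byte	0x2e,0x66,0x0f,0x1f,0x84,0x00,0x6c,0x65,0x61,0x66
leaf:
	...

so the -10 target of an indirect call is harmless either way.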
Seems to boot in kvm (defconfig+kvm_guest.config).
That is, the thing you complained about isn't actually 'fixed'; leaf
functions still need their padding.
If this patch makes you feel warm and fuzzy, I can clean it up;
otherwise I would suggest getting the stuff we have merged before adding
even more things on top.
I'll see if I get time to clean up the alignment thing this weekend,
otherwise it'll have to wait till next week or so.
---
arch/x86/include/asm/alternative.h | 7 ++
arch/x86/kernel/alternative.c | 11 ++++
arch/x86/kernel/callthunks.c | 51 +++++++++++++++++++
arch/x86/kernel/cpu/bugs.c | 2
arch/x86/kernel/module.c | 8 ++-
arch/x86/kernel/vmlinux.lds.S | 7 ++
arch/x86/lib/retpoline.S | 11 +---
tools/objtool/check.c | 95 ++++++++++++++++++++++++++++++++++++
tools/objtool/include/objtool/elf.h | 2
9 files changed, 186 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -94,6 +94,8 @@ extern void callthunks_patch_module_call
extern void *callthunks_translate_call_dest(void *dest);
extern bool is_callthunk(void *addr);
extern int x86_call_depth_emit_accounting(u8 **pprog, void *func);
+extern void apply_leafs(s32 *start, s32 *end);
+extern bool is_leaf_function(void *addr);
#else
static __always_inline void callthunks_patch_builtin_calls(void) {}
static __always_inline void
@@ -112,6 +114,11 @@ static __always_inline int x86_call_dept
{
return 0;
}
+static __always_inline void apply_leafs(s32 *start, s32 *end) {}
+static __always_inline bool is_leaf_function(void *addr)
+{
+ return false;
+}
#endif
#ifdef CONFIG_SMP
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -114,6 +114,7 @@ static void __init_or_module add_nops(vo
}
}
+extern s32 __leaf_sites[], __leaf_sites_end[];
extern s32 __retpoline_sites[], __retpoline_sites_end[];
extern s32 __return_sites[], __return_sites_end[];
extern s32 __ibt_endbr_seal[], __ibt_endbr_seal_end[];
@@ -586,9 +587,14 @@ static int patch_return(void *addr, stru
if (x86_return_thunk == __x86_return_thunk)
return -1;
+ if (x86_return_thunk == __x86_return_skl &&
+ is_leaf_function(addr))
+ goto plain_ret;
+
i = JMP32_INSN_SIZE;
__text_gen_insn(bytes, JMP32_INSN_OPCODE, addr, x86_return_thunk, i);
} else {
+plain_ret:
bytes[i++] = RET_INSN_OPCODE;
}
@@ -988,6 +994,11 @@ void __init alternative_instructions(voi
apply_paravirt(__parainstructions, __parainstructions_end);
/*
+ * Mark the leaf sites; this affects apply_returns() and callthunks_patch*().
+ */
+ apply_leafs(__leaf_sites, __leaf_sites_end);
+
+ /*
* Rewrite the retpolines, must be done before alternatives since
* those can rewrite the retpoline thunks.
*/
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -181,6 +181,54 @@ static const u8 nops[] = {
0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90,
};
+/*
+ * 10 byte nop that spells 'leaf' in the immediate.
+ */
+static const u8 leaf_nop[] = { /* 'l', 'e', 'a', 'f' */
+ 0x2e, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x6c, 0x65, 0x61, 0x66,
+};
+
+void __init_or_module noinline apply_leafs(s32 *start, s32 *end)
+{
+ u8 buffer[16];
+ s32 *s;
+
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+
+ if (skip_addr(addr))
+ continue;
+
+ if (copy_from_kernel_nofault(buffer, addr-10, 10))
+ continue;
+
+ /* already patched */
+ if (!memcmp(buffer, leaf_nop, 10))
+ continue;
+
+ if (memcmp(buffer, nops, 10)) {
+ pr_warn("Not NOPs: %pS %px %*ph\n", addr, addr, 10, addr);
+ continue;
+ }
+
+ text_poke_early(addr-10, leaf_nop, 10);
+ }
+}
+
+bool is_leaf_function(void *addr)
+{
+ unsigned long size, offset;
+ u8 buffer[10];
+
+ if (kallsyms_lookup_size_offset((unsigned long)addr, &size, &offset))
+ addr -= offset;
+
+ if (copy_from_kernel_nofault(buffer, addr-10, 10))
+ return false;
+
+ return memcmp(buffer, leaf_nop, 10) == 0;
+}
+
static __init_or_module void *patch_dest(void *dest, bool direct)
{
unsigned int tsize = SKL_TMPL_SIZE;
@@ -190,6 +238,9 @@ static __init_or_module void *patch_dest
if (!bcmp(pad, skl_call_thunk_template, tsize))
return pad;
+ if (!bcmp(pad, leaf_nop, tsize))
+ return dest;
+
/* Ensure there are nops */
if (bcmp(pad, nops, tsize)) {
pr_warn_once("Invalid padding area for %pS\n", dest);
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -838,6 +838,8 @@ static int __init retbleed_parse_cmdline
retbleed_cmd = RETBLEED_CMD_STUFF;
} else if (!strcmp(str, "nosmt")) {
retbleed_nosmt = true;
+ } else if (!strcmp(str, "force")) {
+ setup_force_cpu_bug(X86_BUG_RETBLEED);
} else {
pr_err("Ignoring unknown retbleed option (%s).", str);
}
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -253,7 +253,7 @@ int module_finalize(const Elf_Ehdr *hdr,
struct module *me)
{
const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
- *para = NULL, *orc = NULL, *orc_ip = NULL,
+ *para = NULL, *orc = NULL, *orc_ip = NULL, *leafs = NULL,
*retpolines = NULL, *returns = NULL, *ibt_endbr = NULL,
*calls = NULL;
char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
@@ -271,6 +271,8 @@ int module_finalize(const Elf_Ehdr *hdr,
orc = s;
if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
orc_ip = s;
+ if (!strcmp(".leaf_sites", secstrings + s->sh_name))
+ leafs = s;
if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
retpolines = s;
if (!strcmp(".return_sites", secstrings + s->sh_name))
@@ -289,6 +291,10 @@ int module_finalize(const Elf_Ehdr *hdr,
void *pseg = (void *)para->sh_addr;
apply_paravirt(pseg, pseg + para->sh_size);
}
+ if (leafs) {
+ void *rseg = (void *)leafs->sh_addr;
+ apply_leafs(rseg, rseg + leafs->sh_size);
+ }
if (retpolines) {
void *rseg = (void *)retpolines->sh_addr;
apply_retpolines(rseg, rseg + retpolines->sh_size);
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -298,6 +298,13 @@ SECTIONS
*(.call_sites)
__call_sites_end = .;
}
+
+ . = ALIGN(8);
+ .leaf_sites : AT(ADDR(.leaf_sites) - LOAD_OFFSET) {
+ __leaf_sites = .;
+ *(.leaf_sites)
+ __leaf_sites_end = .;
+ }
#endif
#ifdef CONFIG_X86_KERNEL_IBT
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -77,9 +77,9 @@ SYM_CODE_END(__x86_indirect_thunk_array)
SYM_INNER_LABEL(__x86_indirect_call_thunk_\reg, SYM_L_GLOBAL)
UNWIND_HINT_EMPTY
ANNOTATE_NOENDBR
-
- CALL_DEPTH_ACCOUNT
- POLINE \reg
+ sub $10, %\reg
+ POLINE \reg
+ add $10, %\reg
ANNOTATE_UNRET_SAFE
ret
int3
@@ -216,10 +216,7 @@ SYM_FUNC_START(__x86_return_skl)
1:
CALL_THUNKS_DEBUG_INC_STUFFS
.rept 16
- ANNOTATE_INTRA_FUNCTION_CALL
- call 2f
- int3
-2:
+ __FILL_RETURN_SLOT
.endr
add $(8*16), %rsp
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -945,6 +945,74 @@ static int create_direct_call_sections(s
return 0;
}
+static int create_leaf_sites_sections(struct objtool_file *file)
+{
+ struct section *sec, *s;
+ struct symbol *sym;
+ unsigned int *loc;
+ int idx;
+
+ sec = find_section_by_name(file->elf, ".leaf_sites");
+ if (sec) {
+ INIT_LIST_HEAD(&file->call_list);
+ WARN("file already has .leaf_sites section, skipping");
+ return 0;
+ }
+
+ idx = 0;
+ for_each_sec(file, s) {
+ if (!s->text || s->init)
+ continue;
+
+ list_for_each_entry(sym, &s->symbol_list, list) {
+ if (sym->pfunc != sym)
+ continue;
+
+ if (sym->static_call_tramp)
+ continue;
+
+ if (!sym->leaf)
+ continue;
+
+ idx++;
+ }
+ }
+
+ sec = elf_create_section(file->elf, ".leaf_sites", 0, sizeof(unsigned int), idx);
+ if (!sec)
+ return -1;
+
+ idx = 0;
+ for_each_sec(file, s) {
+ if (!s->text || s->init)
+ continue;
+
+ list_for_each_entry(sym, &s->symbol_list, list) {
+ if (sym->pfunc != sym)
+ continue;
+
+ if (sym->static_call_tramp)
+ continue;
+
+ if (!sym->leaf)
+ continue;
+
+ loc = (unsigned int *)sec->data->d_buf + idx;
+ memset(loc, 0, sizeof(unsigned int));
+
+ if (elf_add_reloc_to_insn(file->elf, sec,
+ idx * sizeof(unsigned int),
+ R_X86_64_PC32,
+ s, sym->offset))
+ return -1;
+
+ idx++;
+ }
+ }
+
+ return 0;
+}
+
/*
* Warnings shouldn't be reported for ignored functions.
*/
@@ -2362,6 +2430,9 @@ static int classify_symbols(struct objto
if (!strcmp(func->name, "__fentry__"))
func->fentry = true;
+ if (!strcmp(func->name, "__stack_chk_fail"))
+ func->stack_chk = true;
+
if (is_profiling_func(func->name))
func->profiling_func = true;
}
@@ -2492,6 +2563,16 @@ static bool is_fentry_call(struct instru
return false;
}
+static bool is_stack_chk_call(struct instruction *insn)
+{
+ if (insn->type == INSN_CALL &&
+ insn->call_dest &&
+ insn->call_dest->stack_chk)
+ return true;
+
+ return false;
+}
+
static bool has_modified_stack_frame(struct instruction *insn, struct insn_state *state)
{
struct cfi_state *cfi = &state->cfi;
@@ -3269,6 +3350,9 @@ static int validate_call(struct objtool_
struct instruction *insn,
struct insn_state *state)
{
+ if (insn->func && !is_fentry_call(insn) && !is_stack_chk_call(insn))
+ insn->func->leaf = 0;
+
if (state->noinstr && state->instr <= 0 &&
!noinstr_call_dest(file, insn, insn->call_dest)) {
WARN_FUNC("call to %s() leaves .noinstr.text section",
@@ -3973,6 +4057,12 @@ static int validate_section(struct objto
init_insn_state(file, &state, sec);
set_func_state(&state.cfi);
+ /*
+ * Assume it is a leaf function; will be cleared for any CALL
+ * encountered while validating the branches.
+ */
+ func->leaf = 1;
+
warnings += validate_symbol(file, sec, func, &state);
}
@@ -4358,6 +4448,11 @@ int check(struct objtool_file *file)
if (ret < 0)
goto out;
warnings += ret;
+
+ ret = create_leaf_sites_sections(file);
+ if (ret < 0)
+ goto out;
+ warnings += ret;
}
if (opts.rethunk) {
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -61,6 +61,8 @@ struct symbol {
u8 return_thunk : 1;
u8 fentry : 1;
u8 profiling_func : 1;
+ u8 leaf : 1;
+ u8 stack_chk : 1;
struct list_head pv_target;
};
On Fri, Sep 2, 2022 at 1:25 PM Peter Zijlstra <[email protected]> wrote:
>
> The below (mashup of a handful of patches) is the best I could come up
> with in a hurry.
Hmm. It doesn't look too horrible, but yeah, if it still ends up
getting the same padding overhead, I'm not sure it ends up really
mattering.
I was hoping that some of the overhead would just go away - looking at
my kernel build, we do have a fair number of functions that are 1-31
bytes (according to a random and probably broken little shell script:
objdump -t vmlinux |
grep 'F .text' |
sort |
cut -f2- |
grep '^00000000000000[01]'
but from a quick look, a fair number of them aren't actually even leaf
functions (ie they are small because they just call another function
with a set of simple arguments, often as a tail-call).
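A typical case looks roughly like this (a sketch, names made up):

	tiny_helper:
		mov	$1, %esi		# set up an argument
		jmp	real_worker		# tail-call: small, but not a leaf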
So maybe it's just not worth it.
> If this patch makes you feel warm and fuzzy, I can clean it up,
> otherwise I would suggest getting the stuff we have merged before adding
> even more things on top.
Yeah, it doesn't look that convincing. I have no real numbers - a lot
of small functions, but I'm not convinced that it is worth worrying
about them, particularly if it doesn't really help the actual text
size.
Linus
On Fri, Sep 2, 2022 at 2:46 PM Linus Torvalds
<[email protected]> wrote:
>
> Hmm. It doesn't look too horrible, but yeah, if it still ends up
> getting the same padding overhead, I'm not sure it ends up really
> mattering.
Oh, and I think it's buggy anyway.
If there are any tail-calls to a leaf function, the tracking now gets
out of whack. So it's no longer a "don't bother counting the last
level", now it ends up being a "count got off by one".
Oh well.
Linus
On Sat, Sep 03, 2022 at 10:26:45AM -0700, Linus Torvalds wrote:
> On Fri, Sep 2, 2022 at 2:46 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > Hmm. It doesn't look too horrible, but yeah, if it still ends up
> > getting the same padding overhead, I'm not sure it ends up really
> > mattering.
>
> Oh, and I think it's buggy anyway.
>
> If there are any tail-calls to a leaf function, the tracking now gets
> out of whack. So it's no longer a "don't bother counting the last
> level", now it ends up being a "count got off by one".
See validate_sibling_call(); it too calls validate_call().
(Although perhaps I should go s/sibling/tail/ on all that for clarity)
On Mon, Sep 05, 2022 at 09:16:43AM +0200, Peter Zijlstra wrote:
> On Sat, Sep 03, 2022 at 10:26:45AM -0700, Linus Torvalds wrote:
> > On Fri, Sep 2, 2022 at 2:46 PM Linus Torvalds
> > <[email protected]> wrote:
> > >
> > > Hmm. It doesn't look too horrible, but yeah, if it still ends up
> > > getting the same padding overhead, I'm not sure it ends up really
> > > mattering.
> >
> > Oh, and I think it's buggy anyway.
> >
> > If there are any tail-calls to a leaf function, the tracking now gets
> > out of whack. So it's no longer a "don't bother counting the last
> > level", now it ends up being a "count got off by one".
>
> See validate_sibling_call(); it too calls validate_call().
>
> (Although perhaps I should go s/sibling/tail/ on all that for clarity)
Ah, no, you're right and I needed more wake-up juice.