2022-09-02 14:36:32

by Peter Zijlstra

Subject: [PATCH v2 37/59] x86/putuser: Provide room for padding

From: Thomas Gleixner <[email protected]>

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/lib/putuser.S | 62 ++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 49 insertions(+), 13 deletions(-)

--- a/arch/x86/lib/putuser.S
+++ b/arch/x86/lib/putuser.S
@@ -47,8 +47,6 @@ SYM_FUNC_START(__put_user_1)
LOAD_TASK_SIZE_MINUS_N(0)
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
-SYM_INNER_LABEL(__put_user_nocheck_1, SYM_L_GLOBAL)
- ENDBR
ASM_STAC
1: movb %al,(%_ASM_CX)
xor %ecx,%ecx
@@ -56,54 +54,87 @@ SYM_INNER_LABEL(__put_user_nocheck_1, SY
RET
SYM_FUNC_END(__put_user_1)
EXPORT_SYMBOL(__put_user_1)
+
+SYM_FUNC_START(__put_user_nocheck_1)
+ ENDBR
+ ASM_STAC
+2: movb %al,(%_ASM_CX)
+ xor %ecx,%ecx
+ ASM_CLAC
+ RET
+SYM_FUNC_END(__put_user_nocheck_1)
EXPORT_SYMBOL(__put_user_nocheck_1)

SYM_FUNC_START(__put_user_2)
LOAD_TASK_SIZE_MINUS_N(1)
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
-SYM_INNER_LABEL(__put_user_nocheck_2, SYM_L_GLOBAL)
- ENDBR
ASM_STAC
-2: movw %ax,(%_ASM_CX)
+3: movw %ax,(%_ASM_CX)
xor %ecx,%ecx
ASM_CLAC
RET
SYM_FUNC_END(__put_user_2)
EXPORT_SYMBOL(__put_user_2)
+
+SYM_FUNC_START(__put_user_nocheck_2)
+ ENDBR
+ ASM_STAC
+4: movw %ax,(%_ASM_CX)
+ xor %ecx,%ecx
+ ASM_CLAC
+ RET
+SYM_FUNC_END(__put_user_nocheck_2)
EXPORT_SYMBOL(__put_user_nocheck_2)

SYM_FUNC_START(__put_user_4)
LOAD_TASK_SIZE_MINUS_N(3)
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
-SYM_INNER_LABEL(__put_user_nocheck_4, SYM_L_GLOBAL)
- ENDBR
ASM_STAC
-3: movl %eax,(%_ASM_CX)
+5: movl %eax,(%_ASM_CX)
xor %ecx,%ecx
ASM_CLAC
RET
SYM_FUNC_END(__put_user_4)
EXPORT_SYMBOL(__put_user_4)
+
+SYM_FUNC_START(__put_user_nocheck_4)
+ ENDBR
+ ASM_STAC
+6: movl %eax,(%_ASM_CX)
+ xor %ecx,%ecx
+ ASM_CLAC
+ RET
+SYM_FUNC_END(__put_user_nocheck_4)
EXPORT_SYMBOL(__put_user_nocheck_4)

SYM_FUNC_START(__put_user_8)
LOAD_TASK_SIZE_MINUS_N(7)
cmp %_ASM_BX,%_ASM_CX
jae .Lbad_put_user
-SYM_INNER_LABEL(__put_user_nocheck_8, SYM_L_GLOBAL)
- ENDBR
ASM_STAC
-4: mov %_ASM_AX,(%_ASM_CX)
+7: mov %_ASM_AX,(%_ASM_CX)
#ifdef CONFIG_X86_32
-5: movl %edx,4(%_ASM_CX)
+8: movl %edx,4(%_ASM_CX)
#endif
xor %ecx,%ecx
ASM_CLAC
RET
SYM_FUNC_END(__put_user_8)
EXPORT_SYMBOL(__put_user_8)
+
+SYM_FUNC_START(__put_user_nocheck_8)
+ ENDBR
+ ASM_STAC
+9: mov %_ASM_AX,(%_ASM_CX)
+#ifdef CONFIG_X86_32
+10: movl %edx,4(%_ASM_CX)
+#endif
+ xor %ecx,%ecx
+ ASM_CLAC
+ RET
+SYM_FUNC_END(__put_user_nocheck_8)
EXPORT_SYMBOL(__put_user_nocheck_8)

SYM_CODE_START_LOCAL(.Lbad_put_user_clac)
@@ -117,6 +148,11 @@ SYM_CODE_END(.Lbad_put_user_clac)
_ASM_EXTABLE_UA(2b, .Lbad_put_user_clac)
_ASM_EXTABLE_UA(3b, .Lbad_put_user_clac)
_ASM_EXTABLE_UA(4b, .Lbad_put_user_clac)
-#ifdef CONFIG_X86_32
_ASM_EXTABLE_UA(5b, .Lbad_put_user_clac)
+ _ASM_EXTABLE_UA(6b, .Lbad_put_user_clac)
+ _ASM_EXTABLE_UA(7b, .Lbad_put_user_clac)
+ _ASM_EXTABLE_UA(9b, .Lbad_put_user_clac)
+#ifdef CONFIG_X86_32
+ _ASM_EXTABLE_UA(8b, .Lbad_put_user_clac)
+ _ASM_EXTABLE_UA(10b, .Lbad_put_user_clac)
#endif



2022-09-02 17:16:38

by Linus Torvalds

Subject: Re: [PATCH v2 37/59] x86/putuser: Provide room for padding

So I don't hate this patch and it's probably good for consistency, but
I really think that the retbleed tracking could perhaps be improved to
let this all be unnecessary.

The whole return stack depth counting is already not 100% exact, and I
think we could just make the rule be that we don't track leaf
functions.

Why? It's just an off-by-one in the already not exact tracking. And -
perhaps equally importantly - leaf functions are very very common
dynamically, and I suspect it's trivial to see them.
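
To make that concrete, a toy model of the counting (made-up names, not
the kernel code; the real counter lives in per-CPU state and is updated
from the call and return thunks):

#include <stdio.h>

#define RSB_SIZE 16

static int depth;		/* approximation of RSB occupancy */

static void track_call(void)	/* entry accounting */
{
	if (depth < RSB_SIZE)
		depth++;
}

static void track_return(void)	/* done by the return thunk */
{
	if (--depth < 0) {
		depth = 0;	/* underflow: RSB may be empty */
		puts("stuff the RSB");
	}
}

int main(void)
{
	track_call();		/* A -> B, tracked */
	track_call();		/* B -> C, tracked */
	/*
	 * C -> leaf plus the leaf's ret: both untracked. While inside
	 * the leaf the counter understates the real depth by one (the
	 * off-by-one), but the call and its return cancel out, so the
	 * count is balanced again afterwards.
	 */
	track_return();		/* C returns */
	track_return();		/* B returns */
	printf("depth = %d\n", depth);	/* 0 again */
	return 0;
}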

Yes, yes, you could make objtool even smarter and actually do some
kind of function flow graph thing (and I think some people were
talking about that with the whole ret counting long long ago), but the
leaf function thing is the really simple low-hanging fruit case of
that.

Linus

2022-09-02 17:28:51

by Peter Zijlstra

Subject: Re: [PATCH v2 37/59] x86/putuser: Provide room for padding

On Fri, Sep 02, 2022 at 09:43:45AM -0700, Linus Torvalds wrote:
> So I don't hate this patch and it's probably good for consistency, but
> I really think that the retbleed tracking could perhaps be improved to
> let this all be unnecessary.
>
> The whole return stack depth counting is already not 100% exact, and I
> think we could just make the rule be that we don't track leaf
> functions.
>
> Why? It's just an off-by-one in the already not exact tracking. And -
> perhaps equally importantly - leaf functions are very very common
> dynamically, and I suspect it's trivial to see them.
>
> Yes, yes, you could make objtool even smarter and actually do some
> kind of function flow graph thing (and I think some people were
> talking about that with the whole ret counting long long ago), but the
> leaf function thing is the really simple low-hanging fruit case of
> that.

So I did the leaf thing a few weeks ago, and at the time the perf gains
were not worth the complexity.

I can try again :-)

2022-09-02 21:03:41

by Peter Zijlstra

Subject: Re: [PATCH v2 37/59] x86/putuser: Provide room for padding

On Fri, Sep 02, 2022 at 07:03:33PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 02, 2022 at 09:43:45AM -0700, Linus Torvalds wrote:
> > So I don't hate this patch and it's probably good for consistency, but
> > I really think that the retbleed tracking could perhaps be improved to
> > let this all be unnecessary.
> >
> > The whole return stack depth counting is already not 100% exact, and I
> > think we could just make the rule be that we don't track leaf
> > functions.
> >
> > Why? It's just an off-by-one in the already not exact tracking. And -
> > perhaps equally importantly - leaf functions are very very common
> > dynamically, and I suspect it's trivial to see them.
> >
> > Yes, yes, you could make objtool even smarter and actually do some
> > kind of function flow graph thing (and I think some people were
> > talking about that with the whole ret counting long long ago), but the
> > leaf function thing is the really simple low-hanging fruit case of
> > that.
>
> So I did the leaf thing a few weeks ago, and at the time the perf gains
> were not worth the complexity.
>
> I can try again :-)

The below (mashup of a handful of patches) is the best I could come up
with in a hurry.

Specifically, the approach taken is that, instead of the 10 byte sarq
for accounting, a leaf has a 10 byte NOP in front of it. When this
'leaf'-NOP is found, the return thunk is also not used and a regular
'ret' instruction is patched in.

Direct calls to leaf functions are unmodified; they still go to +0.

However, indirect calls will unconditionally jump to -10. These will
then either hit the sarq or the fancy NOP.
Seems to boot in kvm (defconfig+kvm_guest.config)

That is, the thing you complained about isn't actually 'fixed'; leaf
functions still need their padding.

If this patch makes you feel warm and fuzzy, I can clean it up;
otherwise I would suggest getting the stuff we have merged before adding
even more things on top.

I'll see if I get time to clean up the alignment thing this weekend;
otherwise it'll have to wait till next week or so.

---
arch/x86/include/asm/alternative.h | 7 ++
arch/x86/kernel/alternative.c | 11 ++++
arch/x86/kernel/callthunks.c | 51 +++++++++++++++++++
arch/x86/kernel/cpu/bugs.c | 2
arch/x86/kernel/module.c | 8 ++-
arch/x86/kernel/vmlinux.lds.S | 7 ++
arch/x86/lib/retpoline.S | 11 +---
tools/objtool/check.c | 95 ++++++++++++++++++++++++++++++++++++
tools/objtool/include/objtool/elf.h | 2
9 files changed, 186 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -94,6 +94,8 @@ extern void callthunks_patch_module_call
extern void *callthunks_translate_call_dest(void *dest);
extern bool is_callthunk(void *addr);
extern int x86_call_depth_emit_accounting(u8 **pprog, void *func);
+extern void apply_leafs(s32 *start, s32 *end);
+extern bool is_leaf_function(void *addr);
#else
static __always_inline void callthunks_patch_builtin_calls(void) {}
static __always_inline void
@@ -112,6 +114,11 @@ static __always_inline int x86_call_dept
{
return 0;
}
+static __always_inline void apply_leafs(s32 *start, s32 *end) {}
+static __always_inline bool is_leaf_function(void *addr)
+{
+ return false;
+}
#endif

#ifdef CONFIG_SMP
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -114,6 +114,7 @@ static void __init_or_module add_nops(vo
}
}

+extern s32 __leaf_sites[], __leaf_sites_end[];
extern s32 __retpoline_sites[], __retpoline_sites_end[];
extern s32 __return_sites[], __return_sites_end[];
extern s32 __ibt_endbr_seal[], __ibt_endbr_seal_end[];
@@ -586,9 +587,14 @@ static int patch_return(void *addr, stru
if (x86_return_thunk == __x86_return_thunk)
return -1;

+ if (x86_return_thunk == __x86_return_skl &&
+ is_leaf_function(addr))
+ goto plain_ret;
+
i = JMP32_INSN_SIZE;
__text_gen_insn(bytes, JMP32_INSN_OPCODE, addr, x86_return_thunk, i);
} else {
+plain_ret:
bytes[i++] = RET_INSN_OPCODE;
}

@@ -988,6 +994,11 @@ void __init alternative_instructions(voi
apply_paravirt(__parainstructions, __parainstructions_end);

/*
+ * Mark the leaf sites; this affects apply_returns() and callthunks_patch*().
+ */
+ apply_leafs(__leaf_sites, __leaf_sites_end);
+
+ /*
* Rewrite the retpolines, must be done before alternatives since
* those can rewrite the retpoline thunks.
*/
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -181,6 +181,54 @@ static const u8 nops[] = {
0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90,
};

+/*
+ * 10 byte nop that spells 'leaf' in the immediate.
+ */
+static const u8 leaf_nop[] = { /* 'l', 'e', 'a', 'f' */
+ 0x2e, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x6c, 0x65, 0x61, 0x66,
+};
+
+void __init_or_module noinline apply_leafs(s32 *start, s32 *end)
+{
+ u8 buffer[16];
+ s32 *s;
+
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+
+ if (skip_addr(addr))
+ continue;
+
+ if (copy_from_kernel_nofault(buffer, addr-10, 10))
+ continue;
+
+ /* already patched */
+ if (!memcmp(buffer, leaf_nop, 10))
+ continue;
+
+ if (memcmp(buffer, nops, 10)) {
+ pr_warn("Not NOPs: %pS %px %*ph\n", addr, addr, 10, addr);
+ continue;
+ }
+
+ text_poke_early(addr-10, leaf_nop, 10);
+ }
+}
+
+bool is_leaf_function(void *addr)
+{
+ unsigned long size, offset;
+ u8 buffer[10];
+
+ if (kallsyms_lookup_size_offset((unsigned long)addr, &size, &offset))
+ addr -= offset;
+
+ if (copy_from_kernel_nofault(buffer, addr-10, 10))
+ return false;
+
+ return memcmp(buffer, leaf_nop, 10) == 0;
+}
+
static __init_or_module void *patch_dest(void *dest, bool direct)
{
unsigned int tsize = SKL_TMPL_SIZE;
@@ -190,6 +238,9 @@ static __init_or_module void *patch_dest
if (!bcmp(pad, skl_call_thunk_template, tsize))
return pad;

+ if (!bcmp(pad, leaf_nop, tsize))
+ return dest;
+
/* Ensure there are nops */
if (bcmp(pad, nops, tsize)) {
pr_warn_once("Invalid padding area for %pS\n", dest);
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -838,6 +838,8 @@ static int __init retbleed_parse_cmdline
retbleed_cmd = RETBLEED_CMD_STUFF;
} else if (!strcmp(str, "nosmt")) {
retbleed_nosmt = true;
+ } else if (!strcmp(str, "force")) {
+ setup_force_cpu_bug(X86_BUG_RETBLEED);
} else {
pr_err("Ignoring unknown retbleed option (%s).", str);
}
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -253,7 +253,7 @@ int module_finalize(const Elf_Ehdr *hdr,
struct module *me)
{
const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
- *para = NULL, *orc = NULL, *orc_ip = NULL,
+ *para = NULL, *orc = NULL, *orc_ip = NULL, *leafs = NULL,
*retpolines = NULL, *returns = NULL, *ibt_endbr = NULL,
*calls = NULL;
char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
@@ -271,6 +271,8 @@ int module_finalize(const Elf_Ehdr *hdr,
orc = s;
if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
orc_ip = s;
+ if (!strcmp(".leaf_sites", secstrings + s->sh_name))
+ leafs = s;
if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
retpolines = s;
if (!strcmp(".return_sites", secstrings + s->sh_name))
@@ -289,6 +291,10 @@ int module_finalize(const Elf_Ehdr *hdr,
void *pseg = (void *)para->sh_addr;
apply_paravirt(pseg, pseg + para->sh_size);
}
+ if (leafs) {
+ void *rseg = (void *)leafs->sh_addr;
+ apply_leafs(rseg, rseg + leafs->sh_size);
+ }
if (retpolines) {
void *rseg = (void *)retpolines->sh_addr;
apply_retpolines(rseg, rseg + retpolines->sh_size);
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -298,6 +298,13 @@ SECTIONS
*(.call_sites)
__call_sites_end = .;
}
+
+ . = ALIGN(8);
+ .leaf_sites : AT(ADDR(.leaf_sites) - LOAD_OFFSET) {
+ __leaf_sites = .;
+ *(.leaf_sites)
+ __leaf_sites_end = .;
+ }
#endif

#ifdef CONFIG_X86_KERNEL_IBT
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -77,9 +77,9 @@ SYM_CODE_END(__x86_indirect_thunk_array)
SYM_INNER_LABEL(__x86_indirect_call_thunk_\reg, SYM_L_GLOBAL)
UNWIND_HINT_EMPTY
ANNOTATE_NOENDBR
-
- CALL_DEPTH_ACCOUNT
- POLINE \reg
+ sub $10, %\reg
+ POLINE \reg
+ add $10, %\reg
ANNOTATE_UNRET_SAFE
ret
int3
@@ -216,10 +216,7 @@ SYM_FUNC_START(__x86_return_skl)
1:
CALL_THUNKS_DEBUG_INC_STUFFS
.rept 16
- ANNOTATE_INTRA_FUNCTION_CALL
- call 2f
- int3
-2:
+ __FILL_RETURN_SLOT
.endr
add $(8*16), %rsp

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -945,6 +945,74 @@ static int create_direct_call_sections(s
return 0;
}

+static int create_leaf_sites_sections(struct objtool_file *file)
+{
+ struct section *sec, *s;
+ struct symbol *sym;
+ unsigned int *loc;
+ int idx;
+
+ sec = find_section_by_name(file->elf, ".leaf_sites");
+ if (sec) {
+ INIT_LIST_HEAD(&file->call_list);
+ WARN("file already has .leaf_sites section, skipping");
+ return 0;
+ }
+
+ idx = 0;
+ for_each_sec(file, s) {
+ if (!s->text || s->init)
+ continue;
+
+ list_for_each_entry(sym, &s->symbol_list, list) {
+ if (sym->pfunc != sym)
+ continue;
+
+ if (sym->static_call_tramp)
+ continue;
+
+ if (!sym->leaf)
+ continue;
+
+ idx++;
+ }
+ }
+
+ sec = elf_create_section(file->elf, ".leaf_sites", 0, sizeof(unsigned int), idx);
+ if (!sec)
+ return -1;
+
+ idx = 0;
+ for_each_sec(file, s) {
+ if (!s->text || s->init)
+ continue;
+
+ list_for_each_entry(sym, &s->symbol_list, list) {
+ if (sym->pfunc != sym)
+ continue;
+
+ if (sym->static_call_tramp)
+ continue;
+
+ if (!sym->leaf)
+ continue;
+
+ loc = (unsigned int *)sec->data->d_buf + idx;
+ memset(loc, 0, sizeof(unsigned int));
+
+ if (elf_add_reloc_to_insn(file->elf, sec,
+ idx * sizeof(unsigned int),
+ R_X86_64_PC32,
+ s, sym->offset))
+ return -1;
+
+ idx++;
+ }
+ }
+
+ return 0;
+}
+
/*
* Warnings shouldn't be reported for ignored functions.
*/
@@ -2362,6 +2430,9 @@ static int classify_symbols(struct objto
if (!strcmp(func->name, "__fentry__"))
func->fentry = true;

+ if (!strcmp(func->name, "__stack_chk_fail"))
+ func->stack_chk = true;
+
if (is_profiling_func(func->name))
func->profiling_func = true;
}
@@ -2492,6 +2563,16 @@ static bool is_fentry_call(struct instru
return false;
}

+static bool is_stack_chk_call(struct instruction *insn)
+{
+ if (insn->type == INSN_CALL &&
+ insn->call_dest &&
+ insn->call_dest->stack_chk)
+ return true;
+
+ return false;
+}
+
static bool has_modified_stack_frame(struct instruction *insn, struct insn_state *state)
{
struct cfi_state *cfi = &state->cfi;
@@ -3269,6 +3350,9 @@ static int validate_call(struct objtool_
struct instruction *insn,
struct insn_state *state)
{
+ if (insn->func && !is_fentry_call(insn) && !is_stack_chk_call(insn))
+ insn->func->leaf = 0;
+
if (state->noinstr && state->instr <= 0 &&
!noinstr_call_dest(file, insn, insn->call_dest)) {
WARN_FUNC("call to %s() leaves .noinstr.text section",
@@ -3973,6 +4057,12 @@ static int validate_section(struct objto
init_insn_state(file, &state, sec);
set_func_state(&state.cfi);

+ /*
+ * Assume it is a leaf function; will be cleared for any CALL
+ * encountered while validating the branches.
+ */
+ func->leaf = 1;
+
warnings += validate_symbol(file, sec, func, &state);
}

@@ -4358,6 +4448,11 @@ int check(struct objtool_file *file)
if (ret < 0)
goto out;
warnings += ret;
+
+ ret = create_leaf_sites_sections(file);
+ if (ret < 0)
+ goto out;
+ warnings += ret;
}

if (opts.rethunk) {
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -61,6 +61,8 @@ struct symbol {
u8 return_thunk : 1;
u8 fentry : 1;
u8 profiling_func : 1;
+ u8 leaf : 1;
+ u8 stack_chk : 1;
struct list_head pv_target;
};

2022-09-02 22:18:06

by Linus Torvalds

Subject: Re: [PATCH v2 37/59] x86/putuser: Provide room for padding

On Fri, Sep 2, 2022 at 1:25 PM Peter Zijlstra <[email protected]> wrote:
>
> The below (mashup of a handful of patches) is the best I could come up
> with in a hurry.

Hmm. It doesn't look too horrible, but yeah, if it still ends up
getting the same padding overhead I'm not sure it ends up really
mattering.

I was hoping that some of the overhead would just go away - looking at
my kernel build, we do have a fair number of functions that are 1-31
bytes, according to a random and probably broken little shell script:

objdump -t vmlinux |
grep 'F .text' |
sort |
cut -f2- |
grep '^00000000000000[01]'

but from a quick look, a fair number of them aren't actually even leaf
functions (ie they are small because they just call another function
with a set of simple arguments, often as a tail-call).

So maybe it's just not worth it.

> If this patch makes you feel warm and fuzzy, I can clean it up;
> otherwise I would suggest getting the stuff we have merged before adding
> even more things on top.

Yeah, it doesn't look that convincing. I have no real numbers - a lot
of small functions, but I'm not convinced that it is worth worrying
about them, particularly if it doesn't really help the actual text
size.

Linus

2022-09-03 17:57:32

by Linus Torvalds

Subject: Re: [PATCH v2 37/59] x86/putuser: Provide room for padding

On Fri, Sep 2, 2022 at 2:46 PM Linus Torvalds
<[email protected]> wrote:
>
> Hmm. It doesn't look too horrible, but yeah, if it still ends up
> getting the same padding overhead I'm not sure it ends up really
> mattering.

Oh, and I think it's buggy anyway.

If there are any tail-calls to a leaf function, the tracking now gets
out of whack. So it's no longer a "don't bother counting the last
level", now it ends up being a "count got off by one".

Oh well.

Linus

2022-09-05 07:23:43

by Peter Zijlstra

Subject: Re: [PATCH v2 37/59] x86/putuser: Provide room for padding

On Sat, Sep 03, 2022 at 10:26:45AM -0700, Linus Torvalds wrote:
> On Fri, Sep 2, 2022 at 2:46 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > Hmm. It doesn't look too horrible, but yeah, if it still ends up
> > getting the same padding overhead I'm not sure it ends up really
> > mattering.
>
> Oh, and I think it's buggy anyway.
>
> If there are any tail-calls to a leaf function, the tracking now gets
> out of whack. So it's no longer a "don't bother counting the last
> level", now it ends up being a "count got off by one".

See validate_sibling_call(), it too calls validate_call().

(Although perhaps I should go s/sibling/tail/ on all that for clarity)
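
That is, roughly (paraphrasing tools/objtool/check.c):

static int validate_sibling_call(struct objtool_file *file,
				 struct instruction *insn,
				 struct insn_state *state)
{
	/* ... modified-stack-frame checks ... */

	return validate_call(file, insn, state);
}

so the leaf-clearing added to validate_call() in the patch above fires
for tail-calls too.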

2022-09-05 11:41:46

by Peter Zijlstra

Subject: Re: [PATCH v2 37/59] x86/putuser: Provide room for padding

On Mon, Sep 05, 2022 at 09:16:43AM +0200, Peter Zijlstra wrote:
> On Sat, Sep 03, 2022 at 10:26:45AM -0700, Linus Torvalds wrote:
> > On Fri, Sep 2, 2022 at 2:46 PM Linus Torvalds
> > <[email protected]> wrote:
> > >
> > > Hmm. It doesn't look too horrible, but yeah, if it still ends up
> > > getting the same padding overhead I'm not sure it ends up really
> > > mattering.
> >
> > Oh, and I think it's buggy anyway.
> >
> > If there are any tail-calls to a leaf function, the tracking now gets
> > out of whack. So it's no longer a "don't bother counting the last
> > level", now it ends up being a "count got off by one".
>
> See validate_sibling_call(), it too calls validate_call().
>
> (Although perhaps I should go s/sibling/tail/ on all that for clarity)

Ah, no, you're right and I needed more wake-up juice.