2020-03-24 14:26:49

by Peter Zijlstra

[permalink] [raw]
Subject: [RESEND][PATCH v3 09/17] x86/static_call: Add out-of-line static call implementation

Add the x86 out-of-line static call implementation. For each key, a
permanent trampoline is created which is the destination for all static
calls for the given key. The trampoline has a direct jump which gets
patched by static_call_update() when the destination function changes.

Signed-off-by: Josh Poimboeuf <[email protected]>
[peterz: fixed trampoline, rewrote patching code]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/static_call.h | 22 ++++++++++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/static_call.c | 31 +++++++++++++++++++++++++++++++
4 files changed, 55 insertions(+)
create mode 100644 arch/x86/include/asm/static_call.h
create mode 100644 arch/x86/kernel/static_call.c

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -205,6 +205,7 @@ config X86
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_STACKPROTECTOR if CC_HAS_SANE_STACKPROTECTOR
select HAVE_STACK_VALIDATION if X86_64
+ select HAVE_STATIC_CALL
select HAVE_RSEQ
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UNSTABLE_SCHED_CLOCK
--- /dev/null
+++ b/arch/x86/include/asm/static_call.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_STATIC_CALL_H
+#define _ASM_STATIC_CALL_H
+
+#include <asm/text-patching.h>
+
+/*
+ * For CONFIG_HAVE_STATIC_CALL, this is a permanent trampoline which
+ * does a direct jump to the function. The direct jump gets patched by
+ * static_call_update().
+ */
+#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func) \
+ asm(".pushsection .text, \"ax\" \n" \
+ ".align 4 \n" \
+ ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
+ STATIC_CALL_TRAMP_STR(name) ": \n" \
+ " jmp.d32 " #func " \n" \
+ ".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
+ ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
+ ".popsection \n")
+
+#endif /* _ASM_STATIC_CALL_H */
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -63,6 +63,7 @@ obj-y += tsc.o tsc_msr.o io_delay.o rt
obj-y += pci-iommu_table.o
obj-y += resource.o
obj-y += irqflags.o
+obj-y += static_call.o

obj-y += process.o
obj-y += fpu/
--- /dev/null
+++ b/arch/x86/kernel/static_call.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/static_call.h>
+#include <linux/memory.h>
+#include <linux/bug.h>
+#include <asm/text-patching.h>
+
+static void __static_call_transform(void *insn, u8 opcode, void *func)
+{
+ const void *code = text_gen_insn(opcode, (long)insn, (long)func);
+
+ if (WARN_ONCE(*(u8 *)insn != opcode,
+ "unexpected static call insn opcode 0x%x at %pS\n",
+ opcode, insn))
+ return;
+
+ if (memcmp(insn, code, CALL_INSN_SIZE) == 0)
+ return;
+
+ text_poke_bp(insn, code, CALL_INSN_SIZE, NULL);
+}
+
+void arch_static_call_transform(void *site, void *tramp, void *func)
+{
+ mutex_lock(&text_mutex);
+
+ if (tramp)
+ __static_call_transform(tramp, JMP32_INSN_OPCODE, func);
+
+ mutex_unlock(&text_mutex);
+}
+EXPORT_SYMBOL_GPL(arch_static_call_transform);



2020-03-26 14:58:08

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RESEND][PATCH v3 09/17] x86/static_call: Add out-of-line static call implementation

On Tue, Mar 24, 2020 at 02:56:12PM +0100, Peter Zijlstra wrote:
> +static void __static_call_transform(void *insn, u8 opcode, void *func)
> +{
> + const void *code = text_gen_insn(opcode, (long)insn, (long)func);
> +
> + if (WARN_ONCE(*(u8 *)insn != opcode,
> + "unexpected static call insn opcode 0x%x at %pS\n",
> + opcode, insn))
> + return;
> +
> + if (memcmp(insn, code, CALL_INSN_SIZE) == 0)
> + return;
> +
> + text_poke_bp(insn, code, CALL_INSN_SIZE, NULL);

Right, this is working with CALL_INSN_SIZE but ...

> +}
> +
> +void arch_static_call_transform(void *site, void *tramp, void *func)
> +{
> + mutex_lock(&text_mutex);
> +
> + if (tramp)
> + __static_call_transform(tramp, JMP32_INSN_OPCODE, func);

... it gets called with JMP32_INSN_OPCODE too. I mean, both lengths are
equal and all - it is just a bit confusing at a first glance. Maybe slap
a small comment that it is ok.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-06 01:10:21

by Fangrui Song

[permalink] [raw]
Subject: Re: [RESEND][PATCH v3 09/17] x86/static_call: Add out-of-line static call implementation

On 2020-03-24, Peter Zijlstra wrote:
>Add the x86 out-of-line static call implementation. For each key, a
>permanent trampoline is created which is the destination for all static
>calls for the given key. The trampoline has a direct jump which gets
>patched by static_call_update() when the destination function changes.
>
>Signed-off-by: Josh Poimboeuf <[email protected]>
>[peterz: fixed trampoline, rewrote patching code]
>Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>---
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/static_call.h | 22 ++++++++++++++++++++++
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/static_call.c | 31 +++++++++++++++++++++++++++++++
> 4 files changed, 55 insertions(+)
> create mode 100644 arch/x86/include/asm/static_call.h
> create mode 100644 arch/x86/kernel/static_call.c
>
>--- a/arch/x86/Kconfig
>+++ b/arch/x86/Kconfig
>@@ -205,6 +205,7 @@ config X86
> select HAVE_FUNCTION_ARG_ACCESS_API
> select HAVE_STACKPROTECTOR if CC_HAS_SANE_STACKPROTECTOR
> select HAVE_STACK_VALIDATION if X86_64
>+ select HAVE_STATIC_CALL
> select HAVE_RSEQ
> select HAVE_SYSCALL_TRACEPOINTS
> select HAVE_UNSTABLE_SCHED_CLOCK
>--- /dev/null
>+++ b/arch/x86/include/asm/static_call.h
>@@ -0,0 +1,22 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+#ifndef _ASM_STATIC_CALL_H
>+#define _ASM_STATIC_CALL_H
>+
>+#include <asm/text-patching.h>
>+
>+/*
>+ * For CONFIG_HAVE_STATIC_CALL, this is a permanent trampoline which
>+ * does a direct jump to the function. The direct jump gets patched by
>+ * static_call_update().
>+ */
>+#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func) \
>+ asm(".pushsection .text, \"ax\" \n" \
>+ ".align 4 \n" \
>+ ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
>+ STATIC_CALL_TRAMP_STR(name) ": \n" \
>+ " jmp.d32 " #func " \n" \
>+ ".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
>+ ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
>+ ".popsection \n")
>+
>+#endif /* _ASM_STATIC_CALL_H */

Hi Peter,

Coming here from https://github.com/ClangBuiltLinux/linux/issues/974

jmp.d32 is not recognized by clang integrated assembler.
The syntax appears to be very rarely used. According to Debian Code Search,
u-boot is the only project using this syntax.

In March 2017, gas added the pseudo prefix {disp32}
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=86fa6981e7487e2c2df4337aa75ed2d93c32eaf2
which generalizes jmp.d32 ({disp32} jmp foo)

I wonder whether the instruction jmp.d32 can be replaced with the trick in
arch/x86/include/asm/jump_label.h for clang portability.

% grep -A2 'jmp.d32' arch/x86/include/asm/jump_label.h
/* Equivalent to "jmp.d32 \target" */
.byte 0xe9
.long \target - .Lstatic_jump_after_\@
...

+ [email protected]

2020-04-06 11:05:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RESEND][PATCH v3 09/17] x86/static_call: Add out-of-line static call implementation

On Sun, Apr 05, 2020 at 06:08:59PM -0700, Fangrui Song wrote:
> On 2020-03-24, Peter Zijlstra wrote:

> > +#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func) \
> > + asm(".pushsection .text, \"ax\" \n" \
> > + ".align 4 \n" \
> > + ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
> > + STATIC_CALL_TRAMP_STR(name) ": \n" \
> > + " jmp.d32 " #func " \n" \
> > + ".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
> > + ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
> > + ".popsection \n")
> > +
> > +#endif /* _ASM_STATIC_CALL_H */
>
> Hi Peter,
>
> Coming here from https://github.com/ClangBuiltLinux/linux/issues/974
>
> jmp.d32 is not recognized by clang integrated assembler.
> The syntax appears to be very rarely used. According to Debian Code Search,
> u-boot is the only project using this syntax.

*groan*... I was going to use it in more places :-/

> In March 2017, gas added the pseudo prefix {disp32}
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=86fa6981e7487e2c2df4337aa75ed2d93c32eaf2
> which generalizes jmp.d32 ({disp32} jmp foo)

That's all well and cute, but I can't use that because its too new. Not
until we raise the minimum gcc/bintils version to something that
includes that.

> I wonder whether the instruction jmp.d32 can be replaced with the trick in
> arch/x86/include/asm/jump_label.h for clang portability.
>
> % grep -A2 'jmp.d32' arch/x86/include/asm/jump_label.h
> /* Equivalent to "jmp.d32 \target" */
> .byte 0xe9
> .long \target - .Lstatic_jump_after_\@

Sure, but I was hoping to move away from that since all assemblers
should now support jmp.d32. Except of course, you have to go ruin
things.

The thing is, jmp.d32 reads so much nicer than the above crap.

Also, I still need a meta instruction like:

nopjmp $label

what works just like 'jmp' but instead emits either a nop2 or a nop5.
I tried various hacks to get GAS to emit that, but no luck :/

The only up-side from that new syntax is that I suppose we can go write:

{disp8} push \vec

without gas shitting itself and emitting a 5 byte push just because..
Except of course we can't, for the same reason I can't go around and
use:

{disp32} jmp

in the above code.

2020-04-06 18:31:41

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [RESEND][PATCH v3 09/17] x86/static_call: Add out-of-line static call implementation

On Mon, Apr 6, 2020 at 4:04 AM Peter Zijlstra <[email protected]> wrote:
>
> On Sun, Apr 05, 2020 at 06:08:59PM -0700, Fangrui Song wrote:
> > On 2020-03-24, Peter Zijlstra wrote:
>
> > > +#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func) \
> > > + asm(".pushsection .text, \"ax\" \n" \
> > > + ".align 4 \n" \
> > > + ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
> > > + STATIC_CALL_TRAMP_STR(name) ": \n" \
> > > + " jmp.d32 " #func " \n" \
> > > + ".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
> > > + ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
> > > + ".popsection \n")
> > > +
> > > +#endif /* _ASM_STATIC_CALL_H */
> >
> > Hi Peter,
> >
> > Coming here from https://github.com/ClangBuiltLinux/linux/issues/974
> >
> > jmp.d32 is not recognized by clang integrated assembler.
> > The syntax appears to be very rarely used. According to Debian Code Search,
> > u-boot is the only project using this syntax.
>
> *groan*... I was going to use it in more places :-/
>
> > In March 2017, gas added the pseudo prefix {disp32}
> > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=86fa6981e7487e2c2df4337aa75ed2d93c32eaf2
> > which generalizes jmp.d32 ({disp32} jmp foo)
>
> That's all well and cute, but I can't use that because its too new. Not
> until we raise the minimum gcc/bintils version to something that
> includes that.

If it seems like it would be useful, let us know. If it doesn't have
some ridiculous baggage or inconsistency or unspecified behavior,
we're generally happy to do so. We have finite resources so knowing
where to focus our attention helps us sort the never ending TODO list.
It's not easy to predict which feature we'll need to drop everything
to implement next, so any help there would be much appreciated.

>
> > I wonder whether the instruction jmp.d32 can be replaced with the trick in
> > arch/x86/include/asm/jump_label.h for clang portability.
> >
> > % grep -A2 'jmp.d32' arch/x86/include/asm/jump_label.h
> > /* Equivalent to "jmp.d32 \target" */
> > .byte 0xe9
> > .long \target - .Lstatic_jump_after_\@
>
> Sure, but I was hoping to move away from that since all assemblers
> should now support jmp.d32. Except of course, you have to go ruin
> things.
>
> The thing is, jmp.d32 reads so much nicer than the above crap.
>
> Also, I still need a meta instruction like:
>
> nopjmp $label
>
> what works just like 'jmp' but instead emits either a nop2 or a nop5.
> I tried various hacks to get GAS to emit that, but no luck :/
>
> The only up-side from that new syntax is that I suppose we can go write:
>
> {disp8} push \vec
>
> without gas shitting itself and emitting a 5 byte push just because..
> Except of course we can't, for the same reason I can't go around and
> use:
>
> {disp32} jmp
>
> in the above code.
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20200406110438.GJ20730%40hirez.programming.kicks-ass.net.



--
Thanks,
~Nick Desaulniers