2019-06-05 13:26:09

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH 12/15] x86/static_call: Add out-of-line static call implementation

From: Josh Poimboeuf <[email protected]>

Add the x86 out-of-line static call implementation. For each key, a
permanent trampoline is created which is the destination for all static
calls for the given key. The trampoline has a direct jump which gets
patched by static_call_update() when the destination function changes.
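[Editor's illustration, not part of the patch: the dispatch scheme can be modeled in plain userspace C. The key and function names below are invented, and the function-pointer write stands in for the text patching the real static_call_update() performs.]

```c
#include <assert.h>

/* Userspace model of an out-of-line static call: every call site
 * for a key goes through one per-key trampoline, so retargeting
 * the key means rewriting a single location, not every caller. */
typedef int (*target_fn)(int);

struct static_call_key {
	target_fn func;	/* current destination of the trampoline */
};

static int add_one(int x) { return x + 1; }
static int add_two(int x) { return x + 2; }

static struct static_call_key my_key = { .func = add_one };

/* Stand-in for the per-key trampoline all call sites jump to. */
static int static_call_my_key(int x)
{
	return my_key.func(x);
}

/* Stand-in for static_call_update(): one write retargets all callers. */
static void my_static_call_update(struct static_call_key *key, target_fn fn)
{
	key->func = fn;
}
```

After my_static_call_update(&my_key, add_two), every caller of static_call_my_key() reaches add_two() without any per-site change, which is the property the trampoline buys.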

Cc: [email protected]
Cc: Steven Rostedt <[email protected]>
Cc: Julia Cartwright <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Laight <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/00b08f2194e80241decbf206624b6580b9b8855b.1543200841.git.jpoimboe@redhat.com
---
arch/x86/Kconfig | 1
arch/x86/include/asm/static_call.h | 28 +++++++++++++++++++++++++++
arch/x86/kernel/Makefile | 1
arch/x86/kernel/static_call.c | 38 +++++++++++++++++++++++++++++++++++++
4 files changed, 68 insertions(+)
create mode 100644 arch/x86/include/asm/static_call.h
create mode 100644 arch/x86/kernel/static_call.c

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -198,6 +198,7 @@ config X86
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_STACKPROTECTOR if CC_HAS_SANE_STACKPROTECTOR
select HAVE_STACK_VALIDATION if X86_64
+ select HAVE_STATIC_CALL
select HAVE_RSEQ
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UNSTABLE_SCHED_CLOCK
--- /dev/null
+++ b/arch/x86/include/asm/static_call.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_STATIC_CALL_H
+#define _ASM_STATIC_CALL_H
+
+/*
+ * Manually construct a 5-byte direct JMP to prevent the assembler from
+ * optimizing it into a 2-byte JMP.
+ */
+#define __ARCH_STATIC_CALL_JMP_LABEL(key) ".L" __stringify(key ## _after_jmp)
+#define __ARCH_STATIC_CALL_TRAMP_JMP(key, func) \
+ ".byte 0xe9 \n" \
+ ".long " #func " - " __ARCH_STATIC_CALL_JMP_LABEL(key) "\n" \
+ __ARCH_STATIC_CALL_JMP_LABEL(key) ":"
+
+/*
+ * This is a permanent trampoline which does a direct jump to the function.
+ * The direct jump gets patched by static_call_update().
+ */
+#define ARCH_DEFINE_STATIC_CALL_TRAMP(key, func) \
+ asm(".pushsection .text, \"ax\" \n" \
+ ".align 4 \n" \
+ ".globl " STATIC_CALL_TRAMP_STR(key) " \n" \
+ ".type " STATIC_CALL_TRAMP_STR(key) ", @function \n" \
+ STATIC_CALL_TRAMP_STR(key) ": \n" \
+ __ARCH_STATIC_CALL_TRAMP_JMP(key, func) " \n" \
+ ".popsection \n")
+
+#endif /* _ASM_STATIC_CALL_H */
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -63,6 +63,7 @@ obj-y += tsc.o tsc_msr.o io_delay.o rt
obj-y += pci-iommu_table.o
obj-y += resource.o
obj-y += irqflags.o
+obj-y += static_call.o

obj-y += process.o
obj-y += fpu/
--- /dev/null
+++ b/arch/x86/kernel/static_call.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/static_call.h>
+#include <linux/memory.h>
+#include <linux/bug.h>
+#include <asm/text-patching.h>
+#include <asm/nospec-branch.h>
+
+#define CALL_INSN_SIZE 5
+
+void arch_static_call_transform(void *site, void *tramp, void *func)
+{
+	unsigned char opcodes[CALL_INSN_SIZE];
+	unsigned char insn_opcode;
+	unsigned long insn;
+	s32 dest_relative;
+
+	mutex_lock(&text_mutex);
+
+	insn = (unsigned long)tramp;
+
+	insn_opcode = *(unsigned char *)insn;
+	if (insn_opcode != 0xE9) {
+		WARN_ONCE(1, "unexpected static call insn opcode 0x%x at %pS",
+			  insn_opcode, (void *)insn);
+		goto unlock;
+	}
+
+	dest_relative = (long)(func) - (long)(insn + CALL_INSN_SIZE);
+
+	opcodes[0] = insn_opcode;
+	memcpy(&opcodes[1], &dest_relative, CALL_INSN_SIZE - 1);
+
+	text_poke_bp((void *)insn, opcodes, CALL_INSN_SIZE, NULL);
+
+unlock:
+	mutex_unlock(&text_mutex);
+}
+EXPORT_SYMBOL_GPL(arch_static_call_transform);
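[Editor's sketch, not part of the patch: the displacement arithmetic above can be checked in userspace. The 5-byte direct JMP is opcode 0xE9 followed by a little-endian rel32 measured from the end of the instruction; the addresses below are invented for illustration.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CALL_INSN_SIZE 5

/* Build the 5 bytes of "jmp rel32": 0xE9 plus a signed 32-bit
 * displacement relative to the address *after* the instruction,
 * mirroring dest_relative = func - (insn + CALL_INSN_SIZE). */
static void encode_jmp32(uint8_t insn[CALL_INSN_SIZE],
			 uint64_t insn_addr, uint64_t dest)
{
	int32_t rel = (int32_t)(dest - (insn_addr + CALL_INSN_SIZE));

	insn[0] = 0xE9;				/* direct JMP opcode */
	memcpy(&insn[1], &rel, sizeof(rel));	/* x86 is little-endian */
}
```

This is the same computation arch_static_call_transform() does before handing the bytes to text_poke_bp(); a backward jump simply yields a negative rel32.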



2019-06-07 06:16:39

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 12/15] x86/static_call: Add out-of-line static call implementation

> On Jun 5, 2019, at 6:08 AM, Peter Zijlstra <[email protected]> wrote:
>
> [...]
> +void arch_static_call_transform(void *site, void *tramp, void *func)
> +{
> +	unsigned char opcodes[CALL_INSN_SIZE];
> +	unsigned char insn_opcode;
> +	unsigned long insn;
> +	s32 dest_relative;
> +
> +	mutex_lock(&text_mutex);
> +
> +	insn = (unsigned long)tramp;
> +
> +	insn_opcode = *(unsigned char *)insn;
> +	if (insn_opcode != 0xE9) {
> +		WARN_ONCE(1, "unexpected static call insn opcode 0x%x at %pS",
> +			  insn_opcode, (void *)insn);
> +		goto unlock;

This might happen if a kprobe is installed on the call, no?

I don’t know if you want to handle this case more gently (or perhaps
modify can_probe() to prevent such a case).
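[Editor's note: a minimal model of the failure mode described here. An armed kprobe replaces the first byte of the probed instruction with INT3 (0xCC), so the 0xE9 check in arch_static_call_transform() would trip. The buffer below stands in for the trampoline text.]

```c
#include <assert.h>
#include <stdint.h>

#define JMP32_OPCODE	0xE9
#define INT3_OPCODE	0xCC

/* The opcode check from arch_static_call_transform(), reduced to a
 * predicate over the first byte of the trampoline. */
static int tramp_looks_patchable(const uint8_t *tramp)
{
	return tramp[0] == JMP32_OPCODE;
}
```

Once a kprobe arms the trampoline, the predicate fails and the patching path would hit the WARN_ONCE() instead of updating the destination.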

2019-06-07 07:53:48

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 12/15] x86/static_call: Add out-of-line static call implementation

On Fri, 7 Jun 2019 06:13:58 +0000
Nadav Amit <[email protected]> wrote:

> > On Jun 5, 2019, at 6:08 AM, Peter Zijlstra <[email protected]> wrote:
> >
> > [...]
> > +void arch_static_call_transform(void *site, void *tramp, void *func)
> > +{
> > +	unsigned char opcodes[CALL_INSN_SIZE];
> > +	unsigned char insn_opcode;
> > +	unsigned long insn;
> > +	s32 dest_relative;
> > +
> > +	mutex_lock(&text_mutex);
> > +
> > +	insn = (unsigned long)tramp;
> > +
> > +	insn_opcode = *(unsigned char *)insn;
> > +	if (insn_opcode != 0xE9) {
> > +		WARN_ONCE(1, "unexpected static call insn opcode 0x%x at %pS",
> > +			  insn_opcode, (void *)insn);
> > +		goto unlock;
>
> This might happen if a kprobe is installed on the call, no?
>
> I don’t know if you want to handle this case more gently (or perhaps
> modify can_probe() to prevent such a case).
>

Perhaps it is better to block kprobes from attaching to a static call.
Or have it use the static call directly as it does with ftrace. But
that would probably be much more work.

-- Steve

2019-06-07 08:41:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 12/15] x86/static_call: Add out-of-line static call implementation

On Fri, Jun 07, 2019 at 06:13:58AM +0000, Nadav Amit wrote:
> > On Jun 5, 2019, at 6:08 AM, Peter Zijlstra <[email protected]> wrote:

> > +void arch_static_call_transform(void *site, void *tramp, void *func)
> > +{
> > +	unsigned char opcodes[CALL_INSN_SIZE];
> > +	unsigned char insn_opcode;
> > +	unsigned long insn;
> > +	s32 dest_relative;
> > +
> > +	mutex_lock(&text_mutex);
> > +
> > +	insn = (unsigned long)tramp;
> > +
> > +	insn_opcode = *(unsigned char *)insn;
> > +	if (insn_opcode != 0xE9) {
> > +		WARN_ONCE(1, "unexpected static call insn opcode 0x%x at %pS",
> > +			  insn_opcode, (void *)insn);
> > +		goto unlock;
>
> This might happen if a kprobe is installed on the call, no?
>
> I don’t know if you want to handle this case more gently (or perhaps
> modify can_probe() to prevent such a case).
>

yuck.. yes, that's something that needs consideration.

2019-06-07 08:54:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 12/15] x86/static_call: Add out-of-line static call implementation

On Fri, Jun 07, 2019 at 10:38:46AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 07, 2019 at 06:13:58AM +0000, Nadav Amit wrote:
> > > On Jun 5, 2019, at 6:08 AM, Peter Zijlstra <[email protected]> wrote:
>
> > > +void arch_static_call_transform(void *site, void *tramp, void *func)
> > > +{
> > > +	unsigned char opcodes[CALL_INSN_SIZE];
> > > +	unsigned char insn_opcode;
> > > +	unsigned long insn;
> > > +	s32 dest_relative;
> > > +
> > > +	mutex_lock(&text_mutex);
> > > +
> > > +	insn = (unsigned long)tramp;
> > > +
> > > +	insn_opcode = *(unsigned char *)insn;
> > > +	if (insn_opcode != 0xE9) {
> > > +		WARN_ONCE(1, "unexpected static call insn opcode 0x%x at %pS",
> > > +			  insn_opcode, (void *)insn);
> > > +		goto unlock;
> >
> > This might happen if a kprobe is installed on the call, no?
> >
> > I don’t know if you want to handle this case more gently (or perhaps
> > modify can_probe() to prevent such a case).
> >
>
> yuck.. yes, that's something that needs consideration.

For jump_label this is avoided by jump_label_text_reserved(); I'm
thinking static_call should do the same.
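[Editor's sketch of the kind of overlap check being proposed: jump_label_text_reserved() reports whether an address range overlaps any jump_label site so kprobes can refuse to arm there. A static_call analogue over trampoline regions might look like the following; the region table and function names are invented.]

```c
#include <assert.h>
#include <stdint.h>

struct tramp_region {
	uintptr_t start;
	uintptr_t end;		/* exclusive */
};

/* Half-open interval overlap: [a_start, a_end) vs [b_start, b_end). */
static int regions_overlap(uintptr_t a_start, uintptr_t a_end,
			   uintptr_t b_start, uintptr_t b_end)
{
	return a_start < b_end && b_start < a_end;
}

/* Would-be static_call_text_reserved(): non-zero if [start, end)
 * touches any registered trampoline, so a probe there is refused. */
static int static_call_text_reserved(const struct tramp_region *tbl, int n,
				     uintptr_t start, uintptr_t end)
{
	for (int i = 0; i < n; i++) {
		if (regions_overlap(start, end, tbl[i].start, tbl[i].end))
			return 1;
	}
	return 0;
}
```

A real implementation would walk the kernel's (and each module's) static_call sections rather than a flat table, but the reservation predicate is the same.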