2023-11-05 21:38:13

by Uros Bizjak

Subject: [PATCH -tip v2 0/3] x86/callthunks: Fix and unify call thunk assembly snippets

Currently INCREMENT_CALL_DEPTH and thunk debug macros explicitly
define %gs: segment register prefix for their percpu variables.
This is not compatible with !CONFIG_SMP, which requires non-prefixed
percpu variables.

Contrary to alternatives, relocations are currently not supported in
call thunk templates. Because of this, two variants of the
INCREMENT_CALL_DEPTH macro are needed: an ASM_ prefixed one that allows
relocations and a non-prefixed one that allows only absolute
addresses.

The following patch series fixes the above issues by:

a) Making PER_CPU_VAR macro from percpu.h also available without
__ASSEMBLY__, so it can be used in 'asm' statements to conditionally
use %gs: segment register prefix, depending on CONFIG_SMP.

b) Re-using existing infrastructure from alternative.c to allow
%rip-relative relocations when copying call thunk template from its
storage location.

c) Fixing call thunks debug macros to use PER_CPU_VAR macro from
percpu.h to conditionally use %gs: segment register prefix, depending
on CONFIG_SMP.

d) Unifying ASM_ prefixed assembly macros with their non-prefixed
variants. With support of %rip-relative relocations in place, call
thunk templates allow %rip-relative addressing, so unified assembly
snippet can be used everywhere.

v2:
- Make PER_CPU_VAR macro available in 'asm' statements instead of
moving 'asm' statements to *.S assembly files.
- Re-use existing relocation infrastructure from alternative.c

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>

Uros Bizjak (3):
x86/percpu: Define PER_CPU_VAR macro also for !__ASSEMBLY__
x86/callthunks: Handle %rip-relative relocations in call thunk
template
x86/callthunks: Fix and unify call thunks assembly snippets

arch/x86/include/asm/nospec-branch.h | 23 ++++++--------------
arch/x86/include/asm/percpu.h | 5 +++++
arch/x86/include/asm/text-patching.h | 2 ++
arch/x86/kernel/alternative.c | 3 +--
arch/x86/kernel/callthunks.c | 32 ++++++++++++++++++++++------
5 files changed, 40 insertions(+), 25 deletions(-)

--
2.41.0


2023-11-05 21:38:18

by Uros Bizjak

Subject: [PATCH -tip v2 1/3] x86/percpu: Define PER_CPU_VAR macro also for !__ASSEMBLY__

Some C source files define 'asm' statements that use PER_CPU_VAR,
so make PER_CPU_VAR macro available also without __ASSEMBLY__.
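
As an illustration of what the macro produces once it is stringified into an 'asm' statement, here is a minimal user-space sketch. All names prefixed with demo_ or DEMO_ are hypothetical stand-ins, and the values chosen for the segment register (gs) and the addressing suffix ((%rip)) are assumptions matching a 64-bit build; the real definitions live in percpu.h and may differ.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification, the same idea as the kernel's __stringify(). */
#define DEMO_STRINGIFY_1(...) #__VA_ARGS__
#define DEMO_STRINGIFY(...)   DEMO_STRINGIFY_1(__VA_ARGS__)

/* Assumed 64-bit values; stand-ins for __percpu_seg and __percpu_rel. */
#define demo_percpu_seg gs
#define demo_percpu_rel (%rip)

/* SMP flavour: segment prefix plus %rip-relative addressing. */
#define DEMO_PER_CPU_VAR_SMP(var) %demo_percpu_seg:(var)demo_percpu_rel
/* !SMP flavour: same operand without the segment prefix. */
#define DEMO_PER_CPU_VAR_UP(var)  (var)demo_percpu_rel

/* Instruction text as it would be pasted into an asm template string. */
static const char demo_smp_insn[] =
	DEMO_STRINGIFY(incq DEMO_PER_CPU_VAR_SMP(demo_counter));
static const char demo_up_insn[] =
	DEMO_STRINGIFY(incq DEMO_PER_CPU_VAR_UP(demo_counter));

/* Returns nonzero when the SMP string carries the gs segment register. */
int demo_smp_has_segment_prefix(void)
{
	return strstr(demo_smp_insn, "gs") != NULL;
}

/* Returns nonzero when the !SMP string carries the gs segment register. */
int demo_up_has_segment_prefix(void)
{
	return strstr(demo_up_insn, "gs") != NULL;
}
```

The point of the sketch is only that one macro name yields a prefixed operand in one configuration and an unprefixed operand in the other, so callers do not have to hard-code %gs:.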

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
v2: Make PER_CPU_VAR macro available in 'asm' statements
instead of moving 'asm' statements to *.S assembly files.
---
arch/x86/include/asm/percpu.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index b86b27d15e52..0f12b2004b94 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -84,10 +84,15 @@
})
#endif /* CONFIG_USE_X86_SEG_SUPPORT */

+#define PER_CPU_VAR(var) %__percpu_seg:(var)__percpu_rel
+
#else /* CONFIG_SMP */
#define __percpu_seg_override
#define __percpu_prefix ""
#define __force_percpu_prefix ""
+
+#define PER_CPU_VAR(var) (var)__percpu_rel
+
#endif /* CONFIG_SMP */

#define __my_cpu_type(var) typeof(var) __percpu_seg_override
--
2.41.0

2023-11-05 21:38:30

by Uros Bizjak

Subject: [PATCH -tip v2 2/3] x86/callthunks: Handle %rip-relative relocations in call thunk template

Contrary to alternatives, relocations are currently not supported in
call thunk templates. Re-use the existing infrastructure from
alternative.c to allow %rip-relative relocations when copying call
thunk template from its storage location.

The patch allows unification of ASM_INCREMENT_CALL_DEPTH, which already
uses PER_CPU_VAR macro, with INCREMENT_CALL_DEPTH, used in call thunk
template, which is currently limited to use absolute address.
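
To show why a template with a %rip-relative operand cannot simply be copied byte for byte, the following user-space sketch works through the displacement arithmetic. It is not the kernel's apply_relocation(), which decodes instructions and handles more cases; the demo_ helpers, the fixed 7-byte instruction layout and the addresses are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: adjust one 32-bit %rip-relative displacement stored
 * at buf[disp_off] so that an instruction assembled to run at src_addr
 * still reaches the same absolute target when its bytes run at dest_addr.
 */
static void demo_fix_rip_disp(uint8_t *buf, size_t disp_off,
			      uint64_t src_addr, uint64_t dest_addr)
{
	int32_t disp;

	memcpy(&disp, buf + disp_off, sizeof(disp));
	/* target = next_ip + disp, and next_ip moves by (dest - src). */
	disp = (int32_t)(disp + (int64_t)(src_addr - dest_addr));
	memcpy(buf + disp_off, &disp, sizeof(disp));
}

/* Absolute address that a %rip-relative operand refers to. */
static uint64_t demo_rip_target(const uint8_t *buf, size_t disp_off,
				size_t insn_len, uint64_t insn_addr)
{
	int32_t disp;

	memcpy(&disp, buf + disp_off, sizeof(disp));
	return insn_addr + insn_len + (int64_t)disp;
}

/*
 * Build a 7-byte instruction (3 illustrative opcode bytes followed by a
 * disp32), relocate it from src_addr to dest_addr, and return the target
 * it refers to at the new location.
 */
uint64_t demo_target_after_copy(uint64_t src_addr, uint64_t dest_addr,
				int32_t disp)
{
	uint8_t insn[7] = { 0x48, 0xff, 0x05 };

	memcpy(insn + 3, &disp, sizeof(disp));
	demo_fix_rip_disp(insn, 3, src_addr, dest_addr);
	return demo_rip_target(insn, 3, sizeof(insn), dest_addr);
}
```

With a template at 0x1000 and a displacement of 0x100, the operand refers to 0x1107; after the fix-up the copy at 0x5000 refers to the same address, which is what lets the relocated buffer be compared against, or copied into, the padding area.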

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
v2: Reuse existing relocation infrastructure from alternative.c.
---
arch/x86/include/asm/text-patching.h | 2 ++
arch/x86/kernel/alternative.c | 3 +--
arch/x86/kernel/callthunks.c | 32 ++++++++++++++++++++++------
3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 29832c338cdc..ba8d900f3ebe 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -18,6 +18,8 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
#define __parainstructions_end NULL
#endif

+void apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len);
+
/*
* Currently, the max observed size in the kernel code is
* JUMP_LABEL_NOP_SIZE/RELATIVEJUMP_SIZE, which are 5.
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 73be3931e4f0..66140c54d4f6 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -325,8 +325,7 @@ bool need_reloc(unsigned long offset, u8 *src, size_t src_len)
return (target < src || target > src + src_len);
}

-static void __init_or_module noinline
-apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len)
+void apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len)
{
int prev, target = 0;

diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
index e9ad518a5003..ef9c04707b3c 100644
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -24,6 +24,8 @@

static int __initdata_or_module debug_callthunks;

+#define MAX_PATCH_LEN (255-1)
+
#define prdbg(fmt, args...) \
do { \
if (debug_callthunks) \
@@ -179,10 +181,15 @@ static const u8 nops[] = {
static void *patch_dest(void *dest, bool direct)
{
unsigned int tsize = SKL_TMPL_SIZE;
+ u8 insn_buff[MAX_PATCH_LEN];
u8 *pad = dest - tsize;

+ memcpy(insn_buff, skl_call_thunk_template, tsize);
+ apply_relocation(insn_buff, tsize, pad,
+ skl_call_thunk_template, tsize);
+
/* Already patched? */
- if (!bcmp(pad, skl_call_thunk_template, tsize))
+ if (!bcmp(pad, insn_buff, tsize))
return pad;

/* Ensure there are nops */
@@ -192,9 +199,9 @@ static void *patch_dest(void *dest, bool direct)
}

if (direct)
- memcpy(pad, skl_call_thunk_template, tsize);
+ memcpy(pad, insn_buff, tsize);
else
- text_poke_copy_locked(pad, skl_call_thunk_template, tsize, true);
+ text_poke_copy_locked(pad, insn_buff, tsize, true);
return pad;
}

@@ -291,20 +298,27 @@ void *callthunks_translate_call_dest(void *dest)
static bool is_callthunk(void *addr)
{
unsigned int tmpl_size = SKL_TMPL_SIZE;
- void *tmpl = skl_call_thunk_template;
+ u8 insn_buff[MAX_PATCH_LEN];
unsigned long dest;
+ u8 *pad;

dest = roundup((unsigned long)addr, CONFIG_FUNCTION_ALIGNMENT);
if (!thunks_initialized || skip_addr((void *)dest))
return false;

- return !bcmp((void *)(dest - tmpl_size), tmpl, tmpl_size);
+ *pad = dest - tmpl_size;
+
+ memcpy(insn_buff, skl_call_thunk_template, tmpl_size);
+ apply_relocation(insn_buff, tmpl_size, pad,
+ skl_call_thunk_template, tmpl_size);
+
+ return !bcmp(pad, insn_buff, tmpl_size);
}

int x86_call_depth_emit_accounting(u8 **pprog, void *func)
{
unsigned int tmpl_size = SKL_TMPL_SIZE;
- void *tmpl = skl_call_thunk_template;
+ u8 insn_buff[MAX_PATCH_LEN];

if (!thunks_initialized)
return 0;
@@ -313,7 +327,11 @@ int x86_call_depth_emit_accounting(u8 **pprog, void *func)
if (func && is_callthunk(func))
return 0;

- memcpy(*pprog, tmpl, tmpl_size);
+ memcpy(insn_buff, skl_call_thunk_template, tmpl_size);
+ apply_relocation(insn_buff, tmpl_size, *pprog,
+ skl_call_thunk_template, tmpl_size);
+
+ memcpy(*pprog, insn_buff, tmpl_size);
*pprog += tmpl_size;
return tmpl_size;
}
--
2.41.0

2023-11-05 21:38:31

by Uros Bizjak

Subject: [PATCH -tip v2 3/3] x86/callthunks: Fix and unify call thunks assembly snippets

Currently thunk debug macros explicitly define %gs: segment register
prefix for their percpu variables. This is not compatible with
!CONFIG_SMP, which requires non-prefixed percpu variables.

Fix call thunks debug macros to use PER_CPU_VAR macro from percpu.h
to conditionally use %gs: segment register prefix, depending on
CONFIG_SMP.

Finally, unify ASM_ prefixed assembly macros with their non-prefixed
variants. With support of %rip-relative relocations in place, call
thunk templates allow %rip-relative addressing, so unified assembly
snippet can be used everywhere.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Uros Bizjak <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 23 +++++++----------------
1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index f93e9b96927a..6f677be6bdb9 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -59,13 +59,13 @@

#ifdef CONFIG_CALL_THUNKS_DEBUG
# define CALL_THUNKS_DEBUG_INC_CALLS \
- incq %gs:__x86_call_count;
+ incq PER_CPU_VAR(__x86_call_count);
# define CALL_THUNKS_DEBUG_INC_RETS \
- incq %gs:__x86_ret_count;
+ incq PER_CPU_VAR(__x86_ret_count);
# define CALL_THUNKS_DEBUG_INC_STUFFS \
- incq %gs:__x86_stuffs_count;
+ incq PER_CPU_VAR(__x86_stuffs_count);
# define CALL_THUNKS_DEBUG_INC_CTXSW \
- incq %gs:__x86_ctxsw_count;
+ incq PER_CPU_VAR(__x86_ctxsw_count);
#else
# define CALL_THUNKS_DEBUG_INC_CALLS
# define CALL_THUNKS_DEBUG_INC_RETS
@@ -80,9 +80,6 @@
#define CREDIT_CALL_DEPTH \
movq $-1, PER_CPU_VAR(pcpu_hot + X86_call_depth);

-#define ASM_CREDIT_CALL_DEPTH \
- movq $-1, PER_CPU_VAR(pcpu_hot + X86_call_depth);
-
#define RESET_CALL_DEPTH \
xor %eax, %eax; \
bts $63, %rax; \
@@ -95,20 +92,14 @@
CALL_THUNKS_DEBUG_INC_CALLS

#define INCREMENT_CALL_DEPTH \
- sarq $5, %gs:pcpu_hot + X86_call_depth; \
- CALL_THUNKS_DEBUG_INC_CALLS
-
-#define ASM_INCREMENT_CALL_DEPTH \
sarq $5, PER_CPU_VAR(pcpu_hot + X86_call_depth); \
CALL_THUNKS_DEBUG_INC_CALLS

#else
#define CREDIT_CALL_DEPTH
-#define ASM_CREDIT_CALL_DEPTH
#define RESET_CALL_DEPTH
-#define INCREMENT_CALL_DEPTH
-#define ASM_INCREMENT_CALL_DEPTH
#define RESET_CALL_DEPTH_FROM_CALL
+#define INCREMENT_CALL_DEPTH
#endif

/*
@@ -158,7 +149,7 @@
jnz 771b; \
/* barrier for jnz misprediction */ \
lfence; \
- ASM_CREDIT_CALL_DEPTH \
+ CREDIT_CALL_DEPTH \
CALL_THUNKS_DEBUG_INC_CTXSW
#else
/*
@@ -311,7 +302,7 @@
.macro CALL_DEPTH_ACCOUNT
#ifdef CONFIG_CALL_DEPTH_TRACKING
ALTERNATIVE "", \
- __stringify(ASM_INCREMENT_CALL_DEPTH), X86_FEATURE_CALL_DEPTH
+ __stringify(INCREMENT_CALL_DEPTH), X86_FEATURE_CALL_DEPTH
#endif
.endm

--
2.41.0

Subject: [tip: x86/percpu] x86/callthunks: Mark apply_relocation() as __init_or_module

The following commit has been merged into the x86/percpu branch of tip:

Commit-ID: 6724ba89e0b03667d56616614f55e1f772d38fdb
Gitweb: https://git.kernel.org/tip/6724ba89e0b03667d56616614f55e1f772d38fdb
Author: Ingo Molnar <[email protected]>
AuthorDate: Thu, 30 Nov 2023 20:15:51 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 30 Nov 2023 20:15:51 +01:00

x86/callthunks: Mark apply_relocation() as __init_or_module

Do it like the rest of the methods using it.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Uros Bizjak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/text-patching.h | 2 +-
arch/x86/kernel/alternative.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index ba8d900..fb338f0 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -18,7 +18,7 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
#define __parainstructions_end NULL
#endif

-void apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len);
+extern void apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len);

/*
* Currently, the max observed size in the kernel code is
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index aa86415..5052371 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -325,7 +325,7 @@ bool need_reloc(unsigned long offset, u8 *src, size_t src_len)
return (target < src || target > src + src_len);
}

-void apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len)
+void __init_or_module apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len)
{
int prev, target = 0;

Subject: [tip: x86/percpu] x86/callthunks: Fix and unify call thunks assembly snippets

The following commit has been merged into the x86/percpu branch of tip:

Commit-ID: 2adeed985a42ff3334e6898c8c0827e7626c1336
Gitweb: https://git.kernel.org/tip/2adeed985a42ff3334e6898c8c0827e7626c1336
Author: Uros Bizjak <[email protected]>
AuthorDate: Sun, 05 Nov 2023 22:34:37 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 30 Nov 2023 20:06:17 +01:00

x86/callthunks: Fix and unify call thunks assembly snippets

Currently thunk debug macros explicitly define %gs: segment register
prefix for their percpu variables. This is not compatible with
!CONFIG_SMP, which requires non-prefixed percpu variables.

Fix call thunks debug macros to use PER_CPU_VAR macro from percpu.h
to conditionally use %gs: segment register prefix, depending on
CONFIG_SMP.

Finally, unify ASM_ prefixed assembly macros with their non-prefixed
variants. With support of %rip-relative relocations in place, call
thunk templates allow %rip-relative addressing, so unified assembly
snippet can be used everywhere.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/nospec-branch.h | 23 +++++++----------------
1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index c55cc24..65fbf6b 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -59,13 +59,13 @@

#ifdef CONFIG_CALL_THUNKS_DEBUG
# define CALL_THUNKS_DEBUG_INC_CALLS \
- incq %gs:__x86_call_count;
+ incq PER_CPU_VAR(__x86_call_count);
# define CALL_THUNKS_DEBUG_INC_RETS \
- incq %gs:__x86_ret_count;
+ incq PER_CPU_VAR(__x86_ret_count);
# define CALL_THUNKS_DEBUG_INC_STUFFS \
- incq %gs:__x86_stuffs_count;
+ incq PER_CPU_VAR(__x86_stuffs_count);
# define CALL_THUNKS_DEBUG_INC_CTXSW \
- incq %gs:__x86_ctxsw_count;
+ incq PER_CPU_VAR(__x86_ctxsw_count);
#else
# define CALL_THUNKS_DEBUG_INC_CALLS
# define CALL_THUNKS_DEBUG_INC_RETS
@@ -80,9 +80,6 @@
#define CREDIT_CALL_DEPTH \
movq $-1, PER_CPU_VAR(pcpu_hot + X86_call_depth);

-#define ASM_CREDIT_CALL_DEPTH \
- movq $-1, PER_CPU_VAR(pcpu_hot + X86_call_depth);
-
#define RESET_CALL_DEPTH \
xor %eax, %eax; \
bts $63, %rax; \
@@ -95,20 +92,14 @@
CALL_THUNKS_DEBUG_INC_CALLS

#define INCREMENT_CALL_DEPTH \
- sarq $5, %gs:pcpu_hot + X86_call_depth; \
- CALL_THUNKS_DEBUG_INC_CALLS
-
-#define ASM_INCREMENT_CALL_DEPTH \
sarq $5, PER_CPU_VAR(pcpu_hot + X86_call_depth); \
CALL_THUNKS_DEBUG_INC_CALLS

#else
#define CREDIT_CALL_DEPTH
-#define ASM_CREDIT_CALL_DEPTH
#define RESET_CALL_DEPTH
-#define INCREMENT_CALL_DEPTH
-#define ASM_INCREMENT_CALL_DEPTH
#define RESET_CALL_DEPTH_FROM_CALL
+#define INCREMENT_CALL_DEPTH
#endif

/*
@@ -158,7 +149,7 @@
jnz 771b; \
/* barrier for jnz misprediction */ \
lfence; \
- ASM_CREDIT_CALL_DEPTH \
+ CREDIT_CALL_DEPTH \
CALL_THUNKS_DEBUG_INC_CTXSW
#else
/*
@@ -325,7 +316,7 @@
.macro CALL_DEPTH_ACCOUNT
#ifdef CONFIG_CALL_DEPTH_TRACKING
ALTERNATIVE "", \
- __stringify(ASM_INCREMENT_CALL_DEPTH), X86_FEATURE_CALL_DEPTH
+ __stringify(INCREMENT_CALL_DEPTH), X86_FEATURE_CALL_DEPTH
#endif
.endm

Subject: [tip: x86/percpu] x86/percpu: Define PER_CPU_VAR macro also for !__ASSEMBLY__

The following commit has been merged into the x86/percpu branch of tip:

Commit-ID: 43bda69ed9e3b86d0ba5ff9256e437d50074d7d5
Gitweb: https://git.kernel.org/tip/43bda69ed9e3b86d0ba5ff9256e437d50074d7d5
Author: Uros Bizjak <[email protected]>
AuthorDate: Sun, 05 Nov 2023 22:34:35 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 30 Nov 2023 20:06:16 +01:00

x86/percpu: Define PER_CPU_VAR macro also for !__ASSEMBLY__

Some C source files define 'asm' statements that use PER_CPU_VAR,
so make PER_CPU_VAR macro available also without __ASSEMBLY__.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/percpu.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index b86b27d..0f12b20 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -84,10 +84,15 @@
})
#endif /* CONFIG_USE_X86_SEG_SUPPORT */

+#define PER_CPU_VAR(var) %__percpu_seg:(var)__percpu_rel
+
#else /* CONFIG_SMP */
#define __percpu_seg_override
#define __percpu_prefix ""
#define __force_percpu_prefix ""
+
+#define PER_CPU_VAR(var) (var)__percpu_rel
+
#endif /* CONFIG_SMP */

#define __my_cpu_type(var) typeof(var) __percpu_seg_override

Subject: [tip: x86/percpu] x86/callthunks: Handle %rip-relative relocations in call thunk template

The following commit has been merged into the x86/percpu branch of tip:

Commit-ID: 17bce3b2ae2d52e8c5c12274ce4c3a631ce9e66b
Gitweb: https://git.kernel.org/tip/17bce3b2ae2d52e8c5c12274ce4c3a631ce9e66b
Author: Uros Bizjak <[email protected]>
AuthorDate: Sun, 05 Nov 2023 22:34:36 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 30 Nov 2023 20:06:17 +01:00

x86/callthunks: Handle %rip-relative relocations in call thunk template

Contrary to alternatives, relocations are currently not supported in
call thunk templates. Re-use the existing infrastructure from
alternative.c to allow %rip-relative relocations when copying call
thunk template from its storage location.

The patch allows unification of ASM_INCREMENT_CALL_DEPTH, which already
uses PER_CPU_VAR macro, with INCREMENT_CALL_DEPTH, used in call thunk
template, which is currently limited to use absolute address.

Reuse existing relocation infrastructure from alternative.c,
as suggested by Peter Zijlstra.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/text-patching.h | 2 ++-
arch/x86/kernel/alternative.c | 3 +--
arch/x86/kernel/callthunks.c | 32 +++++++++++++++++++++------
3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 29832c3..ba8d900 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -18,6 +18,8 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
#define __parainstructions_end NULL
#endif

+void apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len);
+
/*
* Currently, the max observed size in the kernel code is
* JUMP_LABEL_NOP_SIZE/RELATIVEJUMP_SIZE, which are 5.
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index a5ead6a..aa86415 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -325,8 +325,7 @@ bool need_reloc(unsigned long offset, u8 *src, size_t src_len)
return (target < src || target > src + src_len);
}

-static void __init_or_module noinline
-apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len)
+void apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len)
{
int prev, target = 0;

diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
index c06bfc0..f56fa30 100644
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -24,6 +24,8 @@

static int __initdata_or_module debug_callthunks;

+#define MAX_PATCH_LEN (255-1)
+
#define prdbg(fmt, args...) \
do { \
if (debug_callthunks) \
@@ -184,10 +186,15 @@ static const u8 nops[] = {
static void *patch_dest(void *dest, bool direct)
{
unsigned int tsize = SKL_TMPL_SIZE;
+ u8 insn_buff[MAX_PATCH_LEN];
u8 *pad = dest - tsize;

+ memcpy(insn_buff, skl_call_thunk_template, tsize);
+ apply_relocation(insn_buff, tsize, pad,
+ skl_call_thunk_template, tsize);
+
/* Already patched? */
- if (!bcmp(pad, skl_call_thunk_template, tsize))
+ if (!bcmp(pad, insn_buff, tsize))
return pad;

/* Ensure there are nops */
@@ -197,9 +204,9 @@ static void *patch_dest(void *dest, bool direct)
}

if (direct)
- memcpy(pad, skl_call_thunk_template, tsize);
+ memcpy(pad, insn_buff, tsize);
else
- text_poke_copy_locked(pad, skl_call_thunk_template, tsize, true);
+ text_poke_copy_locked(pad, insn_buff, tsize, true);
return pad;
}

@@ -297,20 +304,27 @@ void *callthunks_translate_call_dest(void *dest)
static bool is_callthunk(void *addr)
{
unsigned int tmpl_size = SKL_TMPL_SIZE;
- void *tmpl = skl_call_thunk_template;
+ u8 insn_buff[MAX_PATCH_LEN];
unsigned long dest;
+ u8 *pad;

dest = roundup((unsigned long)addr, CONFIG_FUNCTION_ALIGNMENT);
if (!thunks_initialized || skip_addr((void *)dest))
return false;

- return !bcmp((void *)(dest - tmpl_size), tmpl, tmpl_size);
+ *pad = dest - tmpl_size;
+
+ memcpy(insn_buff, skl_call_thunk_template, tmpl_size);
+ apply_relocation(insn_buff, tmpl_size, pad,
+ skl_call_thunk_template, tmpl_size);
+
+ return !bcmp(pad, insn_buff, tmpl_size);
}

int x86_call_depth_emit_accounting(u8 **pprog, void *func)
{
unsigned int tmpl_size = SKL_TMPL_SIZE;
- void *tmpl = skl_call_thunk_template;
+ u8 insn_buff[MAX_PATCH_LEN];

if (!thunks_initialized)
return 0;
@@ -319,7 +333,11 @@ int x86_call_depth_emit_accounting(u8 **pprog, void *func)
if (func && is_callthunk(func))
return 0;

- memcpy(*pprog, tmpl, tmpl_size);
+ memcpy(insn_buff, skl_call_thunk_template, tmpl_size);
+ apply_relocation(insn_buff, tmpl_size, *pprog,
+ skl_call_thunk_template, tmpl_size);
+
+ memcpy(*pprog, insn_buff, tmpl_size);
*pprog += tmpl_size;
return tmpl_size;
}

2023-12-01 03:55:24

by Nathan Chancellor

Subject: Re: [PATCH -tip v2 2/3] x86/callthunks: Handle %rip-relative relocations in call thunk template

Hi Uros,

On Sun, Nov 05, 2023 at 10:34:36PM +0100, Uros Bizjak wrote:
> Contrary to alternatives, relocations are currently not supported in
> call thunk templates. Re-use the existing infrastructure from
> alternative.c to allow %rip-relative relocations when copying call
> thunk template from its storage location.
>
> The patch allows unification of ASM_INCREMENT_CALL_DEPTH, which already
> uses PER_CPU_VAR macro, with INCREMENT_CALL_DEPTH, used in call thunk
> template, which is currently limited to use absolute address.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: Uros Bizjak <[email protected]>
...
> diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
> index e9ad518a5003..ef9c04707b3c 100644
> --- a/arch/x86/kernel/callthunks.c
> +++ b/arch/x86/kernel/callthunks.c
...
> @@ -291,20 +298,27 @@ void *callthunks_translate_call_dest(void *dest)
> static bool is_callthunk(void *addr)
> {
> unsigned int tmpl_size = SKL_TMPL_SIZE;
> - void *tmpl = skl_call_thunk_template;
> + u8 insn_buff[MAX_PATCH_LEN];
> unsigned long dest;
> + u8 *pad;
>
> dest = roundup((unsigned long)addr, CONFIG_FUNCTION_ALIGNMENT);
> if (!thunks_initialized || skip_addr((void *)dest))
> return false;
>
> - return !bcmp((void *)(dest - tmpl_size), tmpl, tmpl_size);
> + *pad = dest - tmpl_size;

Clang warns (or errors with CONFIG_WERROR=y):

arch/x86/kernel/callthunks.c:315:3: error: variable 'pad' is uninitialized when used here [-Werror,-Wuninitialized]
315 | *pad = dest - tmpl_size;
| ^~~
arch/x86/kernel/callthunks.c:309:9: note: initialize the variable 'pad' to silence this warning
309 | u8 *pad;
| ^
| = NULL
1 error generated.

which came from our continuous integration:

https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/7054081453/job/19205345548
https://storage.tuxsuite.com/public/clangbuiltlinux/continuous-integration2/builds/2Yv1FATZZIeD3P7S57ZkHYhyZ8A/build.log
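
For readers following along, the warning comes down to the difference between assigning to a pointer and storing through it. The sketch below is a hypothetical user-space helper, not the fix that was later posted; it uses uintptr_t where the kernel code uses unsigned long, and the name demo_pad_start is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/*
 * Hypothetical helper: compute the start of the padding area that precedes
 * an aligned function entry at address dest.
 */
u8 *demo_pad_start(uintptr_t dest, unsigned int tmpl_size)
{
	u8 *pad;

	/*
	 * The flagged form, "*pad = dest - tmpl_size;", stores a value
	 * through the pointer while pad itself is still uninitialized.
	 * Assigning to the pointer, with an explicit integer-to-pointer
	 * cast, is the form sketched here.
	 */
	pad = (u8 *)(dest - tmpl_size);
	return pad;
}
```

Given a function entry 32 bytes into a buffer and a 16-byte template, the helper returns the address 16 bytes into that buffer.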

> +
> + memcpy(insn_buff, skl_call_thunk_template, tmpl_size);
> + apply_relocation(insn_buff, tmpl_size, pad,
> + skl_call_thunk_template, tmpl_size);
> +
> + return !bcmp(pad, insn_buff, tmpl_size);
> }

Cheers,
Nathan

2023-12-01 07:48:37

by Uros Bizjak

Subject: Re: [PATCH -tip v2 2/3] x86/callthunks: Handle %rip-relative relocations in call thunk template

On Fri, Dec 1, 2023 at 4:55 AM Nathan Chancellor <[email protected]> wrote:
>
> Hi Uros,
>
> On Sun, Nov 05, 2023 at 10:34:36PM +0100, Uros Bizjak wrote:
> > Contrary to alternatives, relocations are currently not supported in
> > call thunk templates. Re-use the existing infrastructure from
> > alternative.c to allow %rip-relative relocations when copying call
> > thunk template from its storage location.
> >
> > The patch allows unification of ASM_INCREMENT_CALL_DEPTH, which already
> > uses PER_CPU_VAR macro, with INCREMENT_CALL_DEPTH, used in call thunk
> > template, which is currently limited to use absolute address.
> >
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Signed-off-by: Uros Bizjak <[email protected]>
> ...
> > diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
> > index e9ad518a5003..ef9c04707b3c 100644
> > --- a/arch/x86/kernel/callthunks.c
> > +++ b/arch/x86/kernel/callthunks.c
> ...
> > @@ -291,20 +298,27 @@ void *callthunks_translate_call_dest(void *dest)
> > static bool is_callthunk(void *addr)
> > {
> > unsigned int tmpl_size = SKL_TMPL_SIZE;
> > - void *tmpl = skl_call_thunk_template;
> > + u8 insn_buff[MAX_PATCH_LEN];
> > unsigned long dest;
> > + u8 *pad;
> >
> > dest = roundup((unsigned long)addr, CONFIG_FUNCTION_ALIGNMENT);
> > if (!thunks_initialized || skip_addr((void *)dest))
> > return false;
> >
> > - return !bcmp((void *)(dest - tmpl_size), tmpl, tmpl_size);
> > + *pad = dest - tmpl_size;
>
> Clang warns (or errors with CONFIG_WERROR=y):

Uh, GCC didn't warn at all (and there is some mixup with types here),
so a thinko slipped through.

The attached patch fixes the oversight. I'll post a formal patch later
today after some more testing.

Thanks,
Uros.


Attachments:
p.diff.txt (479.00 B)

2023-12-07 15:30:10

by Borislav Petkov

Subject: Re: [tip: x86/percpu] x86/callthunks: Mark apply_relocation() as __init_or_module

On Thu, Nov 30, 2023 at 09:16:31PM -0000, tip-bot2 for Ingo Molnar wrote:
> -void apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len)
> +void __init_or_module apply_relocation(u8 *buf, size_t len, u8 *dest, u8 *src, size_t src_len)
> {
> int prev, target = 0;

Can't do that for a CONFIG_MODULES=n build:

WARNING: modpost: vmlinux: section mismatch in reference: patch_dest+0x61 (section: .text) -> apply_relocation (section: .init.text)
ERROR: modpost: Section mismatches detected.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette