2024-03-22 10:27:47

by Uros Bizjak

Subject: [PATCH] x86/percpu: Unify arch_raw_cpu_ptr() defines

When building a 32 bit VDSO for a 64 bit kernel, games are played with
CONFIG_X86_64. {this,raw}_cpu_read_8() macros are conditionally defined
on CONFIG_X86_64 and when CONFIG_X86_64 is undefined in fake_32bit_build.h
various build failures in generic percpu header files can happen. To make
things worse, the build of 32 bit VDSO for a 64 bit kernel grew dependency
on arch_raw_cpu_ptr() macro and the build fails if arch_raw_cpu_ptr()
macro is not defined.

To mitigate these issues, x86 carefully defines arch_raw_cpu_ptr() to
avoid any dependency on raw_cpu_read_8() and thus CONFIG_X86_64. W/o
segment register support, the definition uses size-agnostic MOV asm
mnemonic and hopes that _ptr argument won't ever be 64 bit size on
32 bit targets (although newer GCCs warn for this situation with
"unsupported size for integer register"), and w/ segment register
support the definition uses size-agnostic __raw_cpu_read() macro.

Fortunately, raw_cpu_read() is not used in 32 bit VDSO for a 64 bit kernel.
However, we can't simply omit the definition of arch_raw_cpu_ptr(),
since the build will fail when building vdso/vdso32/vclock_gettime.o.

The patch defines arch_raw_cpu_ptr to BUILD_BUG() when BUILD_VDSO32_64
macro is defined. This way, we are sure that arch_raw_cpu_ptr() won't
actually be used in 32 bit VDSO for a 64 bit kernel, but it is still
defined to prevent build failure.
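
As a rough userspace illustration of this "defined but unusable" pattern
(not the kernel code itself): the sketch below assumes GNU C statement
expressions, and every demo_* name, including the DEMO_VDSO32_64 switch,
is hypothetical. The kernel's BUILD_BUG() fails at compile time; the
stand-in used here is an undefined extern function, so a surviving use
fails at link time instead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for BUILD_BUG(): declared but never defined,
 * so any use that survives into the object file fails at link time.
 */
extern void demo_build_bug(void);

/* Hypothetical stand-in for this_cpu_off: byte offset of "this CPU's" area. */
static uintptr_t demo_cpu_off;

#ifndef DEMO_VDSO32_64
/* Normal build: add the per-CPU offset to the address of the template. */
#define demo_raw_cpu_ptr(_ptr)					\
({								\
	uintptr_t tcp_ptr__ = demo_cpu_off;			\
	tcp_ptr__ += (uintptr_t)(_ptr);				\
	(__typeof__(*(_ptr)) *)tcp_ptr__;			\
})
#else
/* Restricted build: the macro exists so headers parse, but must stay unused. */
#define demo_raw_cpu_ptr(_ptr) ({ demo_build_bug(); (__typeof__(_ptr))0; })
#endif

/* Two fake per-CPU areas laid out back to back; slot 0 acts as the template. */
static int demo_areas[2] = { 10, 20 };

/* Reads the template variable through the per-CPU pointer of fake CPU 1. */
static int demo_read_cpu1(void)
{
	int *p;

	demo_cpu_off = sizeof(demo_areas[0]);
	p = demo_raw_cpu_ptr(&demo_areas[0]);
	assert(p == &demo_areas[1]);
	return *p;
}
```

In this sketch, building with -DDEMO_VDSO32_64 turns the call in
demo_read_cpu1() into a reference to the undefined demo_build_bug(), so
the link should fail for as long as that call is not optimized away.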

Finally, we can unify arch_raw_cpu_ptr() between builds w/ and w/o
x86 segment register support, substituting two tricky macro definitions
with a straightforward implementation.
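
To make the unified shape concrete, here is a minimal userspace sketch,
not the kernel implementation. It assumes GNU C statement expressions;
all demo_* names are hypothetical, and pointer width (via UINTPTR_MAX)
stands in for CONFIG_X86_64 when choosing the fixed-size read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-size readers standing in for raw_cpu_read_4()/_8(). */
static inline uint32_t demo_read_4(const uint32_t *p) { return *p; }
static inline uint64_t demo_read_8(const uint64_t *p) { return *p; }

/* Pick the reader that matches the native word size, once, in one place. */
#if UINTPTR_MAX > 0xffffffffu
typedef uint64_t demo_off_t;
#define demo_raw_my_cpu_offset() demo_read_8(&demo_this_cpu_off)
#else
typedef uint32_t demo_off_t;
#define demo_raw_my_cpu_offset() demo_read_4(&demo_this_cpu_off)
#endif

/* Hypothetical stand-in for this_cpu_off. */
static demo_off_t demo_this_cpu_off;

/* Single definition: read the offset, add the template address, cast back. */
#define demo_arch_raw_cpu_ptr(_ptr)				\
({								\
	uintptr_t tcp_ptr__ = demo_raw_my_cpu_offset();		\
	tcp_ptr__ += (uintptr_t)(_ptr);				\
	(__typeof__(*(_ptr)) *)tcp_ptr__;			\
})

struct demo_counter { long hits; };

/* Four fake per-CPU copies; element 0 doubles as the template address. */
static struct demo_counter demo_counters[4];

/* Pretend to run on the given CPU and resolve its copy of the counter. */
static struct demo_counter *demo_counter_for(unsigned int cpu)
{
	assert(cpu < 4);
	demo_this_cpu_off = cpu * sizeof(demo_counters[0]);
	return demo_arch_raw_cpu_ptr(&demo_counters[0]);
}
```

With the size decision isolated in demo_raw_my_cpu_offset(), the pointer
macro itself no longer needs a second variant per configuration.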

There is no size difference and no difference in number of this_cpu_off
accesses between patched and unpatched kernel when the kernel is built
either w/ or w/o segment register support.

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
---
arch/x86/include/asm/percpu.h | 42 +++++++++++++++--------------------
1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 7563e69838c4..39d394ce1fca 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -59,36 +59,30 @@
#define __force_percpu_prefix "%%"__stringify(__percpu_seg)":"
#define __my_cpu_offset this_cpu_read(this_cpu_off)

-#ifdef CONFIG_USE_X86_SEG_SUPPORT
-/*
- * Efficient implementation for cases in which the compiler supports
- * named address spaces. Allows the compiler to perform additional
- * optimizations that can save more instructions.
- */
-#define arch_raw_cpu_ptr(ptr) \
-({ \
- unsigned long tcp_ptr__; \
- tcp_ptr__ = __raw_cpu_read(, this_cpu_off); \
- \
- tcp_ptr__ += (unsigned long)(ptr); \
- (typeof(*(ptr)) __kernel __force *)tcp_ptr__; \
-})
-#else /* CONFIG_USE_X86_SEG_SUPPORT */
+#ifdef CONFIG_X86_64
+#define __raw_my_cpu_offset raw_cpu_read_8(this_cpu_off);
+#else
+#define __raw_my_cpu_offset raw_cpu_read_4(this_cpu_off);
+#endif
+
/*
* Compared to the generic __my_cpu_offset version, the following
* saves one instruction and avoids clobbering a temp register.
+ *
+ * arch_raw_cpu_ptr should not be used in 32 bit VDSO for a 64 bit
+ * kernel, because games are played with CONFIG_X86_64 there and
+ * sizeof(this_cpu_off) becomes 4.
*/
-#define arch_raw_cpu_ptr(ptr) \
+#ifndef BUILD_VDSO32_64
+#define arch_raw_cpu_ptr(_ptr) \
({ \
- unsigned long tcp_ptr__; \
- asm ("mov " __percpu_arg(1) ", %0" \
- : "=r" (tcp_ptr__) \
- : "m" (__my_cpu_var(this_cpu_off))); \
- \
- tcp_ptr__ += (unsigned long)(ptr); \
- (typeof(*(ptr)) __kernel __force *)tcp_ptr__; \
+ unsigned long tcp_ptr__ = __raw_my_cpu_offset; \
+ tcp_ptr__ += (unsigned long)(_ptr); \
+ (typeof(*(_ptr)) __kernel __force *)tcp_ptr__; \
})
-#endif /* CONFIG_USE_X86_SEG_SUPPORT */
+#else
+#define arch_raw_cpu_ptr(_ptr) ({ BUILD_BUG(); (typeof(_ptr))0; })
+#endif

#define PER_CPU_VAR(var) %__percpu_seg:(var)__percpu_rel

--
2.44.0



2024-03-22 11:01:08

by tip-bot2 for Uros Bizjak

Subject: [tip: x86/percpu] x86/percpu: Unify arch_raw_cpu_ptr() defines

The following commit has been merged into the x86/percpu branch of tip:

Commit-ID: 4e5b0e8003df05983b6dabcdde7ff447d53b49d7
Gitweb: https://git.kernel.org/tip/4e5b0e8003df05983b6dabcdde7ff447d53b49d7
Author: Uros Bizjak <[email protected]>
AuthorDate: Fri, 22 Mar 2024 11:27:14 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 22 Mar 2024 11:34:12 +01:00

x86/percpu: Unify arch_raw_cpu_ptr() defines

When building a 32-bit vDSO for a 64-bit kernel, games are played with
CONFIG_X86_64. {this,raw}_cpu_read_8() macros are conditionally defined
on CONFIG_X86_64 and when CONFIG_X86_64 is undefined in fake_32bit_build.h
various build failures in generic percpu header files can happen. To make
things worse, the build of 32-bit vDSO for a 64-bit kernel grew dependency
on arch_raw_cpu_ptr() macro and the build fails if arch_raw_cpu_ptr()
macro is not defined.

To mitigate these issues, x86 carefully defines arch_raw_cpu_ptr() to
avoid any dependency on raw_cpu_read_8() and thus CONFIG_X86_64. W/o
segment register support, the definition uses size-agnostic MOV asm
mnemonic and hopes that _ptr argument won't ever be 64-bit size on
32-bit targets (although newer GCCs warn for this situation with
"unsupported size for integer register"), and w/ segment register
support the definition uses size-agnostic __raw_cpu_read() macro.

Fortunately, raw_cpu_read() is not used in 32-bit vDSO for a 64-bit kernel.
However, we can't simply omit the definition of arch_raw_cpu_ptr(),
since the build will fail when building vdso/vdso32/vclock_gettime.o.

The patch defines arch_raw_cpu_ptr to BUILD_BUG() when BUILD_VDSO32_64
macro is defined. This way, we are sure that arch_raw_cpu_ptr() won't
actually be used in 32-bit vDSO for a 64-bit kernel, but it is still
defined to prevent build failure.

Finally, we can unify arch_raw_cpu_ptr() between builds w/ and w/o
x86 segment register support, substituting two tricky macro definitions
with a straightforward implementation.

There is no size difference and no difference in number of this_cpu_off
accesses between patched and unpatched kernel when the kernel is built
either w/ or w/o segment register support.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/percpu.h | 42 ++++++++++++++--------------------
1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 7563e69..f6ddbaa 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -59,36 +59,30 @@
#define __force_percpu_prefix "%%"__stringify(__percpu_seg)":"
#define __my_cpu_offset this_cpu_read(this_cpu_off)

-#ifdef CONFIG_USE_X86_SEG_SUPPORT
-/*
- * Efficient implementation for cases in which the compiler supports
- * named address spaces. Allows the compiler to perform additional
- * optimizations that can save more instructions.
- */
-#define arch_raw_cpu_ptr(ptr) \
-({ \
- unsigned long tcp_ptr__; \
- tcp_ptr__ = __raw_cpu_read(, this_cpu_off); \
- \
- tcp_ptr__ += (unsigned long)(ptr); \
- (typeof(*(ptr)) __kernel __force *)tcp_ptr__; \
-})
-#else /* CONFIG_USE_X86_SEG_SUPPORT */
+#ifdef CONFIG_X86_64
+#define __raw_my_cpu_offset raw_cpu_read_8(this_cpu_off);
+#else
+#define __raw_my_cpu_offset raw_cpu_read_4(this_cpu_off);
+#endif
+
/*
* Compared to the generic __my_cpu_offset version, the following
* saves one instruction and avoids clobbering a temp register.
+ *
+ * arch_raw_cpu_ptr should not be used in 32-bit VDSO for a 64-bit
+ * kernel, because games are played with CONFIG_X86_64 there and
+ * sizeof(this_cpu_off) becomes 4.
*/
-#define arch_raw_cpu_ptr(ptr) \
+#ifndef BUILD_VDSO32_64
+#define arch_raw_cpu_ptr(_ptr) \
({ \
- unsigned long tcp_ptr__; \
- asm ("mov " __percpu_arg(1) ", %0" \
- : "=r" (tcp_ptr__) \
- : "m" (__my_cpu_var(this_cpu_off))); \
- \
- tcp_ptr__ += (unsigned long)(ptr); \
- (typeof(*(ptr)) __kernel __force *)tcp_ptr__; \
+ unsigned long tcp_ptr__ = __raw_my_cpu_offset; \
+ tcp_ptr__ += (unsigned long)(_ptr); \
+ (typeof(*(_ptr)) __kernel __force *)tcp_ptr__; \
})
-#endif /* CONFIG_USE_X86_SEG_SUPPORT */
+#else
+#define arch_raw_cpu_ptr(_ptr) ({ BUILD_BUG(); (typeof(_ptr))0; })
+#endif

#define PER_CPU_VAR(var) %__percpu_seg:(var)__percpu_rel