2020-07-20 20:50:20

by Nick Desaulniers

Subject: [PATCH v3 00/11] i386 Clang support

Resend of Brian's v2 with Acks from Peter and Linus collected, as well
as the final patch (mine) added. The commit message of the final patch
discusses some of the architectural differences between GCC and Clang,
and the kernel's tickling of this difference for i386, which
necessitated these patches.

Brian Gerst (10):
x86/percpu: Introduce size abstraction macros
x86/percpu: Clean up percpu_to_op()
x86/percpu: Clean up percpu_from_op()
x86/percpu: Clean up percpu_add_op()
x86/percpu: Remove "e" constraint from XADD
x86/percpu: Clean up percpu_add_return_op()
x86/percpu: Clean up percpu_xchg_op()
x86/percpu: Clean up percpu_cmpxchg_op()
x86/percpu: Clean up percpu_stable_op()
x86/percpu: Remove unused PER_CPU() macro

Nick Desaulniers (1):
x86: support i386 with Clang

arch/x86/include/asm/percpu.h | 510 +++++++++++----------------------
arch/x86/include/asm/uaccess.h | 4 +-
2 files changed, 175 insertions(+), 339 deletions(-)

--
2.28.0.rc0.105.gf9edc3c819-goog


2020-07-20 20:50:30

by Nick Desaulniers

Subject: [PATCH v3 01/11] x86/percpu: Introduce size abstraction macros

From: Brian Gerst <[email protected]>

In preparation for cleaning up the percpu operations, define macros for
abstraction based on the width of the operation.
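
For illustration only (not part of the patch): a rough sketch of how the
new width-keyed helpers are meant to compose once the later cleanups use
them. The expansion below is approximate and uses the names introduced
here (__pcpu_op2_4, __pcpu_reg_imm_4, __pcpu_cast_4):

	/* percpu_to_op(4, volatile, "add", var, val) expands roughly to: */
	asm volatile("addl %[val], " __percpu_arg([var])
		     : [var] "+m" (var)
		     : [val] "ri" (__pcpu_cast_4(val)));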

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2278797c769d..19838e4f7a8f 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -87,6 +87,36 @@
* don't give an lvalue though). */
extern void __bad_percpu_size(void);

+#define __pcpu_type_1 u8
+#define __pcpu_type_2 u16
+#define __pcpu_type_4 u32
+#define __pcpu_type_8 u64
+
+#define __pcpu_cast_1(val) ((u8)(((unsigned long) val) & 0xff))
+#define __pcpu_cast_2(val) ((u16)(((unsigned long) val) & 0xffff))
+#define __pcpu_cast_4(val) ((u32)(((unsigned long) val) & 0xffffffff))
+#define __pcpu_cast_8(val) ((u64)(val))
+
+#define __pcpu_op1_1(op, dst) op "b " dst
+#define __pcpu_op1_2(op, dst) op "w " dst
+#define __pcpu_op1_4(op, dst) op "l " dst
+#define __pcpu_op1_8(op, dst) op "q " dst
+
+#define __pcpu_op2_1(op, src, dst) op "b " src ", " dst
+#define __pcpu_op2_2(op, src, dst) op "w " src ", " dst
+#define __pcpu_op2_4(op, src, dst) op "l " src ", " dst
+#define __pcpu_op2_8(op, src, dst) op "q " src ", " dst
+
+#define __pcpu_reg_1(mod, x) mod "q" (x)
+#define __pcpu_reg_2(mod, x) mod "r" (x)
+#define __pcpu_reg_4(mod, x) mod "r" (x)
+#define __pcpu_reg_8(mod, x) mod "r" (x)
+
+#define __pcpu_reg_imm_1(x) "qi" (x)
+#define __pcpu_reg_imm_2(x) "ri" (x)
+#define __pcpu_reg_imm_4(x) "ri" (x)
+#define __pcpu_reg_imm_8(x) "re" (x)
+
#define percpu_to_op(qual, op, var, val) \
do { \
typedef typeof(var) pto_T__; \
--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-20 20:50:44

by Nick Desaulniers

Subject: [PATCH v3 02/11] x86/percpu: Clean up percpu_to_op()

From: Brian Gerst <[email protected]>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.
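
For context, a hedged sketch of the generic dispatch this relies on. The
size switch already lives in the core helper __pcpu_size_call()
(abbreviated here from include/linux/percpu-defs.h; the pointer check is
omitted), so the per-size switch in the x86 macros only duplicates it:

	#define __pcpu_size_call(stem, variable, ...)			\
	do {								\
		switch (sizeof(variable)) {				\
		case 1: stem##1(variable, __VA_ARGS__); break;		\
		case 2: stem##2(variable, __VA_ARGS__); break;		\
		case 4: stem##4(variable, __VA_ARGS__); break;		\
		case 8: stem##8(variable, __VA_ARGS__); break;		\
		default:						\
			__bad_size_call_parameter(); break;		\
		}							\
	} while (0)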

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
1 file changed, 35 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 19838e4f7a8f..fb280fba94c5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
#define __pcpu_reg_imm_4(x) "ri" (x)
#define __pcpu_reg_imm_8(x) "re" (x)

-#define percpu_to_op(qual, op, var, val) \
-do { \
- typedef typeof(var) pto_T__; \
- if (0) { \
- pto_T__ pto_tmp__; \
- pto_tmp__ = (val); \
- (void)pto_tmp__; \
- } \
- switch (sizeof(var)) { \
- case 1: \
- asm qual (op "b %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "qi" ((pto_T__)(val))); \
- break; \
- case 2: \
- asm qual (op "w %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pto_T__)(val))); \
- break; \
- case 4: \
- asm qual (op "l %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pto_T__)(val))); \
- break; \
- case 8: \
- asm qual (op "q %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "re" ((pto_T__)(val))); \
- break; \
- default: __bad_percpu_size(); \
- } \
+#define percpu_to_op(size, qual, op, _var, _val) \
+do { \
+ __pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val); \
+ if (0) { \
+ typeof(_var) pto_tmp__; \
+ pto_tmp__ = (_val); \
+ (void)pto_tmp__; \
+ } \
+ asm qual(__pcpu_op2_##size(op, "%[val]", __percpu_arg([var])) \
+ : [var] "+m" (_var) \
+ : [val] __pcpu_reg_imm_##size(pto_val__)); \
} while (0)

/*
@@ -425,18 +405,18 @@ do { \
#define raw_cpu_read_2(pcp) percpu_from_op(, "mov", pcp)
#define raw_cpu_read_4(pcp) percpu_from_op(, "mov", pcp)

-#define raw_cpu_write_1(pcp, val) percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_2(pcp, val) percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_4(pcp, val) percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val)
+#define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val)
+#define raw_cpu_write_4(pcp, val) percpu_to_op(4, , "mov", (pcp), val)
#define raw_cpu_add_1(pcp, val) percpu_add_op(, (pcp), val)
#define raw_cpu_add_2(pcp, val) percpu_add_op(, (pcp), val)
#define raw_cpu_add_4(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_and_1(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_2(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_4(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_1(pcp, val) percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_2(pcp, val) percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_4(pcp, val) percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_1(pcp, val) percpu_to_op(1, , "and", (pcp), val)
+#define raw_cpu_and_2(pcp, val) percpu_to_op(2, , "and", (pcp), val)
+#define raw_cpu_and_4(pcp, val) percpu_to_op(4, , "and", (pcp), val)
+#define raw_cpu_or_1(pcp, val) percpu_to_op(1, , "or", (pcp), val)
+#define raw_cpu_or_2(pcp, val) percpu_to_op(2, , "or", (pcp), val)
+#define raw_cpu_or_4(pcp, val) percpu_to_op(4, , "or", (pcp), val)

/*
* raw_cpu_xchg() can use a load-store since it is not required to be
@@ -456,18 +436,18 @@ do { \
#define this_cpu_read_1(pcp) percpu_from_op(volatile, "mov", pcp)
#define this_cpu_read_2(pcp) percpu_from_op(volatile, "mov", pcp)
#define this_cpu_read_4(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_1(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_2(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_4(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val)
+#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val)
+#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val)
#define this_cpu_add_1(pcp, val) percpu_add_op(volatile, (pcp), val)
#define this_cpu_add_2(pcp, val) percpu_add_op(volatile, (pcp), val)
#define this_cpu_add_4(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_1(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_2(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_4(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_1(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_2(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_4(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_1(pcp, val) percpu_to_op(1, volatile, "and", (pcp), val)
+#define this_cpu_and_2(pcp, val) percpu_to_op(2, volatile, "and", (pcp), val)
+#define this_cpu_and_4(pcp, val) percpu_to_op(4, volatile, "and", (pcp), val)
+#define this_cpu_or_1(pcp, val) percpu_to_op(1, volatile, "or", (pcp), val)
+#define this_cpu_or_2(pcp, val) percpu_to_op(2, volatile, "or", (pcp), val)
+#define this_cpu_or_4(pcp, val) percpu_to_op(4, volatile, "or", (pcp), val)
#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
@@ -509,19 +489,19 @@ do { \
*/
#ifdef CONFIG_X86_64
#define raw_cpu_read_8(pcp) percpu_from_op(, "mov", pcp)
-#define raw_cpu_write_8(pcp, val) percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val)
#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_and_8(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_8(pcp, val) percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
+#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

#define this_cpu_read_8(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_8(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_8(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_8(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
+#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-20 20:50:53

by Nick Desaulniers

Subject: [PATCH v3 03/11] x86/percpu: Clean up percpu_from_op()

From: Brian Gerst <[email protected]>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 50 +++++++++++------------------------
1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index fb280fba94c5..a40d2e055f58 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -190,33 +190,13 @@ do { \
} \
} while (0)

-#define percpu_from_op(qual, op, var) \
-({ \
- typeof(var) pfo_ret__; \
- switch (sizeof(var)) { \
- case 1: \
- asm qual (op "b "__percpu_arg(1)",%0" \
- : "=q" (pfo_ret__) \
- : "m" (var)); \
- break; \
- case 2: \
- asm qual (op "w "__percpu_arg(1)",%0" \
- : "=r" (pfo_ret__) \
- : "m" (var)); \
- break; \
- case 4: \
- asm qual (op "l "__percpu_arg(1)",%0" \
- : "=r" (pfo_ret__) \
- : "m" (var)); \
- break; \
- case 8: \
- asm qual (op "q "__percpu_arg(1)",%0" \
- : "=r" (pfo_ret__) \
- : "m" (var)); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pfo_ret__; \
+#define percpu_from_op(size, qual, op, _var) \
+({ \
+ __pcpu_type_##size pfo_val__; \
+ asm qual (__pcpu_op2_##size(op, __percpu_arg([var]), "%[val]") \
+ : [val] __pcpu_reg_##size("=", pfo_val__) \
+ : [var] "m" (_var)); \
+ (typeof(_var))(unsigned long) pfo_val__; \
})

#define percpu_stable_op(op, var) \
@@ -401,9 +381,9 @@ do { \
*/
#define this_cpu_read_stable(var) percpu_stable_op("mov", var)

-#define raw_cpu_read_1(pcp) percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_2(pcp) percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_4(pcp) percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_1(pcp) percpu_from_op(1, , "mov", pcp)
+#define raw_cpu_read_2(pcp) percpu_from_op(2, , "mov", pcp)
+#define raw_cpu_read_4(pcp) percpu_from_op(4, , "mov", pcp)

#define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val)
#define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val)
@@ -433,9 +413,9 @@ do { \
#define raw_cpu_xchg_2(pcp, val) raw_percpu_xchg_op(pcp, val)
#define raw_cpu_xchg_4(pcp, val) raw_percpu_xchg_op(pcp, val)

-#define this_cpu_read_1(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_2(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_4(pcp) percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_1(pcp) percpu_from_op(1, volatile, "mov", pcp)
+#define this_cpu_read_2(pcp) percpu_from_op(2, volatile, "mov", pcp)
+#define this_cpu_read_4(pcp) percpu_from_op(4, volatile, "mov", pcp)
#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val)
#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val)
#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val)
@@ -488,7 +468,7 @@ do { \
* 32 bit must fall back to generic operations.
*/
#ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp) percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_8(pcp) percpu_from_op(8, , "mov", pcp)
#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val)
#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
@@ -497,7 +477,7 @@ do { \
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

-#define this_cpu_read_8(pcp) percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-20 20:51:10

by Nick Desaulniers

Subject: [PATCH v3 04/11] x86/percpu: Clean up percpu_add_op()

From: Brian Gerst <[email protected]>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.
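
Worth noting (a hedged illustration, not part of the change): the
rewritten percpu_add_op() still special-cases constant +1/-1, so the
shorter inc/dec forms are kept. With the names used in this series,
something like:

	raw_cpu_add_4(var, 1);	/* emits roughly "incl %fs:var" on 32-bit */
	raw_cpu_add_4(var, -1);	/* emits roughly "decl %fs:var" */
	raw_cpu_add_4(var, 7);	/* falls back to "addl $7, %fs:var" */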

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 99 ++++++++---------------------------
1 file changed, 22 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a40d2e055f58..2a24f3c795eb 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -130,64 +130,32 @@ do { \
: [val] __pcpu_reg_imm_##size(pto_val__)); \
} while (0)

+#define percpu_unary_op(size, qual, op, _var) \
+({ \
+ asm qual (__pcpu_op1_##size(op, __percpu_arg([var])) \
+ : [var] "+m" (_var)); \
+})
+
/*
* Generate a percpu add to memory instruction and optimize code
* if one is added or subtracted.
*/
-#define percpu_add_op(qual, var, val) \
+#define percpu_add_op(size, qual, var, val) \
do { \
- typedef typeof(var) pao_T__; \
const int pao_ID__ = (__builtin_constant_p(val) && \
((val) == 1 || (val) == -1)) ? \
(int)(val) : 0; \
if (0) { \
- pao_T__ pao_tmp__; \
+ typeof(var) pao_tmp__; \
pao_tmp__ = (val); \
(void)pao_tmp__; \
} \
- switch (sizeof(var)) { \
- case 1: \
- if (pao_ID__ == 1) \
- asm qual ("incb "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decb "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addb %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "qi" ((pao_T__)(val))); \
- break; \
- case 2: \
- if (pao_ID__ == 1) \
- asm qual ("incw "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decw "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addw %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pao_T__)(val))); \
- break; \
- case 4: \
- if (pao_ID__ == 1) \
- asm qual ("incl "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decl "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addl %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pao_T__)(val))); \
- break; \
- case 8: \
- if (pao_ID__ == 1) \
- asm qual ("incq "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decq "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addq %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "re" ((pao_T__)(val))); \
- break; \
- default: __bad_percpu_size(); \
- } \
+ if (pao_ID__ == 1) \
+ percpu_unary_op(size, qual, "inc", var); \
+ else if (pao_ID__ == -1) \
+ percpu_unary_op(size, qual, "dec", var); \
+ else \
+ percpu_to_op(size, qual, "add", var, val); \
} while (0)

#define percpu_from_op(size, qual, op, _var) \
@@ -228,29 +196,6 @@ do { \
pfo_ret__; \
})

-#define percpu_unary_op(qual, op, var) \
-({ \
- switch (sizeof(var)) { \
- case 1: \
- asm qual (op "b "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- case 2: \
- asm qual (op "w "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- case 4: \
- asm qual (op "l "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- case 8: \
- asm qual (op "q "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- default: __bad_percpu_size(); \
- } \
-})
-
/*
* Add return operation
*/
@@ -388,9 +333,9 @@ do { \
#define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val)
#define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val)
#define raw_cpu_write_4(pcp, val) percpu_to_op(4, , "mov", (pcp), val)
-#define raw_cpu_add_1(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_add_2(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_add_4(pcp, val) percpu_add_op(, (pcp), val)
+#define raw_cpu_add_1(pcp, val) percpu_add_op(1, , (pcp), val)
+#define raw_cpu_add_2(pcp, val) percpu_add_op(2, , (pcp), val)
+#define raw_cpu_add_4(pcp, val) percpu_add_op(4, , (pcp), val)
#define raw_cpu_and_1(pcp, val) percpu_to_op(1, , "and", (pcp), val)
#define raw_cpu_and_2(pcp, val) percpu_to_op(2, , "and", (pcp), val)
#define raw_cpu_and_4(pcp, val) percpu_to_op(4, , "and", (pcp), val)
@@ -419,9 +364,9 @@ do { \
#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val)
#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val)
#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val)
-#define this_cpu_add_1(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_2(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_4(pcp, val) percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_1(pcp, val) percpu_add_op(1, volatile, (pcp), val)
+#define this_cpu_add_2(pcp, val) percpu_add_op(2, volatile, (pcp), val)
+#define this_cpu_add_4(pcp, val) percpu_add_op(4, volatile, (pcp), val)
#define this_cpu_and_1(pcp, val) percpu_to_op(1, volatile, "and", (pcp), val)
#define this_cpu_and_2(pcp, val) percpu_to_op(2, volatile, "and", (pcp), val)
#define this_cpu_and_4(pcp, val) percpu_to_op(4, volatile, "and", (pcp), val)
@@ -470,7 +415,7 @@ do { \
#ifdef CONFIG_X86_64
#define raw_cpu_read_8(pcp) percpu_from_op(8, , "mov", pcp)
#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val)
-#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val)
+#define raw_cpu_add_8(pcp, val) percpu_add_op(8, , (pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
@@ -479,7 +424,7 @@ do { \

#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
-#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_8(pcp, val) percpu_add_op(8, volatile, (pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-20 20:51:58

by Nick Desaulniers

Subject: [PATCH v3 05/11] x86/percpu: Remove "e" constraint from XADD

From: Brian Gerst <[email protected]>

The "e" constraint represents a constant, but the XADD instruction doesn't
accept immediate operands.
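
For illustration (hedged, not from the patch): if the constant
alternative were ever selected, the resulting instruction would look
something like

	xaddq $1, %gs:some_percpu_var	# invalid: XADD has no immediate form

which the assembler rejects, so only the register alternative is
meaningful for this operand.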

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2a24f3c795eb..9bb5440d98d3 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -220,7 +220,7 @@ do { \
break; \
case 8: \
asm qual ("xaddq %0, "__percpu_arg(1) \
- : "+re" (paro_ret__), "+m" (var) \
+ : "+r" (paro_ret__), "+m" (var) \
: : "memory"); \
break; \
default: __bad_percpu_size(); \
--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-20 20:52:16

by Nick Desaulniers

Subject: [PATCH v3 06/11] x86/percpu: Clean up percpu_add_return_op()

From: Brian Gerst <[email protected]>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 51 +++++++++++------------------------
1 file changed, 16 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 9bb5440d98d3..0776a11e7e11 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -199,34 +199,15 @@ do { \
/*
* Add return operation
*/
-#define percpu_add_return_op(qual, var, val) \
+#define percpu_add_return_op(size, qual, _var, _val) \
({ \
- typeof(var) paro_ret__ = val; \
- switch (sizeof(var)) { \
- case 1: \
- asm qual ("xaddb %0, "__percpu_arg(1) \
- : "+q" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- case 2: \
- asm qual ("xaddw %0, "__percpu_arg(1) \
- : "+r" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- case 4: \
- asm qual ("xaddl %0, "__percpu_arg(1) \
- : "+r" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- case 8: \
- asm qual ("xaddq %0, "__percpu_arg(1) \
- : "+r" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- default: __bad_percpu_size(); \
- } \
- paro_ret__ += val; \
- paro_ret__; \
+ __pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val); \
+ asm qual (__pcpu_op2_##size("xadd", "%[tmp]", \
+ __percpu_arg([var])) \
+ : [tmp] __pcpu_reg_##size("+", paro_tmp__), \
+ [var] "+m" (_var) \
+ : : "memory"); \
+ (typeof(_var))(unsigned long) (paro_tmp__ + _val); \
})

/*
@@ -377,16 +358,16 @@ do { \
#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval)

-#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(1, , pcp, val)
+#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(2, , pcp, val)
+#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(4, , pcp, val)
#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

-#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(1, volatile, pcp, val)
+#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(2, volatile, pcp, val)
+#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(4, volatile, pcp, val)
#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
@@ -418,7 +399,7 @@ do { \
#define raw_cpu_add_8(pcp, val) percpu_add_op(8, , (pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
-#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(8, , pcp, val)
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

@@ -427,7 +408,7 @@ do { \
#define this_cpu_add_8(pcp, val) percpu_add_op(8, volatile, (pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
-#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(8, volatile, pcp, val)
#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)

--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-20 20:52:44

by Nick Desaulniers

Subject: [PATCH v3 10/11] x86/percpu: Remove unused PER_CPU() macro

From: Brian Gerst <[email protected]>

Also remove the now unused __percpu_mov_op.

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 18 ------------------
1 file changed, 18 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index cf2b9c2a241e..a3c33b79fb86 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,33 +4,15 @@

#ifdef CONFIG_X86_64
#define __percpu_seg gs
-#define __percpu_mov_op movq
#else
#define __percpu_seg fs
-#define __percpu_mov_op movl
#endif

#ifdef __ASSEMBLY__

-/*
- * PER_CPU finds an address of a per-cpu variable.
- *
- * Args:
- * var - variable name
- * reg - 32bit register
- *
- * The resulting address is stored in the "reg" argument.
- *
- * Example:
- * PER_CPU(cpu_gdt_descr, %ebx)
- */
#ifdef CONFIG_SMP
-#define PER_CPU(var, reg) \
- __percpu_mov_op %__percpu_seg:this_cpu_off, reg; \
- lea var(reg), reg
#define PER_CPU_VAR(var) %__percpu_seg:var
#else /* ! SMP */
-#define PER_CPU(var, reg) __percpu_mov_op $var, reg
#define PER_CPU_VAR(var) var
#endif /* SMP */

--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-20 20:52:45

by Nick Desaulniers

Subject: [PATCH v3 11/11] x86: support i386 with Clang

GCC and Clang are architecturally different, which leads to subtle
issues for code that's invalid but clearly dead. This can happen with
code that emulates polymorphism with the preprocessor and sizeof.

GCC will perform semantic analysis after early inlining and dead code
elimination, so it will not warn on invalid code that's dead. Clang
strictly performs optimizations after semantic analysis, so it will
warn even when the invalid code is dead.

Neither Clang nor GCC likes this very much with -m32:

long long ret;
asm ("movb $5, %0" : "=q" (ret));

However, GCC can tolerate this variant:

long long ret;
switch (sizeof(ret)) {
case 1:
asm ("movb $5, %0" : "=q" (ret));
break;
case 8:;
}

Clang, on the other hand, won't accept that, because it validates the
inline asm for the '1' case *before* the optimization phase, where it
realizes that it wouldn't have to emit it anyway.

If LLVM (Clang's "back end") fails, such as during instruction selection
or register allocation, it cannot provide accurate diagnostics
(warnings/errors) that contain line information, as the AST has been
discarded from memory at that point.

While there have been early discussions about adding C/C++-specific
language optimizations to Clang via MLIR, which would enable such
earlier optimizations, that work is not yet scoped and is likely a
multi-year endeavor.

We also don't want to swap the use of "=q" with "=r". For 64b, it
doesn't matter. For 32b, it's possible that a 32b register without a 8b
lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will
then reject.
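
Instead, the fix routes the 1-byte case through a fixed-width temporary,
so "=q" always binds a register with an 8-bit alias. A simplified,
hedged sketch of the idea, mirroring the earlier example (the real
change is in __get_user_size() below):

	long long ret;
	unsigned char tmp;

	switch (sizeof(ret)) {
	case 1:
		/* tmp is 8 bits wide, so any "=q" register is valid */
		asm ("movb $5, %0" : "=q" (tmp));
		ret = tmp;
		break;
	case 8:;
	}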

With this, Clang can finally build an i386 defconfig.

Link: https://bugs.llvm.org/show_bug.cgi?id=33587
Link: https://github.com/ClangBuiltLinux/linux/issues/3
Link: https://github.com/ClangBuiltLinux/linux/issues/194
Link: https://github.com/ClangBuiltLinux/linux/issues/781
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/CAK8P3a1EBaWdbAEzirFDSgHVJMtWjuNt2HGG8z+vpXeNHwETFQ@mail.gmail.com/
Reported-by: Arnd Bergmann <[email protected]>
Reported-by: David Woodhouse <[email protected]>
Reported-by: Dmitry Golovin <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/uaccess.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index dd3261f9f4ea..9d57556ad42f 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -314,11 +314,13 @@ do { \

#define __get_user_size(x, ptr, size, retval) \
do { \
+ unsigned char x_u8__; \
retval = 0; \
__chk_user_ptr(ptr); \
switch (size) { \
case 1: \
- __get_user_asm(x, ptr, retval, "b", "=q"); \
+ __get_user_asm(x_u8__, ptr, retval, "b", "=q"); \
+ (x) = x_u8__; \
break; \
case 2: \
__get_user_asm(x, ptr, retval, "w", "=r"); \
--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-20 20:54:10

by Nick Desaulniers

Subject: [PATCH v3 07/11] x86/percpu: Clean up percpu_xchg_op()

From: Brian Gerst <[email protected]>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 61 +++++++++++------------------------
1 file changed, 18 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0776a11e7e11..ac6d7e76c0d4 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -215,46 +215,21 @@ do { \
* expensive due to the implied lock prefix. The processor cannot prefetch
* cachelines if xchg is used.
*/
-#define percpu_xchg_op(qual, var, nval) \
+#define percpu_xchg_op(size, qual, _var, _nval) \
({ \
- typeof(var) pxo_ret__; \
- typeof(var) pxo_new__ = (nval); \
- switch (sizeof(var)) { \
- case 1: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%al" \
- "\n1:\tcmpxchgb %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "q" (pxo_new__) \
- : "memory"); \
- break; \
- case 2: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%ax" \
- "\n1:\tcmpxchgw %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "r" (pxo_new__) \
- : "memory"); \
- break; \
- case 4: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%eax" \
- "\n1:\tcmpxchgl %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "r" (pxo_new__) \
- : "memory"); \
- break; \
- case 8: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%rax" \
- "\n1:\tcmpxchgq %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "r" (pxo_new__) \
- : "memory"); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pxo_ret__; \
+ __pcpu_type_##size pxo_old__; \
+ __pcpu_type_##size pxo_new__ = __pcpu_cast_##size(_nval); \
+ asm qual (__pcpu_op2_##size("mov", __percpu_arg([var]), \
+ "%[oval]") \
+ "\n1:\t" \
+ __pcpu_op2_##size("cmpxchg", "%[nval]", \
+ __percpu_arg([var])) \
+ "\n\tjnz 1b" \
+ : [oval] "=&a" (pxo_old__), \
+ [var] "+m" (_var) \
+ : [nval] __pcpu_reg_##size(, pxo_new__) \
+ : "memory"); \
+ (typeof(_var))(unsigned long) pxo_old__; \
})

/*
@@ -354,9 +329,9 @@ do { \
#define this_cpu_or_1(pcp, val) percpu_to_op(1, volatile, "or", (pcp), val)
#define this_cpu_or_2(pcp, val) percpu_to_op(2, volatile, "or", (pcp), val)
#define this_cpu_or_4(pcp, val) percpu_to_op(4, volatile, "or", (pcp), val)
-#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(1, volatile, pcp, nval)
+#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(2, volatile, pcp, nval)
+#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(4, volatile, pcp, nval)

#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(1, , pcp, val)
#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(2, , pcp, val)
@@ -409,7 +384,7 @@ do { \
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(8, volatile, pcp, val)
-#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(8, volatile, pcp, nval)
#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)

/*
--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-20 20:54:29

by Nick Desaulniers

Subject: [PATCH v3 08/11] x86/percpu: Clean up percpu_cmpxchg_op()

From: Brian Gerst <[email protected]>

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 58 +++++++++++------------------------
1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index ac6d7e76c0d4..7efc0b5c4ff0 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -236,39 +236,17 @@ do { \
* cmpxchg has no such implied lock semantics as a result it is much
* more efficient for cpu local operations.
*/
-#define percpu_cmpxchg_op(qual, var, oval, nval) \
+#define percpu_cmpxchg_op(size, qual, _var, _oval, _nval) \
({ \
- typeof(var) pco_ret__; \
- typeof(var) pco_old__ = (oval); \
- typeof(var) pco_new__ = (nval); \
- switch (sizeof(var)) { \
- case 1: \
- asm qual ("cmpxchgb %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "q" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- case 2: \
- asm qual ("cmpxchgw %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "r" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- case 4: \
- asm qual ("cmpxchgl %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "r" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- case 8: \
- asm qual ("cmpxchgq %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "r" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pco_ret__; \
+ __pcpu_type_##size pco_old__ = __pcpu_cast_##size(_oval); \
+ __pcpu_type_##size pco_new__ = __pcpu_cast_##size(_nval); \
+ asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]", \
+ __percpu_arg([var])) \
+ : [oval] "+a" (pco_old__), \
+ [var] "+m" (_var) \
+ : [nval] __pcpu_reg_##size(, pco_new__) \
+ : "memory"); \
+ (typeof(_var))(unsigned long) pco_old__; \
})

/*
@@ -336,16 +314,16 @@ do { \
#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(1, , pcp, val)
#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(2, , pcp, val)
#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(4, , pcp, val)
-#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(1, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(2, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(4, , pcp, oval, nval)

#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(1, volatile, pcp, val)
#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(2, volatile, pcp, val)
#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(4, volatile, pcp, val)
-#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(1, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(4, volatile, pcp, oval, nval)

#ifdef CONFIG_X86_CMPXCHG64
#define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2) \
@@ -376,7 +354,7 @@ do { \
#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(8, , pcp, val)
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
-#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(8, , pcp, oval, nval)

#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
@@ -385,7 +363,7 @@ do { \
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(8, volatile, pcp, val)
#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(8, volatile, pcp, nval)
-#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(8, volatile, pcp, oval, nval)

/*
* Pretty complex macro to generate cmpxchg16 instruction. The instruction
--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-20 20:54:33

by Nick Desaulniers

Subject: [PATCH v3 09/11] x86/percpu: Clean up percpu_stable_op()

From: Brian Gerst <[email protected]>

Use __pcpu_size_call_return() to simplify this_cpu_read_stable().
Also remove __bad_percpu_size() which is now unused.
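
For reference, a hedged sketch of the generic helper this switches to
(abbreviated from include/linux/percpu-defs.h; the real macro also
verifies the percpu pointer):

	#define __pcpu_size_call_return(stem, variable)			\
	({								\
		typeof(variable) pscr_ret__;				\
		switch (sizeof(variable)) {				\
		case 1: pscr_ret__ = stem##1(variable); break;		\
		case 2: pscr_ret__ = stem##2(variable); break;		\
		case 4: pscr_ret__ = stem##4(variable); break;		\
		case 8: pscr_ret__ = stem##8(variable); break;		\
		default:						\
			__bad_size_call_parameter(); break;		\
		}							\
		pscr_ret__;						\
	})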

Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 41 ++++++++++-------------------------
1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 7efc0b5c4ff0..cf2b9c2a241e 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -85,7 +85,6 @@

/* For arch-specific code, we can use direct single-insn ops (they
* don't give an lvalue though). */
-extern void __bad_percpu_size(void);

#define __pcpu_type_1 u8
#define __pcpu_type_2 u16
@@ -167,33 +166,13 @@ do { \
(typeof(_var))(unsigned long) pfo_val__; \
})

-#define percpu_stable_op(op, var) \
-({ \
- typeof(var) pfo_ret__; \
- switch (sizeof(var)) { \
- case 1: \
- asm(op "b "__percpu_arg(P1)",%0" \
- : "=q" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- case 2: \
- asm(op "w "__percpu_arg(P1)",%0" \
- : "=r" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- case 4: \
- asm(op "l "__percpu_arg(P1)",%0" \
- : "=r" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- case 8: \
- asm(op "q "__percpu_arg(P1)",%0" \
- : "=r" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pfo_ret__; \
+#define percpu_stable_op(size, op, _var) \
+({ \
+ __pcpu_type_##size pfo_val__; \
+ asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]") \
+ : [val] __pcpu_reg_##size("=", pfo_val__) \
+ : [var] "p" (&(_var))); \
+ (typeof(_var))(unsigned long) pfo_val__; \
})

/*
@@ -258,7 +237,11 @@ do { \
* per-thread variables implemented as per-cpu variables and thus
* stable for the duration of the respective task.
*/
-#define this_cpu_read_stable(var) percpu_stable_op("mov", var)
+#define this_cpu_read_stable_1(pcp) percpu_stable_op(1, "mov", pcp)
+#define this_cpu_read_stable_2(pcp) percpu_stable_op(2, "mov", pcp)
+#define this_cpu_read_stable_4(pcp) percpu_stable_op(4, "mov", pcp)
+#define this_cpu_read_stable_8(pcp) percpu_stable_op(8, "mov", pcp)
+#define this_cpu_read_stable(pcp) __pcpu_size_call_return(this_cpu_read_stable_, pcp)

#define raw_cpu_read_1(pcp) percpu_from_op(1, , "mov", pcp)
#define raw_cpu_read_2(pcp) percpu_from_op(2, , "mov", pcp)
--
2.28.0.rc0.105.gf9edc3c819-goog

2020-07-21 08:10:08

by Sedat Dilek

Subject: Re: [PATCH v3 00/11] i386 Clang support

On Mon, Jul 20, 2020 at 10:49 PM 'Nick Desaulniers' via Clang Built
Linux <[email protected]> wrote:
>
> Resend of Brian's v2 with Acks from Peter and Linus collected, as well
> as the final patch (mine) added. The commit of the final patch discusses
> some of the architectural differences between GCC and Clang, and the
> kernels tickling of this difference for i386, which necessitated these
> patches.
>
> Brian Gerst (10):
> x86/percpu: Introduce size abstraction macros
> x86/percpu: Clean up percpu_to_op()
> x86/percpu: Clean up percpu_from_op()
> x86/percpu: Clean up percpu_add_op()
> x86/percpu: Remove "e" constraint from XADD
> x86/percpu: Clean up percpu_add_return_op()
> x86/percpu: Clean up percpu_xchg_op()
> x86/percpu: Clean up percpu_cmpxchg_op()
> x86/percpu: Clean up percpu_stable_op()
> x86/percpu: Remove unused PER_CPU() macro
>
> Nick Desaulniers (1):
> x86: support i386 with Clang
>
> arch/x86/include/asm/percpu.h | 510 +++++++++++----------------------
> arch/x86/include/asm/uaccess.h | 4 +-
> 2 files changed, 175 insertions(+), 339 deletions(-)
>
> --
> 2.28.0.rc0.105.gf9edc3c819-goog
>

Hi,

I have tested this patchset v3 on top of Linux v5.8-rc6 with a
self-built llvm-toolchain (snapshot version: v11+git-e05c7e400f3a plus
cherry-picked 8b354cc8db41).

I did this out of interest, to see whether the series is in good shape
on my Debian AMD64 system.
I checked my build log and dmesg output, and both look good.

Feel free to add my...

Tested-by: Sedat Dilek <[email protected]>

Thanks.

Regards,
- Sedat -

2020-07-21 22:29:55

by Dennis Zhou

Subject: Re: [PATCH v3 00/11] i386 Clang support

On Mon, Jul 20, 2020 at 01:49:14PM -0700, Nick Desaulniers wrote:
> Resend of Brian's v2 with Acks from Peter and Linus collected, as well
> as the final patch (mine) added. The commit of the final patch discusses
> some of the architectural differences between GCC and Clang, and the
> kernels tickling of this difference for i386, which necessitated these
> patches.
>
> Brian Gerst (10):
> x86/percpu: Introduce size abstraction macros
> x86/percpu: Clean up percpu_to_op()
> x86/percpu: Clean up percpu_from_op()
> x86/percpu: Clean up percpu_add_op()
> x86/percpu: Remove "e" constraint from XADD
> x86/percpu: Clean up percpu_add_return_op()
> x86/percpu: Clean up percpu_xchg_op()
> x86/percpu: Clean up percpu_cmpxchg_op()
> x86/percpu: Clean up percpu_stable_op()
> x86/percpu: Remove unused PER_CPU() macro
>
> Nick Desaulniers (1):
> x86: support i386 with Clang
>
> arch/x86/include/asm/percpu.h | 510 +++++++++++----------------------
> arch/x86/include/asm/uaccess.h | 4 +-
> 2 files changed, 175 insertions(+), 339 deletions(-)
>
> --
> 2.28.0.rc0.105.gf9edc3c819-goog
>

This looks great to me! I applied it to for-5.9.

Thanks,
Dennis

2020-07-22 23:09:22

by Thomas Gleixner

Subject: Re: [PATCH v3 00/11] i386 Clang support

Dennis Zhou <[email protected]> writes:
> On Mon, Jul 20, 2020 at 01:49:14PM -0700, Nick Desaulniers wrote:
>> Resend of Brian's v2 with Acks from Peter and Linus collected, as well
>> as the final patch (mine) added. The commit of the final patch discusses
>> some of the architectural differences between GCC and Clang, and the
>> kernels tickling of this difference for i386, which necessitated these
>> patches.
>>
>> Brian Gerst (10):
>> x86/percpu: Introduce size abstraction macros
>> x86/percpu: Clean up percpu_to_op()
>> x86/percpu: Clean up percpu_from_op()
>> x86/percpu: Clean up percpu_add_op()
>> x86/percpu: Remove "e" constraint from XADD
>> x86/percpu: Clean up percpu_add_return_op()
>> x86/percpu: Clean up percpu_xchg_op()
>> x86/percpu: Clean up percpu_cmpxchg_op()
>> x86/percpu: Clean up percpu_stable_op()
>> x86/percpu: Remove unused PER_CPU() macro
>>
>> Nick Desaulniers (1):
>> x86: support i386 with Clang
>>
>> arch/x86/include/asm/percpu.h | 510 +++++++++++----------------------
>> arch/x86/include/asm/uaccess.h | 4 +-
>> 2 files changed, 175 insertions(+), 339 deletions(-)
>>
>> --
>> 2.28.0.rc0.105.gf9edc3c819-goog
>>
>
> This looks great to me! I applied it to for-5.9.

You applied it? I'm not aware that you're maintaining x86 nowadays.

Thanks,

tglx

2020-07-22 23:28:08

by Dennis Zhou

Subject: Re: [PATCH v3 00/11] i386 Clang support

On Thu, Jul 23, 2020 at 01:08:42AM +0200, Thomas Gleixner wrote:
> Dennis Zhou <[email protected]> writes:
> > On Mon, Jul 20, 2020 at 01:49:14PM -0700, Nick Desaulniers wrote:
> >> Resend of Brian's v2 with Acks from Peter and Linus collected, as well
> >> as the final patch (mine) added. The commit of the final patch discusses
> >> some of the architectural differences between GCC and Clang, and the
> >> kernels tickling of this difference for i386, which necessitated these
> >> patches.
> >>
> >> Brian Gerst (10):
> >> x86/percpu: Introduce size abstraction macros
> >> x86/percpu: Clean up percpu_to_op()
> >> x86/percpu: Clean up percpu_from_op()
> >> x86/percpu: Clean up percpu_add_op()
> >> x86/percpu: Remove "e" constraint from XADD
> >> x86/percpu: Clean up percpu_add_return_op()
> >> x86/percpu: Clean up percpu_xchg_op()
> >> x86/percpu: Clean up percpu_cmpxchg_op()
> >> x86/percpu: Clean up percpu_stable_op()
> >> x86/percpu: Remove unused PER_CPU() macro
> >>
> >> Nick Desaulniers (1):
> >> x86: support i386 with Clang
> >>
> >> arch/x86/include/asm/percpu.h | 510 +++++++++++----------------------
> >> arch/x86/include/asm/uaccess.h | 4 +-
> >> 2 files changed, 175 insertions(+), 339 deletions(-)
> >>
> >> --
> >> 2.28.0.rc0.105.gf9edc3c819-goog
> >>
> >
> > This looks great to me! I applied it to for-5.9.
>
> You applied it? I'm not aware that you're maintaining x86 nowadays.
>
> Thanks,
>
> tglx

I'm sorry I overstepped. I've dropped them. Please take them with my
ack.

Thanks,
Dennis

2020-07-23 09:15:11

by Thomas Gleixner

Subject: Re: [PATCH v3 11/11] x86: support i386 with Clang

Nick Desaulniers <[email protected]> writes:

I'm glad I looked myself at this.

> We also don't want to swap the use of "=q" with "=r". For 64b, it
> doesn't matter. For 32b, it's possible that a 32b register without a 8b
> lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will
> then reject.

The above is really garbage.

We don't want? It's simply not possible to do so, because ...

64b, 32b, 8b. For heaven's sake, is it too much to ask to write a
changelog with understandable wording instead of ambiguous abbreviations?

There is no maximum character limit for changelogs.

Thanks,

tglx

2020-07-23 09:20:33

by Thomas Gleixner

Subject: Re: [PATCH v3 11/11] x86: support i386 with Clang

Thomas Gleixner <[email protected]> writes:
> Nick Desaulniers <[email protected]> writes:
>
> I'm glad I looked myself at this.
>
>> We also don't want to swap the use of "=q" with "=r". For 64b, it
>> doesn't matter. For 32b, it's possible that a 32b register without a 8b
>> lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will
>> then reject.
>
> The above is really garbage.
>
> We don't want? It's simply not possible to do so, because ...
>
> 64b,32b,8b. For heavens sake is it too much asked to write a changelog
> with understandable wording instead of ambiguous abbreviations?
>
> There is no maximum character limit for changelogs.

Gah. Hit send too fast.

>> With this, Clang can finally build an i386 defconfig.

With what? I can't find anything which explains the solution at the
conceptual level. Sigh.

Thanks,

tglx

Subject: [tip: x86/asm] x86/percpu: Remove unused PER_CPU() macro

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: 4719ffecbb0659faf1fd39f4b8eb2674f0042890
Gitweb: https://git.kernel.org/tip/4719ffecbb0659faf1fd39f4b8eb2674f0042890
Author: Brian Gerst <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:24 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 11:46:43 +02:00

x86/percpu: Remove unused PER_CPU() macro

Also remove the now unused __percpu_mov_op.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/percpu.h | 18 ------------------
1 file changed, 18 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index cf2b9c2..a3c33b7 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,33 +4,15 @@

#ifdef CONFIG_X86_64
#define __percpu_seg gs
-#define __percpu_mov_op movq
#else
#define __percpu_seg fs
-#define __percpu_mov_op movl
#endif

#ifdef __ASSEMBLY__

-/*
- * PER_CPU finds an address of a per-cpu variable.
- *
- * Args:
- * var - variable name
- * reg - 32bit register
- *
- * The resulting address is stored in the "reg" argument.
- *
- * Example:
- * PER_CPU(cpu_gdt_descr, %ebx)
- */
#ifdef CONFIG_SMP
-#define PER_CPU(var, reg) \
- __percpu_mov_op %__percpu_seg:this_cpu_off, reg; \
- lea var(reg), reg
#define PER_CPU_VAR(var) %__percpu_seg:var
#else /* ! SMP */
-#define PER_CPU(var, reg) __percpu_mov_op $var, reg
#define PER_CPU_VAR(var) var
#endif /* SMP */

Subject: [tip: x86/asm] x86/percpu: Clean up percpu_xchg_op()

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: 73ca542fbabb68deaa90130a8153cab1fa8288fe
Gitweb: https://git.kernel.org/tip/73ca542fbabb68deaa90130a8153cab1fa8288fe
Author: Brian Gerst <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:21 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 11:46:41 +02:00

x86/percpu: Clean up percpu_xchg_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/percpu.h | 61 ++++++++++------------------------
1 file changed, 18 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0776a11..ac6d7e7 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -215,46 +215,21 @@ do { \
* expensive due to the implied lock prefix. The processor cannot prefetch
* cachelines if xchg is used.
*/
-#define percpu_xchg_op(qual, var, nval) \
+#define percpu_xchg_op(size, qual, _var, _nval) \
({ \
- typeof(var) pxo_ret__; \
- typeof(var) pxo_new__ = (nval); \
- switch (sizeof(var)) { \
- case 1: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%al" \
- "\n1:\tcmpxchgb %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "q" (pxo_new__) \
- : "memory"); \
- break; \
- case 2: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%ax" \
- "\n1:\tcmpxchgw %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "r" (pxo_new__) \
- : "memory"); \
- break; \
- case 4: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%eax" \
- "\n1:\tcmpxchgl %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "r" (pxo_new__) \
- : "memory"); \
- break; \
- case 8: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%rax" \
- "\n1:\tcmpxchgq %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "r" (pxo_new__) \
- : "memory"); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pxo_ret__; \
+ __pcpu_type_##size pxo_old__; \
+ __pcpu_type_##size pxo_new__ = __pcpu_cast_##size(_nval); \
+ asm qual (__pcpu_op2_##size("mov", __percpu_arg([var]), \
+ "%[oval]") \
+ "\n1:\t" \
+ __pcpu_op2_##size("cmpxchg", "%[nval]", \
+ __percpu_arg([var])) \
+ "\n\tjnz 1b" \
+ : [oval] "=&a" (pxo_old__), \
+ [var] "+m" (_var) \
+ : [nval] __pcpu_reg_##size(, pxo_new__) \
+ : "memory"); \
+ (typeof(_var))(unsigned long) pxo_old__; \
})

/*
@@ -354,9 +329,9 @@ do { \
#define this_cpu_or_1(pcp, val) percpu_to_op(1, volatile, "or", (pcp), val)
#define this_cpu_or_2(pcp, val) percpu_to_op(2, volatile, "or", (pcp), val)
#define this_cpu_or_4(pcp, val) percpu_to_op(4, volatile, "or", (pcp), val)
-#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(1, volatile, pcp, nval)
+#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(2, volatile, pcp, nval)
+#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(4, volatile, pcp, nval)

#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(1, , pcp, val)
#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(2, , pcp, val)
@@ -409,7 +384,7 @@ do { \
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(8, volatile, pcp, val)
-#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(8, volatile, pcp, nval)
#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)

/*
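
For readers unfamiliar with the cmpxchg-loop trick used above: the following is a
minimal user-space sketch of the same pattern (the helper name and the wrapper are
invented here, they are not part of the patch). It loads the current value into the
accumulator and retries CMPXCHG until it sticks. Like the per-CPU variant it
deliberately omits the LOCK prefix, so it is only safe when no other CPU can modify
the memory concurrently.

  /* user-space demo of the cmpxchg-based exchange, x86 only */
  #include <stdio.h>

  static unsigned int xchg_via_cmpxchg(unsigned int *p, unsigned int newval)
  {
          unsigned int old;

          asm ("mov %1, %0\n"
               "1:\tcmpxchgl %2, %1\n\t"
               "jnz 1b"
               : "=&a" (old), "+m" (*p)   /* old value lands in %eax */
               : "r" (newval)
               : "memory");
          return old;
  }

  int main(void)
  {
          unsigned int v = 1;
          unsigned int prev = xchg_via_cmpxchg(&v, 2);

          printf("prev=%u now=%u\n", prev, v);   /* prev=1 now=2 */
          return 0;
  }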

Subject: [tip: x86/asm] x86/percpu: Clean up percpu_add_return_op()

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: bbff583b84a130d4d1234d68906c41690575be36
Gitweb: https://git.kernel.org/tip/bbff583b84a130d4d1234d68906c41690575be36
Author: Brian Gerst <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:20 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 11:46:41 +02:00

x86/percpu: Clean up percpu_add_return_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/percpu.h | 51 ++++++++++------------------------
1 file changed, 16 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 9bb5440..0776a11 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -199,34 +199,15 @@ do { \
/*
* Add return operation
*/
-#define percpu_add_return_op(qual, var, val) \
+#define percpu_add_return_op(size, qual, _var, _val) \
({ \
- typeof(var) paro_ret__ = val; \
- switch (sizeof(var)) { \
- case 1: \
- asm qual ("xaddb %0, "__percpu_arg(1) \
- : "+q" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- case 2: \
- asm qual ("xaddw %0, "__percpu_arg(1) \
- : "+r" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- case 4: \
- asm qual ("xaddl %0, "__percpu_arg(1) \
- : "+r" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- case 8: \
- asm qual ("xaddq %0, "__percpu_arg(1) \
- : "+r" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- default: __bad_percpu_size(); \
- } \
- paro_ret__ += val; \
- paro_ret__; \
+ __pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val); \
+ asm qual (__pcpu_op2_##size("xadd", "%[tmp]", \
+ __percpu_arg([var])) \
+ : [tmp] __pcpu_reg_##size("+", paro_tmp__), \
+ [var] "+m" (_var) \
+ : : "memory"); \
+ (typeof(_var))(unsigned long) (paro_tmp__ + _val); \
})

/*
@@ -377,16 +358,16 @@ do { \
#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval)

-#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(1, , pcp, val)
+#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(2, , pcp, val)
+#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(4, , pcp, val)
#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

-#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(1, volatile, pcp, val)
+#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(2, volatile, pcp, val)
+#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(4, volatile, pcp, val)
#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
@@ -418,7 +399,7 @@ do { \
#define raw_cpu_add_8(pcp, val) percpu_add_op(8, , (pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
-#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(8, , pcp, val)
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

@@ -427,7 +408,7 @@ do { \
#define this_cpu_add_8(pcp, val) percpu_add_op(8, volatile, (pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
-#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(8, volatile, pcp, val)
#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
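
As a rough illustration of the XADD idiom the rewritten macro relies on, here is a
user-space sketch for the 4-byte case (helper name invented here, not taken from the
patch): XADD leaves the old value in the register operand, so adding the delta again
yields the new value that add_return must report. As in the per-CPU code there is no
LOCK prefix, so this is not SMP-atomic.

  /* user-space demo of add-return via XADD, x86 only */
  #include <stdio.h>

  static unsigned int add_return(unsigned int *p, unsigned int val)
  {
          unsigned int tmp = val;

          asm ("xaddl %0, %1"             /* *p += tmp; tmp = old *p */
               : "+r" (tmp), "+m" (*p)
               : : "memory");
          return tmp + val;               /* old value + delta = new value */
  }

  int main(void)
  {
          unsigned int counter = 10;
          unsigned int ret = add_return(&counter, 5);

          printf("returned %u, counter is now %u\n", ret, counter);   /* 15 15 */
          return 0;
  }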

Subject: [tip: x86/asm] x86/percpu: Remove "e" constraint from XADD

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: e4d16defbbde028aeab2026995f0ced4240df6d6
Gitweb: https://git.kernel.org/tip/e4d16defbbde028aeab2026995f0ced4240df6d6
Author: Brian Gerst <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:19 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 11:46:40 +02:00

x86/percpu: Remove "e" constraint from XADD

The "e" constraint represents a constant, but the XADD instruction doesn't
accept immediate operands.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/percpu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2a24f3c..9bb5440 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -220,7 +220,7 @@ do { \
break; \
case 8: \
asm qual ("xaddq %0, "__percpu_arg(1) \
- : "+re" (paro_ret__), "+m" (var) \
+ : "+r" (paro_ret__), "+m" (var) \
: : "memory"); \
break; \
default: __bad_percpu_size(); \

Subject: [tip: x86/asm] x86/percpu: Clean up percpu_add_op()

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: 33e5614a435ff8047d768e6501454ae1cc7f131f
Gitweb: https://git.kernel.org/tip/33e5614a435ff8047d768e6501454ae1cc7f131f
Author: Brian Gerst <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:18 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 11:46:40 +02:00

x86/percpu: Clean up percpu_add_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/percpu.h | 99 +++++++--------------------------
1 file changed, 22 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a40d2e0..2a24f3c 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -130,64 +130,32 @@ do { \
: [val] __pcpu_reg_imm_##size(pto_val__)); \
} while (0)

+#define percpu_unary_op(size, qual, op, _var) \
+({ \
+ asm qual (__pcpu_op1_##size(op, __percpu_arg([var])) \
+ : [var] "+m" (_var)); \
+})
+
/*
* Generate a percpu add to memory instruction and optimize code
* if one is added or subtracted.
*/
-#define percpu_add_op(qual, var, val) \
+#define percpu_add_op(size, qual, var, val) \
do { \
- typedef typeof(var) pao_T__; \
const int pao_ID__ = (__builtin_constant_p(val) && \
((val) == 1 || (val) == -1)) ? \
(int)(val) : 0; \
if (0) { \
- pao_T__ pao_tmp__; \
+ typeof(var) pao_tmp__; \
pao_tmp__ = (val); \
(void)pao_tmp__; \
} \
- switch (sizeof(var)) { \
- case 1: \
- if (pao_ID__ == 1) \
- asm qual ("incb "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decb "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addb %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "qi" ((pao_T__)(val))); \
- break; \
- case 2: \
- if (pao_ID__ == 1) \
- asm qual ("incw "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decw "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addw %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pao_T__)(val))); \
- break; \
- case 4: \
- if (pao_ID__ == 1) \
- asm qual ("incl "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decl "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addl %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pao_T__)(val))); \
- break; \
- case 8: \
- if (pao_ID__ == 1) \
- asm qual ("incq "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decq "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addq %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "re" ((pao_T__)(val))); \
- break; \
- default: __bad_percpu_size(); \
- } \
+ if (pao_ID__ == 1) \
+ percpu_unary_op(size, qual, "inc", var); \
+ else if (pao_ID__ == -1) \
+ percpu_unary_op(size, qual, "dec", var); \
+ else \
+ percpu_to_op(size, qual, "add", var, val); \
} while (0)

#define percpu_from_op(size, qual, op, _var) \
@@ -228,29 +196,6 @@ do { \
pfo_ret__; \
})

-#define percpu_unary_op(qual, op, var) \
-({ \
- switch (sizeof(var)) { \
- case 1: \
- asm qual (op "b "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- case 2: \
- asm qual (op "w "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- case 4: \
- asm qual (op "l "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- case 8: \
- asm qual (op "q "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- default: __bad_percpu_size(); \
- } \
-})
-
/*
* Add return operation
*/
@@ -388,9 +333,9 @@ do { \
#define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val)
#define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val)
#define raw_cpu_write_4(pcp, val) percpu_to_op(4, , "mov", (pcp), val)
-#define raw_cpu_add_1(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_add_2(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_add_4(pcp, val) percpu_add_op(, (pcp), val)
+#define raw_cpu_add_1(pcp, val) percpu_add_op(1, , (pcp), val)
+#define raw_cpu_add_2(pcp, val) percpu_add_op(2, , (pcp), val)
+#define raw_cpu_add_4(pcp, val) percpu_add_op(4, , (pcp), val)
#define raw_cpu_and_1(pcp, val) percpu_to_op(1, , "and", (pcp), val)
#define raw_cpu_and_2(pcp, val) percpu_to_op(2, , "and", (pcp), val)
#define raw_cpu_and_4(pcp, val) percpu_to_op(4, , "and", (pcp), val)
@@ -419,9 +364,9 @@ do { \
#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val)
#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val)
#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val)
-#define this_cpu_add_1(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_2(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_4(pcp, val) percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_1(pcp, val) percpu_add_op(1, volatile, (pcp), val)
+#define this_cpu_add_2(pcp, val) percpu_add_op(2, volatile, (pcp), val)
+#define this_cpu_add_4(pcp, val) percpu_add_op(4, volatile, (pcp), val)
#define this_cpu_and_1(pcp, val) percpu_to_op(1, volatile, "and", (pcp), val)
#define this_cpu_and_2(pcp, val) percpu_to_op(2, volatile, "and", (pcp), val)
#define this_cpu_and_4(pcp, val) percpu_to_op(4, volatile, "and", (pcp), val)
@@ -470,7 +415,7 @@ do { \
#ifdef CONFIG_X86_64
#define raw_cpu_read_8(pcp) percpu_from_op(8, , "mov", pcp)
#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val)
-#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val)
+#define raw_cpu_add_8(pcp, val) percpu_add_op(8, , (pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
@@ -479,7 +424,7 @@ do { \

#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
-#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_8(pcp, val) percpu_add_op(8, volatile, (pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
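
The inc/dec special-casing kept by the rewrite hinges on __builtin_constant_p: a
constant +1 or -1 is folded into INC/DEC, everything else falls through to ADD. A
compact user-space sketch of just that part (all names invented for illustration):

  /* user-space demo: fold a constant +1/-1 into INC/DEC, x86 only */
  #include <stdio.h>

  #define demo_add(var, val)                                            \
  do {                                                                  \
          const int demo_id__ = (__builtin_constant_p(val) &&           \
                                 ((val) == 1 || (val) == -1)) ?         \
                                  (int)(val) : 0;                       \
          if (demo_id__ == 1)                                           \
                  asm ("incl %0" : "+m" (var));                         \
          else if (demo_id__ == -1)                                     \
                  asm ("decl %0" : "+m" (var));                         \
          else                                                          \
                  asm ("addl %1, %0" : "+m" (var) : "ri" (val));        \
  } while (0)

  int main(void)
  {
          unsigned int n = 0;

          demo_add(n, 1);         /* becomes incl */
          demo_add(n, 7);         /* becomes addl $7, ... */
          demo_add(n, -1);        /* becomes decl */
          printf("%u\n", n);      /* 7 */
          return 0;
  }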

Subject: [tip: x86/asm] x86/percpu: Introduce size abstraction macros

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: 6865dc3ae93b9acb336ca48bd7b2db3446d89370
Gitweb: https://git.kernel.org/tip/6865dc3ae93b9acb336ca48bd7b2db3446d89370
Author: Brian Gerst <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:15 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 11:46:39 +02:00

x86/percpu: Introduce size abstraction macros

In preparation for cleaning up the percpu operations, define macros for
abstraction based on the width of the operation.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/percpu.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2278797..19838e4 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -87,6 +87,36 @@
* don't give an lvalue though). */
extern void __bad_percpu_size(void);

+#define __pcpu_type_1 u8
+#define __pcpu_type_2 u16
+#define __pcpu_type_4 u32
+#define __pcpu_type_8 u64
+
+#define __pcpu_cast_1(val) ((u8)(((unsigned long) val) & 0xff))
+#define __pcpu_cast_2(val) ((u16)(((unsigned long) val) & 0xffff))
+#define __pcpu_cast_4(val) ((u32)(((unsigned long) val) & 0xffffffff))
+#define __pcpu_cast_8(val) ((u64)(val))
+
+#define __pcpu_op1_1(op, dst) op "b " dst
+#define __pcpu_op1_2(op, dst) op "w " dst
+#define __pcpu_op1_4(op, dst) op "l " dst
+#define __pcpu_op1_8(op, dst) op "q " dst
+
+#define __pcpu_op2_1(op, src, dst) op "b " src ", " dst
+#define __pcpu_op2_2(op, src, dst) op "w " src ", " dst
+#define __pcpu_op2_4(op, src, dst) op "l " src ", " dst
+#define __pcpu_op2_8(op, src, dst) op "q " src ", " dst
+
+#define __pcpu_reg_1(mod, x) mod "q" (x)
+#define __pcpu_reg_2(mod, x) mod "r" (x)
+#define __pcpu_reg_4(mod, x) mod "r" (x)
+#define __pcpu_reg_8(mod, x) mod "r" (x)
+
+#define __pcpu_reg_imm_1(x) "qi" (x)
+#define __pcpu_reg_imm_2(x) "ri" (x)
+#define __pcpu_reg_imm_4(x) "ri" (x)
+#define __pcpu_reg_imm_8(x) "re" (x)
+
#define percpu_to_op(qual, op, var, val) \
do { \
typedef typeof(var) pto_T__; \
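
To see how these width macros get used by the other cleanups in the series, here is a
small user-space sketch of the token-pasting dispatch (macro and variable names are
invented for this example): the caller supplies the size as a literal 1/2/4/8 and
##size pastes it onto the helper name, so a single macro body replaces the old
switch (sizeof(var)) ladders.

  /* user-space demo of size dispatch via token pasting */
  #include <stdint.h>
  #include <stdio.h>

  #define __demo_type_1 uint8_t
  #define __demo_type_2 uint16_t
  #define __demo_type_4 uint32_t
  #define __demo_type_8 uint64_t

  /* one body for every width: the preprocessor picks the type */
  #define demo_write(size, _var, _val)                                  \
  do {                                                                  \
          __demo_type_##size demo_val__ = (__demo_type_##size)(_val);   \
          (_var) = demo_val__;                                          \
  } while (0)

  int main(void)
  {
          uint32_t x = 0;
          uint64_t y = 0;

          demo_write(4, x, 123);  /* expands with __demo_type_4 */
          demo_write(8, y, 456);  /* expands with __demo_type_8 */
          printf("%u %llu\n", (unsigned)x, (unsigned long long)y);
          return 0;
  }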

Subject: [tip: x86/asm] x86/percpu: Clean up percpu_stable_op()

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: c94055fe93c8d00bfa23fa2cb9af080f7fc53aa0
Gitweb: https://git.kernel.org/tip/c94055fe93c8d00bfa23fa2cb9af080f7fc53aa0
Author: Brian Gerst <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:23 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 11:46:42 +02:00

x86/percpu: Clean up percpu_stable_op()

Use __pcpu_size_call_return() to simplify this_cpu_read_stable().
Also remove __bad_percpu_size() which is now unused.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/percpu.h | 41 +++++++++-------------------------
1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 7efc0b5..cf2b9c2 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -85,7 +85,6 @@

/* For arch-specific code, we can use direct single-insn ops (they
* don't give an lvalue though). */
-extern void __bad_percpu_size(void);

#define __pcpu_type_1 u8
#define __pcpu_type_2 u16
@@ -167,33 +166,13 @@ do { \
(typeof(_var))(unsigned long) pfo_val__; \
})

-#define percpu_stable_op(op, var) \
-({ \
- typeof(var) pfo_ret__; \
- switch (sizeof(var)) { \
- case 1: \
- asm(op "b "__percpu_arg(P1)",%0" \
- : "=q" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- case 2: \
- asm(op "w "__percpu_arg(P1)",%0" \
- : "=r" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- case 4: \
- asm(op "l "__percpu_arg(P1)",%0" \
- : "=r" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- case 8: \
- asm(op "q "__percpu_arg(P1)",%0" \
- : "=r" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pfo_ret__; \
+#define percpu_stable_op(size, op, _var) \
+({ \
+ __pcpu_type_##size pfo_val__; \
+ asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]") \
+ : [val] __pcpu_reg_##size("=", pfo_val__) \
+ : [var] "p" (&(_var))); \
+ (typeof(_var))(unsigned long) pfo_val__; \
})

/*
@@ -258,7 +237,11 @@ do { \
* per-thread variables implemented as per-cpu variables and thus
* stable for the duration of the respective task.
*/
-#define this_cpu_read_stable(var) percpu_stable_op("mov", var)
+#define this_cpu_read_stable_1(pcp) percpu_stable_op(1, "mov", pcp)
+#define this_cpu_read_stable_2(pcp) percpu_stable_op(2, "mov", pcp)
+#define this_cpu_read_stable_4(pcp) percpu_stable_op(4, "mov", pcp)
+#define this_cpu_read_stable_8(pcp) percpu_stable_op(8, "mov", pcp)
+#define this_cpu_read_stable(pcp) __pcpu_size_call_return(this_cpu_read_stable_, pcp)

#define raw_cpu_read_1(pcp) percpu_from_op(1, , "mov", pcp)
#define raw_cpu_read_2(pcp) percpu_from_op(2, , "mov", pcp)

Subject: [tip: x86/asm] x86/uaccess: Make __get_user_size() Clang compliant on 32-bit

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: 158807de5822d1079e162a3762956fd743dd483e
Gitweb: https://git.kernel.org/tip/158807de5822d1079e162a3762956fd743dd483e
Author: Nick Desaulniers <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:25 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 12:38:31 +02:00

x86/uaccess: Make __get_user_size() Clang compliant on 32-bit

Clang fails to compile __get_user_size() on 32-bit for the following code:

long long val;

__get_user(val, usrptr);

with: error: invalid output size for constraint '=q'

GCC compiles the same code without complaints.

The reason is that GCC and Clang are architecturally different, which leads
to subtle issues for code that's invalid but clearly dead, i.e. with code
that emulates polymorphism with the preprocessor and sizeof.

GCC will perform semantic analysis after early inlining and dead code
elimination, so it will not warn on invalid code that's dead. Clang
strictly performs optimizations after semantic analysis, so it will warn
for dead code.

Neither Clang nor GCC like this very much with -m32:

long long ret;
asm ("movb $5, %0" : "=q" (ret));

However, GCC can tolerate this variant:

long long ret;
switch (sizeof(ret)) {
case 1:
asm ("movb $5, %0" : "=q" (ret));
break;
case 8:;
}

Clang, on the other hand, won't accept that because it validates the inline
asm for the '1' case before the optimisation phase where it realises that
it wouldn't have to emit it anyway.

If LLVM (Clang's "back end") fails such as during instruction selection or
register allocation, it cannot provide accurate diagnostics (warnings /
errors) that contain line information, as the AST has been discarded from
memory at that point.

While there have been early discussions about having C/C++ specific
language optimizations in Clang via the use of MLIR, which would enable
such earlier optimizations, such work is not scoped and likely a multi-year
endeavor.

It was discussed to change the asm output constraint for the one byte case
from "=q" to "=r". While it works for 64-bit, it fails on 32-bit. With '=r'
the compiler could fail to choose a register accessible as high/low which is
required for the byte operation. If that happens the assembly will fail.

Use a local temporary variable of type 'unsigned char' as output for the
byte copy inline asm and then assign it to the real output variable. This
prevents Clang from failing the semantic analysis in the above case.

The resulting code for the actual one byte copy is not affected as the
temporary variable is optimized out.

[ tglx: Amended changelog ]

Reported-by: Arnd Bergmann <[email protected]>
Reported-by: David Woodhouse <[email protected]>
Reported-by: Dmitry Golovin <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://bugs.llvm.org/show_bug.cgi?id=33587
Link: https://github.com/ClangBuiltLinux/linux/issues/3
Link: https://github.com/ClangBuiltLinux/linux/issues/194
Link: https://github.com/ClangBuiltLinux/linux/issues/781
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/CAK8P3a1EBaWdbAEzirFDSgHVJMtWjuNt2HGG8z+vpXeNHwETFQ@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/include/asm/uaccess.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 18dfa07..2f3e8f2 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -314,11 +314,14 @@ do { \

#define __get_user_size(x, ptr, size, retval) \
do { \
+ unsigned char x_u8__; \
+ \
retval = 0; \
__chk_user_ptr(ptr); \
switch (size) { \
case 1: \
- __get_user_asm(x, ptr, retval, "b", "=q"); \
+ __get_user_asm(x_u8__, ptr, retval, "b", "=q"); \
+ (x) = x_u8__; \
break; \
case 2: \
__get_user_asm(x, ptr, retval, "w", "=r"); \
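
A stand-alone sketch of the same pattern (the macro name below is invented here, it is
not the kernel's): routing the one-byte asm output through an unsigned char temporary
means the "=q" constraint always sees an 8-bit lvalue, so Clang's early semantic check
passes regardless of the caller's type. Without the temporary, both compilers complain
when handed a long long directly under -m32, as the changelog above shows.

  /* user-space demo of the u8-temporary trick, x86 only */
  #include <stdio.h>

  #define read_one_byte(x)                                \
  do {                                                    \
          unsigned char x_u8__;                           \
          asm ("movb $5, %0" : "=q" (x_u8__));            \
          (x) = x_u8__;                                   \
  } while (0)

  int main(void)
  {
          long long val = 0;

          read_one_byte(val);     /* fine with GCC and Clang, even -m32 */
          printf("%lld\n", val);
          return 0;
  }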

Subject: [tip: x86/asm] x86/percpu: Clean up percpu_cmpxchg_op()

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: ebcd580bed4a357ea894e6878d9099b3919f727f
Gitweb: https://git.kernel.org/tip/ebcd580bed4a357ea894e6878d9099b3919f727f
Author: Brian Gerst <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:22 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 11:46:42 +02:00

x86/percpu: Clean up percpu_cmpxchg_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/percpu.h | 58 ++++++++++------------------------
1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index ac6d7e7..7efc0b5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -236,39 +236,17 @@ do { \
* cmpxchg has no such implied lock semantics as a result it is much
* more efficient for cpu local operations.
*/
-#define percpu_cmpxchg_op(qual, var, oval, nval) \
+#define percpu_cmpxchg_op(size, qual, _var, _oval, _nval) \
({ \
- typeof(var) pco_ret__; \
- typeof(var) pco_old__ = (oval); \
- typeof(var) pco_new__ = (nval); \
- switch (sizeof(var)) { \
- case 1: \
- asm qual ("cmpxchgb %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "q" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- case 2: \
- asm qual ("cmpxchgw %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "r" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- case 4: \
- asm qual ("cmpxchgl %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "r" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- case 8: \
- asm qual ("cmpxchgq %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "r" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pco_ret__; \
+ __pcpu_type_##size pco_old__ = __pcpu_cast_##size(_oval); \
+ __pcpu_type_##size pco_new__ = __pcpu_cast_##size(_nval); \
+ asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]", \
+ __percpu_arg([var])) \
+ : [oval] "+a" (pco_old__), \
+ [var] "+m" (_var) \
+ : [nval] __pcpu_reg_##size(, pco_new__) \
+ : "memory"); \
+ (typeof(_var))(unsigned long) pco_old__; \
})

/*
@@ -336,16 +314,16 @@ do { \
#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(1, , pcp, val)
#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(2, , pcp, val)
#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(4, , pcp, val)
-#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(1, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(2, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(4, , pcp, oval, nval)

#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(1, volatile, pcp, val)
#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(2, volatile, pcp, val)
#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(4, volatile, pcp, val)
-#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(1, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(4, volatile, pcp, oval, nval)

#ifdef CONFIG_X86_CMPXCHG64
#define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2) \
@@ -376,7 +354,7 @@ do { \
#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(8, , pcp, val)
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
-#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(8, , pcp, oval, nval)

#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
@@ -385,7 +363,7 @@ do { \
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(8, volatile, pcp, val)
#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(8, volatile, pcp, nval)
-#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(8, volatile, pcp, oval, nval)

/*
* Pretty complex macro to generate cmpxchg16 instruction. The instruction

Subject: [tip: x86/asm] x86/percpu: Clean up percpu_to_op()

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: c175acc14719e69ecec4dafbb642a7f38c76c064
Gitweb: https://git.kernel.org/tip/c175acc14719e69ecec4dafbb642a7f38c76c064
Author: Brian Gerst <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:16 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 11:46:39 +02:00

x86/percpu: Clean up percpu_to_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/percpu.h | 90 +++++++++++++---------------------
1 file changed, 35 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 19838e4..fb280fb 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
#define __pcpu_reg_imm_4(x) "ri" (x)
#define __pcpu_reg_imm_8(x) "re" (x)

-#define percpu_to_op(qual, op, var, val) \
-do { \
- typedef typeof(var) pto_T__; \
- if (0) { \
- pto_T__ pto_tmp__; \
- pto_tmp__ = (val); \
- (void)pto_tmp__; \
- } \
- switch (sizeof(var)) { \
- case 1: \
- asm qual (op "b %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "qi" ((pto_T__)(val))); \
- break; \
- case 2: \
- asm qual (op "w %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pto_T__)(val))); \
- break; \
- case 4: \
- asm qual (op "l %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pto_T__)(val))); \
- break; \
- case 8: \
- asm qual (op "q %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "re" ((pto_T__)(val))); \
- break; \
- default: __bad_percpu_size(); \
- } \
+#define percpu_to_op(size, qual, op, _var, _val) \
+do { \
+ __pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val); \
+ if (0) { \
+ typeof(_var) pto_tmp__; \
+ pto_tmp__ = (_val); \
+ (void)pto_tmp__; \
+ } \
+ asm qual(__pcpu_op2_##size(op, "%[val]", __percpu_arg([var])) \
+ : [var] "+m" (_var) \
+ : [val] __pcpu_reg_imm_##size(pto_val__)); \
} while (0)

/*
@@ -425,18 +405,18 @@ do { \
#define raw_cpu_read_2(pcp) percpu_from_op(, "mov", pcp)
#define raw_cpu_read_4(pcp) percpu_from_op(, "mov", pcp)

-#define raw_cpu_write_1(pcp, val) percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_2(pcp, val) percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_4(pcp, val) percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val)
+#define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val)
+#define raw_cpu_write_4(pcp, val) percpu_to_op(4, , "mov", (pcp), val)
#define raw_cpu_add_1(pcp, val) percpu_add_op(, (pcp), val)
#define raw_cpu_add_2(pcp, val) percpu_add_op(, (pcp), val)
#define raw_cpu_add_4(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_and_1(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_2(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_4(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_1(pcp, val) percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_2(pcp, val) percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_4(pcp, val) percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_1(pcp, val) percpu_to_op(1, , "and", (pcp), val)
+#define raw_cpu_and_2(pcp, val) percpu_to_op(2, , "and", (pcp), val)
+#define raw_cpu_and_4(pcp, val) percpu_to_op(4, , "and", (pcp), val)
+#define raw_cpu_or_1(pcp, val) percpu_to_op(1, , "or", (pcp), val)
+#define raw_cpu_or_2(pcp, val) percpu_to_op(2, , "or", (pcp), val)
+#define raw_cpu_or_4(pcp, val) percpu_to_op(4, , "or", (pcp), val)

/*
* raw_cpu_xchg() can use a load-store since it is not required to be
@@ -456,18 +436,18 @@ do { \
#define this_cpu_read_1(pcp) percpu_from_op(volatile, "mov", pcp)
#define this_cpu_read_2(pcp) percpu_from_op(volatile, "mov", pcp)
#define this_cpu_read_4(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_1(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_2(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_4(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val)
+#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val)
+#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val)
#define this_cpu_add_1(pcp, val) percpu_add_op(volatile, (pcp), val)
#define this_cpu_add_2(pcp, val) percpu_add_op(volatile, (pcp), val)
#define this_cpu_add_4(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_1(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_2(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_4(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_1(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_2(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_4(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_1(pcp, val) percpu_to_op(1, volatile, "and", (pcp), val)
+#define this_cpu_and_2(pcp, val) percpu_to_op(2, volatile, "and", (pcp), val)
+#define this_cpu_and_4(pcp, val) percpu_to_op(4, volatile, "and", (pcp), val)
+#define this_cpu_or_1(pcp, val) percpu_to_op(1, volatile, "or", (pcp), val)
+#define this_cpu_or_2(pcp, val) percpu_to_op(2, volatile, "or", (pcp), val)
+#define this_cpu_or_4(pcp, val) percpu_to_op(4, volatile, "or", (pcp), val)
#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
@@ -509,19 +489,19 @@ do { \
*/
#ifdef CONFIG_X86_64
#define raw_cpu_read_8(pcp) percpu_from_op(, "mov", pcp)
-#define raw_cpu_write_8(pcp, val) percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val)
#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_and_8(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_8(pcp, val) percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
+#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

#define this_cpu_read_8(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_8(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_8(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_8(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
+#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)

Subject: [tip: x86/asm] x86/percpu: Clean up percpu_from_op()

The following commit has been merged into the x86/asm branch of tip:

Commit-ID: bb631e3002840706362a7d76e3ebb3604cce91a7
Gitweb: https://git.kernel.org/tip/bb631e3002840706362a7d76e3ebb3604cce91a7
Author: Brian Gerst <[email protected]>
AuthorDate: Mon, 20 Jul 2020 13:49:17 -07:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 23 Jul 2020 11:46:39 +02:00

x86/percpu: Clean up percpu_from_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.

Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/percpu.h | 50 ++++++++++------------------------
1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index fb280fb..a40d2e0 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -190,33 +190,13 @@ do { \
} \
} while (0)

-#define percpu_from_op(qual, op, var) \
-({ \
- typeof(var) pfo_ret__; \
- switch (sizeof(var)) { \
- case 1: \
- asm qual (op "b "__percpu_arg(1)",%0" \
- : "=q" (pfo_ret__) \
- : "m" (var)); \
- break; \
- case 2: \
- asm qual (op "w "__percpu_arg(1)",%0" \
- : "=r" (pfo_ret__) \
- : "m" (var)); \
- break; \
- case 4: \
- asm qual (op "l "__percpu_arg(1)",%0" \
- : "=r" (pfo_ret__) \
- : "m" (var)); \
- break; \
- case 8: \
- asm qual (op "q "__percpu_arg(1)",%0" \
- : "=r" (pfo_ret__) \
- : "m" (var)); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pfo_ret__; \
+#define percpu_from_op(size, qual, op, _var) \
+({ \
+ __pcpu_type_##size pfo_val__; \
+ asm qual (__pcpu_op2_##size(op, __percpu_arg([var]), "%[val]") \
+ : [val] __pcpu_reg_##size("=", pfo_val__) \
+ : [var] "m" (_var)); \
+ (typeof(_var))(unsigned long) pfo_val__; \
})

#define percpu_stable_op(op, var) \
@@ -401,9 +381,9 @@ do { \
*/
#define this_cpu_read_stable(var) percpu_stable_op("mov", var)

-#define raw_cpu_read_1(pcp) percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_2(pcp) percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_4(pcp) percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_1(pcp) percpu_from_op(1, , "mov", pcp)
+#define raw_cpu_read_2(pcp) percpu_from_op(2, , "mov", pcp)
+#define raw_cpu_read_4(pcp) percpu_from_op(4, , "mov", pcp)

#define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val)
#define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val)
@@ -433,9 +413,9 @@ do { \
#define raw_cpu_xchg_2(pcp, val) raw_percpu_xchg_op(pcp, val)
#define raw_cpu_xchg_4(pcp, val) raw_percpu_xchg_op(pcp, val)

-#define this_cpu_read_1(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_2(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_4(pcp) percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_1(pcp) percpu_from_op(1, volatile, "mov", pcp)
+#define this_cpu_read_2(pcp) percpu_from_op(2, volatile, "mov", pcp)
+#define this_cpu_read_4(pcp) percpu_from_op(4, volatile, "mov", pcp)
#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val)
#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val)
#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val)
@@ -488,7 +468,7 @@ do { \
* 32 bit must fall back to generic operations.
*/
#ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp) percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_8(pcp) percpu_from_op(8, , "mov", pcp)
#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val)
#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
@@ -497,7 +477,7 @@ do { \
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

-#define this_cpu_read_8(pcp) percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)

2020-07-23 11:09:58

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v3 11/11] x86: support i386 with Clang

On Thu, Jul 23, 2020 at 11:17 AM Thomas Gleixner <[email protected]> wrote:
>
> Thomas Gleixner <[email protected]> writes:
> > Nick Desaulniers <[email protected]> writes:
> >
> > I'm glad I looked myself at this.
> >
> >> We also don't want to swap the use of "=q" with "=r". For 64b, it
> >> doesn't matter. For 32b, it's possible that a 32b register without a 8b
> >> lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will
> >> then reject.
> >
> > The above is really garbage.
> >
> > We don't want? It's simply not possible to do so, because ...
> >
> > 64b,32b,8b. For heavens sake is it too much asked to write a changelog
> > with understandable wording instead of ambiguous abbreviations?
> >
> > There is no maximum character limit for changelogs.
>
> Gah. Hit send too fast.
>
> >> With this, Clang can finally build an i386 defconfig.
>
> With what? I can't find anything which explains the solution at the
> conceptual level. Sigh.
>

Hi,

I have applied this patch-series v3 but some basics of "i386" usage
are not clear to me when I wanted to test it and give some feedback.

[1] is the original place in CBL where this was reported and I have
commented on this.

Beyond some old cruft in i386_defconfig like non-existent
"CONFIG_CRYPTO_AES_586" I have some fundamental questions:

What does "ARCH=i386" mean, and what is it used for?

I can do:

$ ARCH=x86 make V=1 -j3 $MAKE_OPTS i386_defconfig
$ make V=1 -j3 $MAKE_OPTS i386_defconfig

...which results in the same .config.

Whereas when I do:

$ ARCH=i386 make V=1 -j3 $MAKE_OPTS i386_defconfig

...drops the CONFIG_64BIT line entirely.

But "# CONFIG_64BIT is not set" is explicitly set in
arch/x86/configs/i386_defconfig but gets dropped.

Unsure if the above is the same as:
$ ARCH=i386 make V=1 -j3 $MAKE_OPTS defconfig

When generating via "make ... i386_defconfig", modern gcc-9 and a
snapshot version of clang-11 both build with:

$ ARCH=x86 make V=1 -j3 $MAKE_OPTS
... -march=i686 -mtune=generic ...

Checking generated .config reveals:

CONFIG_M686=y

So, I guess modern compilers do at least support "i686" as lowest CPU?

Doing some grep+ping:

$ git grep "ARCH=i386"
Documentation/kbuild/headers_install.rst: make headers_install ARCH=i386 INSTALL_HDR_PATH=/usr
tools/testing/ktest/examples/crosstests.conf:MAKE_CMD = make ARCH=i386
tools/testing/ktest/sample.conf:#MAKE_CMD = CC=i386-gcc AS=i386-as make ARCH=i386

i386-gcc / i386-as - does anyone still use that?

Again my question (I did not do a diff):

$ make headers_install ARCH=i386 INSTALL_HDR_PATH=/usr
$ make headers_install ARCH=x86 INSTALL_HDR_PATH=/usr

...should generate the same result?

To come back to "i386" again:

$ git grep i386 | grep ARCH

...reveals this in the top-level Makefile [2]:

376: # Additional ARCH settings for x86
377: ifeq ($(ARCH),i386)
378: SRCARCH := x86

For me this means:

ARCH=i386 make ...
ARCH=x86 make ...

...should result in the same .config, so why is CONFIG_64BIT
dropped when "ARCH=i386" is used?

Coming to a conclusion:

Nick D. says:
> I usually test with make ... i386_defconfig.

Can you enlighten a bit?

Of course, I can send a patch to remove the "CONFIG_CRYPTO_AES_586=y"
line from i386_defconfig.

Thanks.

Regards,
- Sedat -

[1] https://github.com/ClangBuiltLinux/linux/issues/194
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Makefile#n376

2020-07-23 11:43:25

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 11/11] x86: support i386 with Clang

On Thu, Jul 23, 2020 at 1:07 PM Sedat Dilek <[email protected]> wrote:
> On Thu, Jul 23, 2020 at 11:17 AM Thomas Gleixner <[email protected]> wrote:
> > Thomas Gleixner <[email protected]> writes:
>
> I have applied this patch-series v3 but some basics of "i386" usage
> are not clear to me when I wanted to test it and give some feedback.
>
> [1] is the original place in CBL where this was reported and I have
> commented on this.
>
> Beyond some old cruft in i386_defconfig like non-existent
> "CONFIG_CRYPTO_AES_586" I have some fundamental questions:
>
> What does "ARCH=i386" mean, and what is it used for?
>
> I can do:
>
> $ ARCH=x86 make V=1 -j3 $MAKE_OPTS i386_defconfig
> $ make V=1 -j3 $MAKE_OPTS i386_defconfig
>
> ...which results in the same .config.
>
> Whereas when I do:
>
> $ ARCH=i386 make V=1 -j3 $MAKE_OPTS i386_defconfig
>
> ...drops the CONFIG_64BIT line entirely.
>
> But "# CONFIG_64BIT is not set" is explicitly set in
> arch/x86/configs/i386_defconfig but gets dropped.
>
> Unsure if the above is the same as:
> $ ARCH=i386 make V=1 -j3 $MAKE_OPTS defconfig

The logic was introduced when arch/i386 and arch/x86_64 got
merged into arch/x86, to stay compatible with the original behavior
that would produce a 32-bit or 64-bit kernel depending on which
machine you are running on.

There are probably not a lot of people building kernels on 32-bit
machines any more (real 32-bit machines are really slow compared
to modern ones, and 64-bit machines running 32-bit distros usually
want a 64-bit kernel), so it could in theory be changed.

It will certainly break someone's workflow though, so nobody has
proposed actually changing it so far.

> When generating via "make ... i386_defconfig", modern gcc-9 and a
> snapshot version of clang-11 both build with:
>
> $ ARCH=x86 make V=1 -j3 $MAKE_OPTS
> ... -march=i686 -mtune=generic ...
>
> Checking generated .config reveals:
>
> CONFIG_M686=y
>
> So, I guess modern compilers do at least support "i686" as lowest CPU?

i686 compiler support goes back to the 1990s, and the kernel now
requires at least gcc-4.9 from 2014, so yes.

> Nick D. says:
> > I usually test with make ... i386_defconfig.
>
> Can you enlighten a bit?
>
> Of course, I can send a patch to remove the "CONFIG_CRYPTO_AES_586=y"
> line from i386_defconfig.

The "i386" in i386_defconfig is just a synonym for x86-32, it does not
imply a particular CPU generation. The original i386 is no longer supported,
i486sx (barely) is and in practice most 32-bit Linux code gets compiled
for some variant of i586 or i686 variant but run on 64-bit hardware.

Arnd

2020-07-23 13:17:21

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v3 11/11] x86: support i386 with Clang

On Thu, Jul 23, 2020 at 1:42 PM Arnd Bergmann <[email protected]> wrote:
>
> On Thu, Jul 23, 2020 at 1:07 PM Sedat Dilek <[email protected]> wrote:
> > On Thu, Jul 23, 2020 at 11:17 AM Thomas Gleixner <[email protected]> wrote:
> > > Thomas Gleixner <[email protected]> writes:
> >
> > I have applied this patch-series v3 but some basics of "i386" usage
> > are not clear to me when I wanted to test it and give some feedback.
> >
> > [1] is the original place in CBL where this was reported and I have
> > commented on this.
> >
> > Beyond some old cruft in i386_defconfig like non-existent
> > "CONFIG_CRYPTO_AES_586" I have some fundamental questions:
> >
> > What does "ARCH=i386" mean, and what is it used for?
> >
> > I can do:
> >
> > $ ARCH=x86 make V=1 -j3 $MAKE_OPTS i386_defconfig
> > $ make V=1 -j3 $MAKE_OPTS i386_defconfig
> >
> > ...which results in the same .config.
> >
> > Whereas when I do:
> >
> > $ ARCH=i386 make V=1 -j3 $MAKE_OPTS i386_defconfig
> >
> > ...drops the CONFIG_64BIT line entirely.
> >
> > But "# CONFIG_64BIT is not set" is explicitly set in
> > arch/x86/configs/i386_defconfig but gets dropped.
> >
> > Unsure if the above is the same as:
> > $ ARCH=i386 make V=1 -j3 $MAKE_OPTS defconfig
>
> The logic was introduced when arch/i386 and arch/x86_64 got
> merged into arch/x86, to stay compatible with the original behavior
> that would produce a 32-bit or 64-bit kernel depending on which
> machine you are running on.
>
> There are probably not a lot of people building kernels on 32-bit
> machines any more (real 32-bit machines are really slow compared
> to modern ones, and 64-bit machines running 32-bit distros usually
> want a 64-bit kernel), so it could in theory be changed.
>
> It will certainly break someone's workflow though, so nobody has
> proposed actually changing it so far.
>
> > When generating via "make ... i386_defconfig", modern gcc-9 and a
> > snapshot version of clang-11 both build with:
> >
> > $ ARCH=x86 make V=1 -j3 $MAKE_OPTS
> > ... -march=i686 -mtune=generic ...
> >
> > Checking generated .config reveals:
> >
> > CONFIG_M686=y
> >
> > So, I guess modern compilers do at least support "i686" as lowest CPU?
>
> i686 compiler support goes back to the 1990s, and the kernel now
> requires at least gcc-4.9 from 2014, so yes.
>
> > Nick D. says:
> > > I usually test with make ... i386_defconfig.
> >
> > Can you enlighten a bit?
> >
> > Of course, I can send a patch to remove the "CONFIG_CRYPTO_AES_586=y"
> > line from i386_defconfig.
>
> The "i386" in i386_defconfig is just a synonym for x86-32, it does not
> imply a particular CPU generation. The original i386 is no longer supported,
> i486sx (barely) is and in practice most 32-bit Linux code gets compiled
> for some variant of i586 or i686 variant but run on 64-bit hardware.
>

Thanks a lot, Arnd, for all the detailed information.

A change of i386_defconfig to x86_defconfig will cause a big cry from
all kernel-bot maintainers :-).

- Sedat -

P.S.: CONFIG_64BIT
What I dropped by accident in my previous mail:
What happens when there is no CONFIG_64BIT line?
Do explicit checks for (and the inverse of) CONFIG_64BIT exist, like
"ifdef" and "ifndef" or any "defined(...)" and its opposite?
I remember seeing checks for it in the x86 tree.

- EOT -

2020-07-23 14:00:31

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 11/11] x86: support i386 with Clang

On Thu, Jul 23, 2020 at 3:14 PM Sedat Dilek <[email protected]> wrote:
> What happens when there is no CONFIG_64BIT line?
> Do explicit checks for (and the inverse of) CONFIG_64BIT exist, like
> "ifdef" and "ifndef" or any "defined(...)" and its opposite?
> I remember seeing checks for it in the x86 tree.

As long as you consistently pass ARCH=i386 when running 'make',
nothing bad happens, as ARCH=i386 just hides that option.

If you run "make ARCH=i386 defconfig" followed by "make olddefconfig"
(without ARCH=i386) on a non-i386 machine, the absence of that
CONFIG_64BIT line will lead to the kernel going back to a 64-bit
configuration.

Arnd

2020-07-24 13:22:51

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v3 11/11] x86: support i386 with Clang

On Thu, Jul 23, 2020 at 3:56 PM Arnd Bergmann <[email protected]> wrote:
>
> On Thu, Jul 23, 2020 at 3:14 PM Sedat Dilek <[email protected]> wrote:
> > What happens when there is no CONFIG_64BIT line?
> > Do explicit checks for (and the inverse of) CONFIG_64BIT exist, like
> > "ifdef" and "ifndef" or any "defined(...)" and its opposite?
> > I remember seeing checks for it in the x86 tree.
>
> As long as you consistently pass ARCH=i386 when running 'make',
> nothing bad happens, as ARCH=i386 just hides that option.
>
> If you run "make ARCH=i386 defconfig" followed by "make olddefconfig"
> (without ARCH=i386) on a non-i386 machine, the absence of that
> CONFIG_64BIT line will lead to the kernel going back to a 64-bit
> configuration.
>

Again thank you for your feedback.

Unsure if people are aware of the different behaviours and results.

That's why I keep the same make line with and without "defconfig".

Unfortunately, I had no opportunity to test the patchset :-(.

For testing I had done:
$ MAKE_OPTS="..."
$ ARCH=x86 make V=1 -j3 $MAKE_OPTS i386_defconfig (where V=1 and -j3
can of course be dropped)
$ ARCH=x86 make V=1 -j3 $MAKE_OPTS

Side-note:
How wonderful that my patch "x86/defconfigs: Remove CONFIG_CRYPTO_AES_586
from i386_defconfig" landed in <tip.git#x86/build> [1].

- Sedat -

[1] https://git.kernel.org/tip/tip/c/6526b12de07588253a52577f42ec99fc7ca26a1f