2020-05-30 22:13:53

by Brian Gerst

Subject: [PATCH v2 00/10] x86: Clean up percpu operations

The core percpu operations already have a switch on the width of the
data type, which resulted in an extra amount of dead code being
generated because the x86 operations had another switch. This patch set
rewrites the x86 ops to remove the switch. Additional cleanups are to
use named assembly operands, and to cast variables to the width used in
the assembly to make Clang happy.
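
For reference, the size abstraction introduced in patch 1 boils down to small
helper families along these lines (a rough sketch reconstructed from the hunks
quoted below, not the literal patch contents):

  /* integer type matching the operand width */
  #define __pcpu_type_1 u8
  #define __pcpu_type_4 u32

  /* cast to the operand width (the masking keeps sparse quiet) */
  #define __pcpu_cast_1(val) ((u8)(((unsigned long) val) & 0xff))

  /* append the AT&T size suffix to a two-operand mnemonic */
  #define __pcpu_op2_1(op, src, dst) op "b " src ", " dst
  #define __pcpu_op2_4(op, src, dst) op "l " src ", " dst

  /* register constraint for that width ("q" for byte operands) */
  #define __pcpu_reg_1(mod, x) mod "q" (x)
  #define __pcpu_reg_4(mod, x) mod "r" (x)

  /* register-or-immediate constraint, used by percpu_to_op() */
  #define __pcpu_reg_imm_1(x) "qi" (x)
  #define __pcpu_reg_imm_4(x) "ri" (x)

Each x86 macro then takes the size as an explicit argument and pastes it into
these helpers, so the per-width switch statements in the patches below can go
away.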

Changes from v1:
- Add separate patch for XADD constraint fix
- Fixed sparse truncation warning
- Add cleanup of percpu_stable_op()
- Add patch to Remove PER_CPU()

Brian Gerst (10):
x86/percpu: Introduce size abstraction macros
x86/percpu: Clean up percpu_to_op()
x86/percpu: Clean up percpu_from_op()
x86/percpu: Clean up percpu_add_op()
x86/percpu: Remove "e" constraint from XADD
x86/percpu: Clean up percpu_add_return_op()
x86/percpu: Clean up percpu_xchg_op()
x86/percpu: Clean up percpu_cmpxchg_op()
x86/percpu: Clean up percpu_stable_op()
x86/percpu: Remove unused PER_CPU() macro

arch/x86/include/asm/percpu.h | 510 ++++++++++++----------------------
1 file changed, 172 insertions(+), 338 deletions(-)


base-commit: 229aaa8c059f2c908e0561453509f996f2b2d5c4
--
2.25.4


2020-05-30 22:14:02

by Brian Gerst

Subject: [PATCH v2 06/10] x86/percpu: Clean up percpu_add_return_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.
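
As a usage-level illustration (demo_count and demo_bump are made-up names),
the generic layer has already picked the width before this macro is reached,
so only one xadd survives:

  static DEFINE_PER_CPU(unsigned int, demo_count);

  static unsigned int demo_bump(void)
  {
          /* sizeof(demo_count) == 4, so percpu-defs.h dispatches to
           * this_cpu_add_return_4(), which after this patch expands to a
           * single xaddl on this CPU's copy and returns the new value. */
          return this_cpu_add_return(demo_count, 1);
  }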

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/percpu.h | 51 +++++++++++------------------------
1 file changed, 16 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 9bb5440d98d3..0776a11e7e11 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -199,34 +199,15 @@ do { \
/*
* Add return operation
*/
-#define percpu_add_return_op(qual, var, val) \
+#define percpu_add_return_op(size, qual, _var, _val) \
({ \
- typeof(var) paro_ret__ = val; \
- switch (sizeof(var)) { \
- case 1: \
- asm qual ("xaddb %0, "__percpu_arg(1) \
- : "+q" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- case 2: \
- asm qual ("xaddw %0, "__percpu_arg(1) \
- : "+r" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- case 4: \
- asm qual ("xaddl %0, "__percpu_arg(1) \
- : "+r" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- case 8: \
- asm qual ("xaddq %0, "__percpu_arg(1) \
- : "+r" (paro_ret__), "+m" (var) \
- : : "memory"); \
- break; \
- default: __bad_percpu_size(); \
- } \
- paro_ret__ += val; \
- paro_ret__; \
+ __pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val); \
+ asm qual (__pcpu_op2_##size("xadd", "%[tmp]", \
+ __percpu_arg([var])) \
+ : [tmp] __pcpu_reg_##size("+", paro_tmp__), \
+ [var] "+m" (_var) \
+ : : "memory"); \
+ (typeof(_var))(unsigned long) (paro_tmp__ + _val); \
})

/*
@@ -377,16 +358,16 @@ do { \
#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval)

-#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(, pcp, val)
-#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(1, , pcp, val)
+#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(2, , pcp, val)
+#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(4, , pcp, val)
#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

-#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(volatile, pcp, val)
-#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(1, volatile, pcp, val)
+#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(2, volatile, pcp, val)
+#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(4, volatile, pcp, val)
#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
@@ -418,7 +399,7 @@ do { \
#define raw_cpu_add_8(pcp, val) percpu_add_op(8, , (pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
-#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
+#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(8, , pcp, val)
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

@@ -427,7 +408,7 @@ do { \
#define this_cpu_add_8(pcp, val) percpu_add_op(8, volatile, (pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
-#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
+#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(8, volatile, pcp, val)
#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)

--
2.25.4

2020-05-30 22:14:19

by Brian Gerst

Subject: [PATCH v2 08/10] x86/percpu: Clean up percpu_cmpxchg_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.
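
For a usage-level sanity check (demo_state and demo_try_claim are made-up
names), the generic API on top is unchanged; the compare-and-swap only ever
touches the local CPU's copy and needs no lock prefix:

  static DEFINE_PER_CPU(int, demo_state);

  static bool demo_try_claim(void)
  {
          /* returns the previous value of this CPU's demo_state */
          int old = this_cpu_cmpxchg(demo_state, 0, 1);

          return old == 0;        /* true if we flipped it from 0 to 1 */
  }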

Signed-off-by: Brian Gerst <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 58 +++++++++++------------------------
1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index ac6d7e76c0d4..7efc0b5c4ff0 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -236,39 +236,17 @@ do { \
* cmpxchg has no such implied lock semantics as a result it is much
* more efficient for cpu local operations.
*/
-#define percpu_cmpxchg_op(qual, var, oval, nval) \
+#define percpu_cmpxchg_op(size, qual, _var, _oval, _nval) \
({ \
- typeof(var) pco_ret__; \
- typeof(var) pco_old__ = (oval); \
- typeof(var) pco_new__ = (nval); \
- switch (sizeof(var)) { \
- case 1: \
- asm qual ("cmpxchgb %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "q" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- case 2: \
- asm qual ("cmpxchgw %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "r" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- case 4: \
- asm qual ("cmpxchgl %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "r" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- case 8: \
- asm qual ("cmpxchgq %2, "__percpu_arg(1) \
- : "=a" (pco_ret__), "+m" (var) \
- : "r" (pco_new__), "0" (pco_old__) \
- : "memory"); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pco_ret__; \
+ __pcpu_type_##size pco_old__ = __pcpu_cast_##size(_oval); \
+ __pcpu_type_##size pco_new__ = __pcpu_cast_##size(_nval); \
+ asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]", \
+ __percpu_arg([var])) \
+ : [oval] "+a" (pco_old__), \
+ [var] "+m" (_var) \
+ : [nval] __pcpu_reg_##size(, pco_new__) \
+ : "memory"); \
+ (typeof(_var))(unsigned long) pco_old__; \
})

/*
@@ -336,16 +314,16 @@ do { \
#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(1, , pcp, val)
#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(2, , pcp, val)
#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(4, , pcp, val)
-#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
-#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(1, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(2, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(4, , pcp, oval, nval)

#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(1, volatile, pcp, val)
#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(2, volatile, pcp, val)
#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(4, volatile, pcp, val)
-#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(1, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(4, volatile, pcp, oval, nval)

#ifdef CONFIG_X86_CMPXCHG64
#define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2) \
@@ -376,7 +354,7 @@ do { \
#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(8, , pcp, val)
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
-#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
+#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(8, , pcp, oval, nval)

#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
@@ -385,7 +363,7 @@ do { \
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(8, volatile, pcp, val)
#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(8, volatile, pcp, nval)
-#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(8, volatile, pcp, oval, nval)

/*
* Pretty complex macro to generate cmpxchg16 instruction. The instruction
--
2.25.4

2020-05-30 22:14:37

by Brian Gerst

Subject: [PATCH v2 07/10] x86/percpu: Clean up percpu_xchg_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.
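
The mov + cmpxchg + jnz sequence is the same retry loop one would write by
hand with the other per-cpu primitives; a rough C model (demo_val and
demo_xchg are hypothetical, just to show what the asm encodes):

  static DEFINE_PER_CPU(int, demo_val);

  /* same semantics as this_cpu_xchg(demo_val, new) */
  static int demo_xchg(int new)
  {
          int old = this_cpu_read(demo_val);

          for (;;) {
                  /* succeeds only if this CPU's copy still holds 'old';
                   * an interrupt may have changed it since the read */
                  int seen = this_cpu_cmpxchg(demo_val, old, new);

                  if (seen == old)
                          return old;     /* swapped; 'old' is the previous value */
                  old = seen;             /* lost the race, retry */
          }
  }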

Signed-off-by: Brian Gerst <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 61 +++++++++++------------------------
1 file changed, 18 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0776a11e7e11..ac6d7e76c0d4 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -215,46 +215,21 @@ do { \
* expensive due to the implied lock prefix. The processor cannot prefetch
* cachelines if xchg is used.
*/
-#define percpu_xchg_op(qual, var, nval) \
+#define percpu_xchg_op(size, qual, _var, _nval) \
({ \
- typeof(var) pxo_ret__; \
- typeof(var) pxo_new__ = (nval); \
- switch (sizeof(var)) { \
- case 1: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%al" \
- "\n1:\tcmpxchgb %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "q" (pxo_new__) \
- : "memory"); \
- break; \
- case 2: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%ax" \
- "\n1:\tcmpxchgw %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "r" (pxo_new__) \
- : "memory"); \
- break; \
- case 4: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%eax" \
- "\n1:\tcmpxchgl %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "r" (pxo_new__) \
- : "memory"); \
- break; \
- case 8: \
- asm qual ("\n\tmov "__percpu_arg(1)",%%rax" \
- "\n1:\tcmpxchgq %2, "__percpu_arg(1) \
- "\n\tjnz 1b" \
- : "=&a" (pxo_ret__), "+m" (var) \
- : "r" (pxo_new__) \
- : "memory"); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pxo_ret__; \
+ __pcpu_type_##size pxo_old__; \
+ __pcpu_type_##size pxo_new__ = __pcpu_cast_##size(_nval); \
+ asm qual (__pcpu_op2_##size("mov", __percpu_arg([var]), \
+ "%[oval]") \
+ "\n1:\t" \
+ __pcpu_op2_##size("cmpxchg", "%[nval]", \
+ __percpu_arg([var])) \
+ "\n\tjnz 1b" \
+ : [oval] "=&a" (pxo_old__), \
+ [var] "+m" (_var) \
+ : [nval] __pcpu_reg_##size(, pxo_new__) \
+ : "memory"); \
+ (typeof(_var))(unsigned long) pxo_old__; \
})

/*
@@ -354,9 +329,9 @@ do { \
#define this_cpu_or_1(pcp, val) percpu_to_op(1, volatile, "or", (pcp), val)
#define this_cpu_or_2(pcp, val) percpu_to_op(2, volatile, "or", (pcp), val)
#define this_cpu_or_4(pcp, val) percpu_to_op(4, volatile, "or", (pcp), val)
-#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
-#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(1, volatile, pcp, nval)
+#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(2, volatile, pcp, nval)
+#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(4, volatile, pcp, nval)

#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(1, , pcp, val)
#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(2, , pcp, val)
@@ -409,7 +384,7 @@ do { \
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(8, volatile, pcp, val)
-#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
+#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(8, volatile, pcp, nval)
#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)

/*
--
2.25.4

2020-05-30 22:14:59

by Brian Gerst

Subject: [PATCH v2 10/10] x86/percpu: Remove unused PER_CPU() macro

Also remove now unused __percpu_mov_op.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/percpu.h | 18 ------------------
1 file changed, 18 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index cf2b9c2a241e..a3c33b79fb86 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,33 +4,15 @@

#ifdef CONFIG_X86_64
#define __percpu_seg gs
-#define __percpu_mov_op movq
#else
#define __percpu_seg fs
-#define __percpu_mov_op movl
#endif

#ifdef __ASSEMBLY__

-/*
- * PER_CPU finds an address of a per-cpu variable.
- *
- * Args:
- * var - variable name
- * reg - 32bit register
- *
- * The resulting address is stored in the "reg" argument.
- *
- * Example:
- * PER_CPU(cpu_gdt_descr, %ebx)
- */
#ifdef CONFIG_SMP
-#define PER_CPU(var, reg) \
- __percpu_mov_op %__percpu_seg:this_cpu_off, reg; \
- lea var(reg), reg
#define PER_CPU_VAR(var) %__percpu_seg:var
#else /* ! SMP */
-#define PER_CPU(var, reg) __percpu_mov_op $var, reg
#define PER_CPU_VAR(var) var
#endif /* SMP */

--
2.25.4

2020-05-30 22:14:59

by Brian Gerst

Subject: [PATCH v2 03/10] x86/percpu: Clean up percpu_from_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.
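
Beyond the dead-code aspect, this is also where the named asm operands from
the cover letter pay off. A generic (non-percpu) comparison of the two styles,
with some_var standing in for any int variable:

  int val;

  /* positional: the template only makes sense next to the operand lists */
  asm ("movl %1, %0" : "=r" (val) : "m" (some_var));

  /* named: the template is self-describing and robust to reordering */
  asm ("movl %[src], %[dst]" : [dst] "=r" (val) : [src] "m" (some_var));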

Signed-off-by: Brian Gerst <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 50 +++++++++++------------------------
1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index fb280fba94c5..a40d2e055f58 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -190,33 +190,13 @@ do { \
} \
} while (0)

-#define percpu_from_op(qual, op, var) \
-({ \
- typeof(var) pfo_ret__; \
- switch (sizeof(var)) { \
- case 1: \
- asm qual (op "b "__percpu_arg(1)",%0" \
- : "=q" (pfo_ret__) \
- : "m" (var)); \
- break; \
- case 2: \
- asm qual (op "w "__percpu_arg(1)",%0" \
- : "=r" (pfo_ret__) \
- : "m" (var)); \
- break; \
- case 4: \
- asm qual (op "l "__percpu_arg(1)",%0" \
- : "=r" (pfo_ret__) \
- : "m" (var)); \
- break; \
- case 8: \
- asm qual (op "q "__percpu_arg(1)",%0" \
- : "=r" (pfo_ret__) \
- : "m" (var)); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pfo_ret__; \
+#define percpu_from_op(size, qual, op, _var) \
+({ \
+ __pcpu_type_##size pfo_val__; \
+ asm qual (__pcpu_op2_##size(op, __percpu_arg([var]), "%[val]") \
+ : [val] __pcpu_reg_##size("=", pfo_val__) \
+ : [var] "m" (_var)); \
+ (typeof(_var))(unsigned long) pfo_val__; \
})

#define percpu_stable_op(op, var) \
@@ -401,9 +381,9 @@ do { \
*/
#define this_cpu_read_stable(var) percpu_stable_op("mov", var)

-#define raw_cpu_read_1(pcp) percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_2(pcp) percpu_from_op(, "mov", pcp)
-#define raw_cpu_read_4(pcp) percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_1(pcp) percpu_from_op(1, , "mov", pcp)
+#define raw_cpu_read_2(pcp) percpu_from_op(2, , "mov", pcp)
+#define raw_cpu_read_4(pcp) percpu_from_op(4, , "mov", pcp)

#define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val)
#define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val)
@@ -433,9 +413,9 @@ do { \
#define raw_cpu_xchg_2(pcp, val) raw_percpu_xchg_op(pcp, val)
#define raw_cpu_xchg_4(pcp, val) raw_percpu_xchg_op(pcp, val)

-#define this_cpu_read_1(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_2(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_read_4(pcp) percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_1(pcp) percpu_from_op(1, volatile, "mov", pcp)
+#define this_cpu_read_2(pcp) percpu_from_op(2, volatile, "mov", pcp)
+#define this_cpu_read_4(pcp) percpu_from_op(4, volatile, "mov", pcp)
#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val)
#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val)
#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val)
@@ -488,7 +468,7 @@ do { \
* 32 bit must fall back to generic operations.
*/
#ifdef CONFIG_X86_64
-#define raw_cpu_read_8(pcp) percpu_from_op(, "mov", pcp)
+#define raw_cpu_read_8(pcp) percpu_from_op(8, , "mov", pcp)
#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val)
#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
@@ -497,7 +477,7 @@ do { \
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

-#define this_cpu_read_8(pcp) percpu_from_op(volatile, "mov", pcp)
+#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
--
2.25.4

2020-05-30 22:16:11

by Brian Gerst

Subject: [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op()

Use __pcpu_size_call_return() to simplify this_cpu_read_stable().
Also remove __bad_percpu_size() which is now unused.
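
The dispatcher being reused here is the generic one from
include/linux/percpu-defs.h, which looks roughly like this (paraphrased, not
quoted from the tree):

  #define __pcpu_size_call_return(stem, variable)                 \
  ({                                                              \
          typeof(variable) pscr_ret__;                            \
          switch (sizeof(variable)) {                             \
          case 1: pscr_ret__ = stem##1(variable); break;          \
          case 2: pscr_ret__ = stem##2(variable); break;          \
          case 4: pscr_ret__ = stem##4(variable); break;          \
          case 8: pscr_ret__ = stem##8(variable); break;          \
          default: __bad_size_call_parameter(); break;            \
          }                                                       \
          pscr_ret__;                                             \
  })

so on 64-bit, this_cpu_read_stable(current_task) resolves to
this_cpu_read_stable_8() at compile time, the same way this_cpu_read()
already picks its width.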

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/percpu.h | 41 ++++++++++-------------------------
1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 7efc0b5c4ff0..cf2b9c2a241e 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -85,7 +85,6 @@

/* For arch-specific code, we can use direct single-insn ops (they
* don't give an lvalue though). */
-extern void __bad_percpu_size(void);

#define __pcpu_type_1 u8
#define __pcpu_type_2 u16
@@ -167,33 +166,13 @@ do { \
(typeof(_var))(unsigned long) pfo_val__; \
})

-#define percpu_stable_op(op, var) \
-({ \
- typeof(var) pfo_ret__; \
- switch (sizeof(var)) { \
- case 1: \
- asm(op "b "__percpu_arg(P1)",%0" \
- : "=q" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- case 2: \
- asm(op "w "__percpu_arg(P1)",%0" \
- : "=r" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- case 4: \
- asm(op "l "__percpu_arg(P1)",%0" \
- : "=r" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- case 8: \
- asm(op "q "__percpu_arg(P1)",%0" \
- : "=r" (pfo_ret__) \
- : "p" (&(var))); \
- break; \
- default: __bad_percpu_size(); \
- } \
- pfo_ret__; \
+#define percpu_stable_op(size, op, _var) \
+({ \
+ __pcpu_type_##size pfo_val__; \
+ asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]") \
+ : [val] __pcpu_reg_##size("=", pfo_val__) \
+ : [var] "p" (&(_var))); \
+ (typeof(_var))(unsigned long) pfo_val__; \
})

/*
@@ -258,7 +237,11 @@ do { \
* per-thread variables implemented as per-cpu variables and thus
* stable for the duration of the respective task.
*/
-#define this_cpu_read_stable(var) percpu_stable_op("mov", var)
+#define this_cpu_read_stable_1(pcp) percpu_stable_op(1, "mov", pcp)
+#define this_cpu_read_stable_2(pcp) percpu_stable_op(2, "mov", pcp)
+#define this_cpu_read_stable_4(pcp) percpu_stable_op(4, "mov", pcp)
+#define this_cpu_read_stable_8(pcp) percpu_stable_op(8, "mov", pcp)
+#define this_cpu_read_stable(pcp) __pcpu_size_call_return(this_cpu_read_stable_, pcp)

#define raw_cpu_read_1(pcp) percpu_from_op(1, , "mov", pcp)
#define raw_cpu_read_2(pcp) percpu_from_op(2, , "mov", pcp)
--
2.25.4

2020-05-30 22:17:14

by Brian Gerst

Subject: [PATCH v2 04/10] x86/percpu: Clean up percpu_add_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.
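
The inc/dec special case is preserved; at a call site that now looks like
this (demo_stat and demo_account are made-up names):

  static DEFINE_PER_CPU(unsigned long, demo_stat);

  static void demo_account(void)
  {
          this_cpu_inc(demo_stat);    /* val == 1  -> percpu_unary_op(8, ..., "inc") -> incq */
          this_cpu_dec(demo_stat);    /* val == -1 -> percpu_unary_op(8, ..., "dec") -> decq */
          this_cpu_add(demo_stat, 2); /* otherwise -> percpu_to_op(8, ..., "add")    -> addq */
  }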

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/percpu.h | 99 ++++++++---------------------------
1 file changed, 22 insertions(+), 77 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a40d2e055f58..2a24f3c795eb 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -130,64 +130,32 @@ do { \
: [val] __pcpu_reg_imm_##size(pto_val__)); \
} while (0)

+#define percpu_unary_op(size, qual, op, _var) \
+({ \
+ asm qual (__pcpu_op1_##size(op, __percpu_arg([var])) \
+ : [var] "+m" (_var)); \
+})
+
/*
* Generate a percpu add to memory instruction and optimize code
* if one is added or subtracted.
*/
-#define percpu_add_op(qual, var, val) \
+#define percpu_add_op(size, qual, var, val) \
do { \
- typedef typeof(var) pao_T__; \
const int pao_ID__ = (__builtin_constant_p(val) && \
((val) == 1 || (val) == -1)) ? \
(int)(val) : 0; \
if (0) { \
- pao_T__ pao_tmp__; \
+ typeof(var) pao_tmp__; \
pao_tmp__ = (val); \
(void)pao_tmp__; \
} \
- switch (sizeof(var)) { \
- case 1: \
- if (pao_ID__ == 1) \
- asm qual ("incb "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decb "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addb %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "qi" ((pao_T__)(val))); \
- break; \
- case 2: \
- if (pao_ID__ == 1) \
- asm qual ("incw "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decw "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addw %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pao_T__)(val))); \
- break; \
- case 4: \
- if (pao_ID__ == 1) \
- asm qual ("incl "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decl "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addl %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pao_T__)(val))); \
- break; \
- case 8: \
- if (pao_ID__ == 1) \
- asm qual ("incq "__percpu_arg(0) : "+m" (var)); \
- else if (pao_ID__ == -1) \
- asm qual ("decq "__percpu_arg(0) : "+m" (var)); \
- else \
- asm qual ("addq %1, "__percpu_arg(0) \
- : "+m" (var) \
- : "re" ((pao_T__)(val))); \
- break; \
- default: __bad_percpu_size(); \
- } \
+ if (pao_ID__ == 1) \
+ percpu_unary_op(size, qual, "inc", var); \
+ else if (pao_ID__ == -1) \
+ percpu_unary_op(size, qual, "dec", var); \
+ else \
+ percpu_to_op(size, qual, "add", var, val); \
} while (0)

#define percpu_from_op(size, qual, op, _var) \
@@ -228,29 +196,6 @@ do { \
pfo_ret__; \
})

-#define percpu_unary_op(qual, op, var) \
-({ \
- switch (sizeof(var)) { \
- case 1: \
- asm qual (op "b "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- case 2: \
- asm qual (op "w "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- case 4: \
- asm qual (op "l "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- case 8: \
- asm qual (op "q "__percpu_arg(0) \
- : "+m" (var)); \
- break; \
- default: __bad_percpu_size(); \
- } \
-})
-
/*
* Add return operation
*/
@@ -388,9 +333,9 @@ do { \
#define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val)
#define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val)
#define raw_cpu_write_4(pcp, val) percpu_to_op(4, , "mov", (pcp), val)
-#define raw_cpu_add_1(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_add_2(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_add_4(pcp, val) percpu_add_op(, (pcp), val)
+#define raw_cpu_add_1(pcp, val) percpu_add_op(1, , (pcp), val)
+#define raw_cpu_add_2(pcp, val) percpu_add_op(2, , (pcp), val)
+#define raw_cpu_add_4(pcp, val) percpu_add_op(4, , (pcp), val)
#define raw_cpu_and_1(pcp, val) percpu_to_op(1, , "and", (pcp), val)
#define raw_cpu_and_2(pcp, val) percpu_to_op(2, , "and", (pcp), val)
#define raw_cpu_and_4(pcp, val) percpu_to_op(4, , "and", (pcp), val)
@@ -419,9 +364,9 @@ do { \
#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val)
#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val)
#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val)
-#define this_cpu_add_1(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_2(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_add_4(pcp, val) percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_1(pcp, val) percpu_add_op(1, volatile, (pcp), val)
+#define this_cpu_add_2(pcp, val) percpu_add_op(2, volatile, (pcp), val)
+#define this_cpu_add_4(pcp, val) percpu_add_op(4, volatile, (pcp), val)
#define this_cpu_and_1(pcp, val) percpu_to_op(1, volatile, "and", (pcp), val)
#define this_cpu_and_2(pcp, val) percpu_to_op(2, volatile, "and", (pcp), val)
#define this_cpu_and_4(pcp, val) percpu_to_op(4, volatile, "and", (pcp), val)
@@ -470,7 +415,7 @@ do { \
#ifdef CONFIG_X86_64
#define raw_cpu_read_8(pcp) percpu_from_op(8, , "mov", pcp)
#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val)
-#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val)
+#define raw_cpu_add_8(pcp, val) percpu_add_op(8, , (pcp), val)
#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
@@ -479,7 +424,7 @@ do { \

#define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp)
#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
-#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val)
+#define this_cpu_add_8(pcp, val) percpu_add_op(8, volatile, (pcp), val)
#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
--
2.25.4

2020-05-30 22:17:19

by Brian Gerst

Subject: [PATCH v2 05/10] x86/percpu: Remove "e" constraint from XADD

The "e" constraint represents a constant, but the XADD instruction doesn't
accept immediate operands.
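
For readers less familiar with the constraint letters (my summary, not part of
the patch): on x86-64, "e" means a sign-extended 32-bit immediate, which is
fine for instructions with an imm32 form but meaningless for XADD, whose
source operand is always a register (it receives the old value back). A
standalone sketch of the difference:

  static long total;

  static void demo(void)
  {
          long delta = 5;

          /* ADD has an imm32 form, so "re" lets the compiler pick either
           * a register or an immediate for the constant */
          asm ("addq %1, %0" : "+m" (total) : "re" (5L));

          /* XADD's source must be a register, hence plain "+r" */
          asm ("xaddq %1, %0" : "+m" (total), "+r" (delta));
  }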

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/percpu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 2a24f3c795eb..9bb5440d98d3 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -220,7 +220,7 @@ do { \
break; \
case 8: \
asm qual ("xaddq %0, "__percpu_arg(1) \
- : "+re" (paro_ret__), "+m" (var) \
+ : "+r" (paro_ret__), "+m" (var) \
: : "memory"); \
break; \
default: __bad_percpu_size(); \
--
2.25.4

2020-05-30 22:17:44

by Brian Gerst

Subject: [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op()

The core percpu macros already have a switch on the data size, so the switch
in the x86 code is redundant and produces more dead code.

Also use appropriate types for the width of the instructions. This avoids
errors when compiling with Clang.
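
A side note: the if (0) block carried over from the old code is kept on
purpose. It never generates code; it only makes the compiler check that the
value is assignable to the variable's type, which the __pcpu_cast_##size()
cast alone would silently bypass. The same trick in isolation
(check_assignable is a made-up name):

  #define check_assignable(var, val)                              \
  do {                                                            \
          if (0) {                                                \
                  typeof(var) __tmp;                              \
                  /* errors if 'val' is not assignable to 'var' */ \
                  __tmp = (val);                                  \
                  (void)__tmp;                                    \
          }                                                       \
  } while (0)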

Signed-off-by: Brian Gerst <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
---
arch/x86/include/asm/percpu.h | 90 ++++++++++++++---------------------
1 file changed, 35 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 19838e4f7a8f..fb280fba94c5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -117,37 +117,17 @@ extern void __bad_percpu_size(void);
#define __pcpu_reg_imm_4(x) "ri" (x)
#define __pcpu_reg_imm_8(x) "re" (x)

-#define percpu_to_op(qual, op, var, val) \
-do { \
- typedef typeof(var) pto_T__; \
- if (0) { \
- pto_T__ pto_tmp__; \
- pto_tmp__ = (val); \
- (void)pto_tmp__; \
- } \
- switch (sizeof(var)) { \
- case 1: \
- asm qual (op "b %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "qi" ((pto_T__)(val))); \
- break; \
- case 2: \
- asm qual (op "w %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pto_T__)(val))); \
- break; \
- case 4: \
- asm qual (op "l %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "ri" ((pto_T__)(val))); \
- break; \
- case 8: \
- asm qual (op "q %1,"__percpu_arg(0) \
- : "+m" (var) \
- : "re" ((pto_T__)(val))); \
- break; \
- default: __bad_percpu_size(); \
- } \
+#define percpu_to_op(size, qual, op, _var, _val) \
+do { \
+ __pcpu_type_##size pto_val__ = __pcpu_cast_##size(_val); \
+ if (0) { \
+ typeof(_var) pto_tmp__; \
+ pto_tmp__ = (_val); \
+ (void)pto_tmp__; \
+ } \
+ asm qual(__pcpu_op2_##size(op, "%[val]", __percpu_arg([var])) \
+ : [var] "+m" (_var) \
+ : [val] __pcpu_reg_imm_##size(pto_val__)); \
} while (0)

/*
@@ -425,18 +405,18 @@ do { \
#define raw_cpu_read_2(pcp) percpu_from_op(, "mov", pcp)
#define raw_cpu_read_4(pcp) percpu_from_op(, "mov", pcp)

-#define raw_cpu_write_1(pcp, val) percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_2(pcp, val) percpu_to_op(, "mov", (pcp), val)
-#define raw_cpu_write_4(pcp, val) percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val)
+#define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val)
+#define raw_cpu_write_4(pcp, val) percpu_to_op(4, , "mov", (pcp), val)
#define raw_cpu_add_1(pcp, val) percpu_add_op(, (pcp), val)
#define raw_cpu_add_2(pcp, val) percpu_add_op(, (pcp), val)
#define raw_cpu_add_4(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_and_1(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_2(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_and_4(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_1(pcp, val) percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_2(pcp, val) percpu_to_op(, "or", (pcp), val)
-#define raw_cpu_or_4(pcp, val) percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_1(pcp, val) percpu_to_op(1, , "and", (pcp), val)
+#define raw_cpu_and_2(pcp, val) percpu_to_op(2, , "and", (pcp), val)
+#define raw_cpu_and_4(pcp, val) percpu_to_op(4, , "and", (pcp), val)
+#define raw_cpu_or_1(pcp, val) percpu_to_op(1, , "or", (pcp), val)
+#define raw_cpu_or_2(pcp, val) percpu_to_op(2, , "or", (pcp), val)
+#define raw_cpu_or_4(pcp, val) percpu_to_op(4, , "or", (pcp), val)

/*
* raw_cpu_xchg() can use a load-store since it is not required to be
@@ -456,18 +436,18 @@ do { \
#define this_cpu_read_1(pcp) percpu_from_op(volatile, "mov", pcp)
#define this_cpu_read_2(pcp) percpu_from_op(volatile, "mov", pcp)
#define this_cpu_read_4(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_1(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_2(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
-#define this_cpu_write_4(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val)
+#define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val)
+#define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val)
#define this_cpu_add_1(pcp, val) percpu_add_op(volatile, (pcp), val)
#define this_cpu_add_2(pcp, val) percpu_add_op(volatile, (pcp), val)
#define this_cpu_add_4(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_1(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_2(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_and_4(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_1(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_2(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
-#define this_cpu_or_4(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_1(pcp, val) percpu_to_op(1, volatile, "and", (pcp), val)
+#define this_cpu_and_2(pcp, val) percpu_to_op(2, volatile, "and", (pcp), val)
+#define this_cpu_and_4(pcp, val) percpu_to_op(4, volatile, "and", (pcp), val)
+#define this_cpu_or_1(pcp, val) percpu_to_op(1, volatile, "or", (pcp), val)
+#define this_cpu_or_2(pcp, val) percpu_to_op(2, volatile, "or", (pcp), val)
+#define this_cpu_or_4(pcp, val) percpu_to_op(4, volatile, "or", (pcp), val)
#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
@@ -509,19 +489,19 @@ do { \
*/
#ifdef CONFIG_X86_64
#define raw_cpu_read_8(pcp) percpu_from_op(, "mov", pcp)
-#define raw_cpu_write_8(pcp, val) percpu_to_op(, "mov", (pcp), val)
+#define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val)
#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val)
-#define raw_cpu_and_8(pcp, val) percpu_to_op(, "and", (pcp), val)
-#define raw_cpu_or_8(pcp, val) percpu_to_op(, "or", (pcp), val)
+#define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
+#define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)

#define this_cpu_read_8(pcp) percpu_from_op(volatile, "mov", pcp)
-#define this_cpu_write_8(pcp, val) percpu_to_op(volatile, "mov", (pcp), val)
+#define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val)
-#define this_cpu_and_8(pcp, val) percpu_to_op(volatile, "and", (pcp), val)
-#define this_cpu_or_8(pcp, val) percpu_to_op(volatile, "or", (pcp), val)
+#define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
+#define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
--
2.25.4

2020-06-01 18:54:44

by Nick Desaulniers

Subject: Re: [PATCH v2 05/10] x86/percpu: Remove "e" constraint from XADD

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
>
> The "e" constraint represents a constant, but the XADD instruction doesn't
> accept immediate operands.
>
> Signed-off-by: Brian Gerst <[email protected]>

Yep, as we discussed in v1.
Reviewed-by: Nick Desaulniers <[email protected]>

> ---
> arch/x86/include/asm/percpu.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 2a24f3c795eb..9bb5440d98d3 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -220,7 +220,7 @@ do { \
> break; \
> case 8: \
> asm qual ("xaddq %0, "__percpu_arg(1) \
> - : "+re" (paro_ret__), "+m" (var) \
> + : "+r" (paro_ret__), "+m" (var) \
> : : "memory"); \
> break; \
> default: __bad_percpu_size(); \
> --
> 2.25.4
>


--
Thanks,
~Nick Desaulniers

2020-06-01 19:50:57

by Nick Desaulniers

Subject: Re: [PATCH v2 06/10] x86/percpu: Clean up percpu_add_return_op()

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
>
> The core percpu macros already have a switch on the data size, so the switch
> in the x86 code is redundant and produces more dead code.
>
> Also use appropriate types for the width of the instructions. This avoids
> errors when compiling with Clang.
>
> Signed-off-by: Brian Gerst <[email protected]>

I think it would have been OK to carry forward my Reviewed-by tag here
from v1, hidden at the bottom of
https://lore.kernel.org/lkml/CAKwvOdn7yC1GVA+6gtNewBSq2BK09y9iNWhv1dPFF5i4kT1+6A@mail.gmail.com/
even though you split removing 'e' into its own patch.

Reviewed-by: Nick Desaulniers <[email protected]>

> ---
> arch/x86/include/asm/percpu.h | 51 +++++++++++------------------------
> 1 file changed, 16 insertions(+), 35 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 9bb5440d98d3..0776a11e7e11 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -199,34 +199,15 @@ do { \
> /*
> * Add return operation
> */
> -#define percpu_add_return_op(qual, var, val) \
> +#define percpu_add_return_op(size, qual, _var, _val) \
> ({ \
> - typeof(var) paro_ret__ = val; \
> - switch (sizeof(var)) { \
> - case 1: \
> - asm qual ("xaddb %0, "__percpu_arg(1) \
> - : "+q" (paro_ret__), "+m" (var) \
> - : : "memory"); \
> - break; \
> - case 2: \
> - asm qual ("xaddw %0, "__percpu_arg(1) \
> - : "+r" (paro_ret__), "+m" (var) \
> - : : "memory"); \
> - break; \
> - case 4: \
> - asm qual ("xaddl %0, "__percpu_arg(1) \
> - : "+r" (paro_ret__), "+m" (var) \
> - : : "memory"); \
> - break; \
> - case 8: \
> - asm qual ("xaddq %0, "__percpu_arg(1) \
> - : "+r" (paro_ret__), "+m" (var) \
> - : : "memory"); \
> - break; \
> - default: __bad_percpu_size(); \
> - } \
> - paro_ret__ += val; \
> - paro_ret__; \
> + __pcpu_type_##size paro_tmp__ = __pcpu_cast_##size(_val); \
> + asm qual (__pcpu_op2_##size("xadd", "%[tmp]", \
> + __percpu_arg([var])) \
> + : [tmp] __pcpu_reg_##size("+", paro_tmp__), \
> + [var] "+m" (_var) \
> + : : "memory"); \
> + (typeof(_var))(unsigned long) (paro_tmp__ + _val); \
> })
>
> /*
> @@ -377,16 +358,16 @@ do { \
> #define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
> #define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
>
> -#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(, pcp, val)
> -#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(, pcp, val)
> -#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(, pcp, val)
> +#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(1, , pcp, val)
> +#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(2, , pcp, val)
> +#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(4, , pcp, val)
> #define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
> #define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
> #define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
>
> -#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(volatile, pcp, val)
> -#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(volatile, pcp, val)
> -#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(volatile, pcp, val)
> +#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(1, volatile, pcp, val)
> +#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(2, volatile, pcp, val)
> +#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(4, volatile, pcp, val)
> #define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
> #define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
> #define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
> @@ -418,7 +399,7 @@ do { \
> #define raw_cpu_add_8(pcp, val) percpu_add_op(8, , (pcp), val)
> #define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
> #define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
> -#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
> +#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(8, , pcp, val)
> #define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval)
> #define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval)
>
> @@ -427,7 +408,7 @@ do { \
> #define this_cpu_add_8(pcp, val) percpu_add_op(8, volatile, (pcp), val)
> #define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
> #define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
> -#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
> +#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(8, volatile, pcp, val)
> #define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval)
> #define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval)
>
> --
> 2.25.4
>


--
Thanks,
~Nick Desaulniers

2020-06-01 20:21:55

by Nick Desaulniers

Subject: Re: [PATCH v2 04/10] x86/percpu: Clean up percpu_add_op()

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
>
> The core percpu macros already have a switch on the data size, so the switch
> in the x86 code is redundant and produces more dead code.
>
> Also use appropriate types for the width of the instructions. This avoids
> errors when compiling with Clang.
>
> Signed-off-by: Brian Gerst <[email protected]>

Reviewed-by: Nick Desaulniers <[email protected]>

> ---
> arch/x86/include/asm/percpu.h | 99 ++++++++---------------------------
> 1 file changed, 22 insertions(+), 77 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index a40d2e055f58..2a24f3c795eb 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -130,64 +130,32 @@ do { \
> : [val] __pcpu_reg_imm_##size(pto_val__)); \
> } while (0)
>
> +#define percpu_unary_op(size, qual, op, _var) \
> +({ \
> + asm qual (__pcpu_op1_##size(op, __percpu_arg([var])) \
> + : [var] "+m" (_var)); \
> +})
> +
> /*
> * Generate a percpu add to memory instruction and optimize code
> * if one is added or subtracted.
> */
> -#define percpu_add_op(qual, var, val) \
> +#define percpu_add_op(size, qual, var, val) \
> do { \
> - typedef typeof(var) pao_T__; \
> const int pao_ID__ = (__builtin_constant_p(val) && \
> ((val) == 1 || (val) == -1)) ? \
> (int)(val) : 0; \
> if (0) { \
> - pao_T__ pao_tmp__; \
> + typeof(var) pao_tmp__; \
> pao_tmp__ = (val); \
> (void)pao_tmp__; \
> } \
> - switch (sizeof(var)) { \
> - case 1: \
> - if (pao_ID__ == 1) \
> - asm qual ("incb "__percpu_arg(0) : "+m" (var)); \
> - else if (pao_ID__ == -1) \
> - asm qual ("decb "__percpu_arg(0) : "+m" (var)); \
> - else \
> - asm qual ("addb %1, "__percpu_arg(0) \
> - : "+m" (var) \
> - : "qi" ((pao_T__)(val))); \
> - break; \
> - case 2: \
> - if (pao_ID__ == 1) \
> - asm qual ("incw "__percpu_arg(0) : "+m" (var)); \
> - else if (pao_ID__ == -1) \
> - asm qual ("decw "__percpu_arg(0) : "+m" (var)); \
> - else \
> - asm qual ("addw %1, "__percpu_arg(0) \
> - : "+m" (var) \
> - : "ri" ((pao_T__)(val))); \
> - break; \
> - case 4: \
> - if (pao_ID__ == 1) \
> - asm qual ("incl "__percpu_arg(0) : "+m" (var)); \
> - else if (pao_ID__ == -1) \
> - asm qual ("decl "__percpu_arg(0) : "+m" (var)); \
> - else \
> - asm qual ("addl %1, "__percpu_arg(0) \
> - : "+m" (var) \
> - : "ri" ((pao_T__)(val))); \
> - break; \
> - case 8: \
> - if (pao_ID__ == 1) \
> - asm qual ("incq "__percpu_arg(0) : "+m" (var)); \
> - else if (pao_ID__ == -1) \
> - asm qual ("decq "__percpu_arg(0) : "+m" (var)); \
> - else \
> - asm qual ("addq %1, "__percpu_arg(0) \
> - : "+m" (var) \
> - : "re" ((pao_T__)(val))); \
> - break; \
> - default: __bad_percpu_size(); \
> - } \
> + if (pao_ID__ == 1) \
> + percpu_unary_op(size, qual, "inc", var); \
> + else if (pao_ID__ == -1) \
> + percpu_unary_op(size, qual, "dec", var); \
> + else \
> + percpu_to_op(size, qual, "add", var, val); \
> } while (0)
>
> #define percpu_from_op(size, qual, op, _var) \
> @@ -228,29 +196,6 @@ do { \
> pfo_ret__; \
> })
>
> -#define percpu_unary_op(qual, op, var) \
> -({ \
> - switch (sizeof(var)) { \
> - case 1: \
> - asm qual (op "b "__percpu_arg(0) \
> - : "+m" (var)); \
> - break; \
> - case 2: \
> - asm qual (op "w "__percpu_arg(0) \
> - : "+m" (var)); \
> - break; \
> - case 4: \
> - asm qual (op "l "__percpu_arg(0) \
> - : "+m" (var)); \
> - break; \
> - case 8: \
> - asm qual (op "q "__percpu_arg(0) \
> - : "+m" (var)); \
> - break; \
> - default: __bad_percpu_size(); \
> - } \
> -})
> -
> /*
> * Add return operation
> */
> @@ -388,9 +333,9 @@ do { \
> #define raw_cpu_write_1(pcp, val) percpu_to_op(1, , "mov", (pcp), val)
> #define raw_cpu_write_2(pcp, val) percpu_to_op(2, , "mov", (pcp), val)
> #define raw_cpu_write_4(pcp, val) percpu_to_op(4, , "mov", (pcp), val)
> -#define raw_cpu_add_1(pcp, val) percpu_add_op(, (pcp), val)
> -#define raw_cpu_add_2(pcp, val) percpu_add_op(, (pcp), val)
> -#define raw_cpu_add_4(pcp, val) percpu_add_op(, (pcp), val)
> +#define raw_cpu_add_1(pcp, val) percpu_add_op(1, , (pcp), val)
> +#define raw_cpu_add_2(pcp, val) percpu_add_op(2, , (pcp), val)
> +#define raw_cpu_add_4(pcp, val) percpu_add_op(4, , (pcp), val)
> #define raw_cpu_and_1(pcp, val) percpu_to_op(1, , "and", (pcp), val)
> #define raw_cpu_and_2(pcp, val) percpu_to_op(2, , "and", (pcp), val)
> #define raw_cpu_and_4(pcp, val) percpu_to_op(4, , "and", (pcp), val)
> @@ -419,9 +364,9 @@ do { \
> #define this_cpu_write_1(pcp, val) percpu_to_op(1, volatile, "mov", (pcp), val)
> #define this_cpu_write_2(pcp, val) percpu_to_op(2, volatile, "mov", (pcp), val)
> #define this_cpu_write_4(pcp, val) percpu_to_op(4, volatile, "mov", (pcp), val)
> -#define this_cpu_add_1(pcp, val) percpu_add_op(volatile, (pcp), val)
> -#define this_cpu_add_2(pcp, val) percpu_add_op(volatile, (pcp), val)
> -#define this_cpu_add_4(pcp, val) percpu_add_op(volatile, (pcp), val)
> +#define this_cpu_add_1(pcp, val) percpu_add_op(1, volatile, (pcp), val)
> +#define this_cpu_add_2(pcp, val) percpu_add_op(2, volatile, (pcp), val)
> +#define this_cpu_add_4(pcp, val) percpu_add_op(4, volatile, (pcp), val)
> #define this_cpu_and_1(pcp, val) percpu_to_op(1, volatile, "and", (pcp), val)
> #define this_cpu_and_2(pcp, val) percpu_to_op(2, volatile, "and", (pcp), val)
> #define this_cpu_and_4(pcp, val) percpu_to_op(4, volatile, "and", (pcp), val)
> @@ -470,7 +415,7 @@ do { \
> #ifdef CONFIG_X86_64
> #define raw_cpu_read_8(pcp) percpu_from_op(8, , "mov", pcp)
> #define raw_cpu_write_8(pcp, val) percpu_to_op(8, , "mov", (pcp), val)
> -#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val)
> +#define raw_cpu_add_8(pcp, val) percpu_add_op(8, , (pcp), val)
> #define raw_cpu_and_8(pcp, val) percpu_to_op(8, , "and", (pcp), val)
> #define raw_cpu_or_8(pcp, val) percpu_to_op(8, , "or", (pcp), val)
> #define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val)
> @@ -479,7 +424,7 @@ do { \
>
> #define this_cpu_read_8(pcp) percpu_from_op(8, volatile, "mov", pcp)
> #define this_cpu_write_8(pcp, val) percpu_to_op(8, volatile, "mov", (pcp), val)
> -#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val)
> +#define this_cpu_add_8(pcp, val) percpu_add_op(8, volatile, (pcp), val)
> #define this_cpu_and_8(pcp, val) percpu_to_op(8, volatile, "and", (pcp), val)
> #define this_cpu_or_8(pcp, val) percpu_to_op(8, volatile, "or", (pcp), val)
> #define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val)
> --
> 2.25.4
>


--
Thanks,
~Nick Desaulniers

2020-06-01 20:29:35

by Nick Desaulniers

Subject: Re: [PATCH v2 10/10] x86/percpu: Remove unused PER_CPU() macro

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
>
> Also remove now unused __percpu_mov_op.
>
> Signed-off-by: Brian Gerst <[email protected]>

This cleanup looks unrelated to the series, and can be sent separately
if needed.
Reviewed-by: Nick Desaulniers <[email protected]>

> ---
> arch/x86/include/asm/percpu.h | 18 ------------------
> 1 file changed, 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index cf2b9c2a241e..a3c33b79fb86 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -4,33 +4,15 @@
>
> #ifdef CONFIG_X86_64
> #define __percpu_seg gs
> -#define __percpu_mov_op movq
> #else
> #define __percpu_seg fs
> -#define __percpu_mov_op movl
> #endif
>
> #ifdef __ASSEMBLY__
>
> -/*
> - * PER_CPU finds an address of a per-cpu variable.
> - *
> - * Args:
> - * var - variable name
> - * reg - 32bit register
> - *
> - * The resulting address is stored in the "reg" argument.
> - *
> - * Example:
> - * PER_CPU(cpu_gdt_descr, %ebx)
> - */
> #ifdef CONFIG_SMP
> -#define PER_CPU(var, reg) \
> - __percpu_mov_op %__percpu_seg:this_cpu_off, reg; \
> - lea var(reg), reg
> #define PER_CPU_VAR(var) %__percpu_seg:var
> #else /* ! SMP */
> -#define PER_CPU(var, reg) __percpu_mov_op $var, reg
> #define PER_CPU_VAR(var) var
> #endif /* SMP */
>
> --
> 2.25.4
>


--
Thanks,
~Nick Desaulniers

2020-06-01 20:46:00

by Nick Desaulniers

Subject: Re: [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op()

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
>
> Use __pcpu_size_call_return() to simplify this_cpu_read_stable().

Clever! As in this_cpu_read() in include/linux/percpu-defs.h. Could
be its own patch before this, but it's fine.
Reviewed-by: Nick Desaulniers <[email protected]>

> Also remove __bad_percpu_size() which is now unused.
>
> Signed-off-by: Brian Gerst <[email protected]>
> ---
> arch/x86/include/asm/percpu.h | 41 ++++++++++-------------------------
> 1 file changed, 12 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 7efc0b5c4ff0..cf2b9c2a241e 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -85,7 +85,6 @@
>
> /* For arch-specific code, we can use direct single-insn ops (they
> * don't give an lvalue though). */
> -extern void __bad_percpu_size(void);
>
> #define __pcpu_type_1 u8
> #define __pcpu_type_2 u16
> @@ -167,33 +166,13 @@ do { \
> (typeof(_var))(unsigned long) pfo_val__; \
> })
>
> -#define percpu_stable_op(op, var) \
> -({ \
> - typeof(var) pfo_ret__; \
> - switch (sizeof(var)) { \
> - case 1: \
> - asm(op "b "__percpu_arg(P1)",%0" \

What does the `P` do here?
https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints
says can be machine dependent integral literal in a certain range.
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
doesn't document `P` for x86 though...

> - : "=q" (pfo_ret__) \
> - : "p" (&(var))); \
> - break; \
> - case 2: \
> - asm(op "w "__percpu_arg(P1)",%0" \
> - : "=r" (pfo_ret__) \
> - : "p" (&(var))); \
> - break; \
> - case 4: \
> - asm(op "l "__percpu_arg(P1)",%0" \
> - : "=r" (pfo_ret__) \
> - : "p" (&(var))); \
> - break; \
> - case 8: \
> - asm(op "q "__percpu_arg(P1)",%0" \
> - : "=r" (pfo_ret__) \
> - : "p" (&(var))); \
> - break; \
> - default: __bad_percpu_size(); \
> - } \
> - pfo_ret__; \
> +#define percpu_stable_op(size, op, _var) \
> +({ \
> + __pcpu_type_##size pfo_val__; \
> + asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]") \
> + : [val] __pcpu_reg_##size("=", pfo_val__) \
> + : [var] "p" (&(_var))); \
> + (typeof(_var))(unsigned long) pfo_val__; \
> })
>
> /*
> @@ -258,7 +237,11 @@ do { \
> * per-thread variables implemented as per-cpu variables and thus
> * stable for the duration of the respective task.
> */
> -#define this_cpu_read_stable(var) percpu_stable_op("mov", var)
> +#define this_cpu_read_stable_1(pcp) percpu_stable_op(1, "mov", pcp)
> +#define this_cpu_read_stable_2(pcp) percpu_stable_op(2, "mov", pcp)
> +#define this_cpu_read_stable_4(pcp) percpu_stable_op(4, "mov", pcp)
> +#define this_cpu_read_stable_8(pcp) percpu_stable_op(8, "mov", pcp)
> +#define this_cpu_read_stable(pcp) __pcpu_size_call_return(this_cpu_read_stable_, pcp)
>
> #define raw_cpu_read_1(pcp) percpu_from_op(1, , "mov", pcp)
> #define raw_cpu_read_2(pcp) percpu_from_op(2, , "mov", pcp)
> --
> 2.25.4
>


--
Thanks,
~Nick Desaulniers

2020-06-01 21:05:58

by Nick Desaulniers

Subject: Re: [PATCH v2 00/10] x86: Clean up percpu operations

On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
>
> The core percpu operations already have a switch on the width of the
> data type, which resulted in an extra amount of dead code being
> generated because the x86 operations had another switch. This patch set
> rewrites the x86 ops to remove the switch. Additional cleanups are to
> use named assembly operands, and to cast variables to the width used in
> the assembly to make Clang happy.

Thanks for all of the work that went into this series. I think I've
reviewed all of them.
With this series plus this hunk:
https://github.com/ClangBuiltLinux/continuous-integration/blob/master/patches/llvm-all/linux-next/x86/x86-support-i386-with-Clang.patch#L219-L237
I can build and boot i386_defconfig with Clang! So for the series:

Reviewed-by: Nick Desaulniers <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>

>
> Changes from v1:
> - Add separate patch for XADD constraint fix
> - Fixed sparse truncation warning
> - Add cleanup of percpu_stable_op()
> - Add patch to Remove PER_CPU()
>
> Brian Gerst (10):
> x86/percpu: Introduce size abstraction macros
> x86/percpu: Clean up percpu_to_op()
> x86/percpu: Clean up percpu_from_op()
> x86/percpu: Clean up percpu_add_op()
> x86/percpu: Remove "e" constraint from XADD
> x86/percpu: Clean up percpu_add_return_op()
> x86/percpu: Clean up percpu_xchg_op()
> x86/percpu: Clean up percpu_cmpxchg_op()
> x86/percpu: Clean up percpu_stable_op()
> x86/percpu: Remove unused PER_CPU() macro
>
> arch/x86/include/asm/percpu.h | 510 ++++++++++++----------------------
> 1 file changed, 172 insertions(+), 338 deletions(-)
>
>
> base-commit: 229aaa8c059f2c908e0561453509f996f2b2d5c4
> --
> 2.25.4
>


--
Thanks,
~Nick Desaulniers

2020-06-02 14:22:54

by Brian Gerst

Subject: Re: [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op()

On Mon, Jun 1, 2020 at 4:43 PM Nick Desaulniers <[email protected]> wrote:
>
> On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
> >
> > Use __pcpu_size_call_return() to simplify this_cpu_read_stable().
>
> Clever! As in this_cpu_read() in include/linux/percpu-defs.h. Could
> be its own patch before this, but it's fine.
> Reviewed-by: Nick Desaulniers <[email protected]>
>
> > Also remove __bad_percpu_size() which is now unused.
> >
> > Signed-off-by: Brian Gerst <[email protected]>
> > ---
> > arch/x86/include/asm/percpu.h | 41 ++++++++++-------------------------
> > 1 file changed, 12 insertions(+), 29 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > index 7efc0b5c4ff0..cf2b9c2a241e 100644
> > --- a/arch/x86/include/asm/percpu.h
> > +++ b/arch/x86/include/asm/percpu.h
> > @@ -85,7 +85,6 @@
> >
> > /* For arch-specific code, we can use direct single-insn ops (they
> > * don't give an lvalue though). */
> > -extern void __bad_percpu_size(void);
> >
> > #define __pcpu_type_1 u8
> > #define __pcpu_type_2 u16
> > @@ -167,33 +166,13 @@ do { \
> > (typeof(_var))(unsigned long) pfo_val__; \
> > })
> >
> > -#define percpu_stable_op(op, var) \
> > -({ \
> > - typeof(var) pfo_ret__; \
> > - switch (sizeof(var)) { \
> > - case 1: \
> > - asm(op "b "__percpu_arg(P1)",%0" \
>
> What does the `P` do here?
> https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints
> says can be machine dependent integral literal in a certain range.
> https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
> doesn't document `P` for x86 though...

https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Extended-Asm.html#x86-Operand-Modifiers

Removing the 'P' modifier results in this:
movq %gs:$current_task, %rdx #, pfo_val__

This is because the 'p' constraint treats a memory address as a
constant. I tried replacing it with __this_cpu_read(), which since
commit 0b9ccc0a should have similar non-volatile semantics. But the
compiler still reloaded it on every use, so I left the asm template
as-is for now until that can be resolved.
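
For reference, the template this series keeps looks roughly like the sketch
below (based on patch 9, using the size-abstraction helpers from patch 1;
treat it as illustrative rather than the exact hunk):

/*
 * The "p" input hands the compiler &(_var) as a constant address, with no
 * volatile qualifier and no "memory" clobber, so repeated reads may be
 * CSE'd.  The "P" operand modifier prints that address bare, without the
 * '$' immediate prefix that produced the broken movq above.
 */
#define percpu_stable_op(size, op, _var)				\
({									\
	__pcpu_type_##size pfo_val__;					\
	asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]")	\
	    : [val] __pcpu_reg_##size("=", pfo_val__)			\
	    : [var] "p" (&(_var)));					\
	(typeof(_var))(unsigned long) pfo_val__;			\
})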

--
Brian Gerst

2020-07-08 19:40:01

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 00/10] x86: Clean up percpu operations

On Mon, Jun 1, 2020 at 2:00 PM Nick Desaulniers <[email protected]> wrote:
>
> On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
> >
> > The core percpu operations already have a switch on the width of the
> > data type, which resulted in an extra amount of dead code being
> > generated with the x86 operations having another switch. This patch set
> > rewrites the x86 ops to remove the switch. Additional cleanups are to
> > use named assembly operands, and to cast variables to the width used in
> > the assembly to make Clang happy.
>
> Thanks for all of the work that went into this series. I think I've
> reviewed all of them.
> With this series plus this hunk:
> https://github.com/ClangBuiltLinux/continuous-integration/blob/master/patches/llvm-all/linux-next/x86/x86-support-i386-with-Clang.patch#L219-L237
> I can build and boot i386_defconfig with Clang! So for the series:
>
> Reviewed-by: Nick Desaulniers <[email protected]>
> Tested-by: Nick Desaulniers <[email protected]>

tglx, Ingo, Boris, Linus,
Do you all have thoughts on this series? I can understand "let
sleeping dogs lie" but some Android folks are really interested in
i386 testing, and randconfigs/allnoconfigs are doing i386 builds which
are currently broken w/ Clang. This series gets us closer to having
test coverage of this ISA with another toolchain, FWIW.

--
Thanks,
~Nick Desaulniers

2020-07-09 10:27:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 09/10] x86/percpu: Clean up percpu_stable_op()

On Tue, Jun 02, 2020 at 10:19:51AM -0400, Brian Gerst wrote:
> On Mon, Jun 1, 2020 at 4:43 PM Nick Desaulniers <[email protected]> wrote:
> >
> > On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
> > >
> > > Use __pcpu_size_call_return() to simplify this_cpu_read_stable().
> >
> > Clever! As in this_cpu_read() in include/linux/percpu-defs.h. Could
> > be its own patch before this, but it's fine.
> > Reviewed-by: Nick Desaulniers <[email protected]>
> >
> > > Also remove __bad_percpu_size() which is now unused.
> > >
> > > Signed-off-by: Brian Gerst <[email protected]>
> > > ---
> > > arch/x86/include/asm/percpu.h | 41 ++++++++++-------------------------
> > > 1 file changed, 12 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> > > index 7efc0b5c4ff0..cf2b9c2a241e 100644
> > > --- a/arch/x86/include/asm/percpu.h
> > > +++ b/arch/x86/include/asm/percpu.h
> > > @@ -85,7 +85,6 @@
> > >
> > > /* For arch-specific code, we can use direct single-insn ops (they
> > > * don't give an lvalue though). */
> > > -extern void __bad_percpu_size(void);
> > >
> > > #define __pcpu_type_1 u8
> > > #define __pcpu_type_2 u16
> > > @@ -167,33 +166,13 @@ do { \
> > > (typeof(_var))(unsigned long) pfo_val__; \
> > > })
> > >
> > > -#define percpu_stable_op(op, var) \
> > > -({ \
> > > - typeof(var) pfo_ret__; \
> > > - switch (sizeof(var)) { \
> > > - case 1: \
> > > - asm(op "b "__percpu_arg(P1)",%0" \
> >
> > What does the `P` do here?
> > https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints
> > says can be machine dependent integral literal in a certain range.
> > https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
> > doesn't document `P` for x86 though...
>
> https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Extended-Asm.html#x86-Operand-Modifiers
>
> Removing the 'P' modifier results in this:
> movq %gs:$current_task, %rdx #, pfo_val__
>
> This is because the 'p' constraint treats a memory address as a
> constant. I tried replacing it with __this_cpu_read(), which since
> commit 0b9ccc0a should have similar non-volatile semantics. But the
> compiler still reloaded it on every use, so I left the asm template
> as-is for now until that can be resolved.

Right, I ran into that same issue a while back and gave up staring at
compiler innards. __this_cpu_read() *should* allow re-loads, and it does
in places (we've had bugs because of it), but this_cpu_read_stable() is
somehow far more 'effective'.

It would be good if someone could update the comment with that explanation,
to make matters clearer.
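
The practical difference is easiest to see with current_task; a minimal
sketch (get_current() mirrors arch/x86/include/asm/current.h, the second
helper is hypothetical and only here for contrast):

/* Stable read: the value cannot change for the lifetime of the task, so
 * the compiler is allowed to read %gs:current_task once and cache it. */
static __always_inline struct task_struct *get_current(void)
{
	return this_cpu_read_stable(current_task);
}

/* Plain non-volatile read: caching is permitted but not required, and in
 * practice the compilers discussed above re-load it on every use. */
static __always_inline struct task_struct *get_current_reload(void)
{
	return __this_cpu_read(current_task);
}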

2020-07-09 10:31:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 00/10] x86: Clean up percpu operations

On Sat, May 30, 2020 at 06:11:17PM -0400, Brian Gerst wrote:
> The core percpu operations already have a switch on the width of the
> data type, which resulted in an extra amount of dead code being
> generated with the x86 operations having another switch. This patch set
> rewrites the x86 ops to remove the switch. Additional cleanups are to
> use named assembly operands, and to cast variables to the width used in
> the assembly to make Clang happy.
>
> Changes from v1:
> - Add separate patch for XADD constraint fix
> - Fixed sparse truncation warning
> - Add cleanup of percpu_stable_op()
> - Add patch to Remove PER_CPU()
>
> Brian Gerst (10):
> x86/percpu: Introduce size abstraction macros
> x86/percpu: Clean up percpu_to_op()
> x86/percpu: Clean up percpu_from_op()
> x86/percpu: Clean up percpu_add_op()
> x86/percpu: Remove "e" constraint from XADD
> x86/percpu: Clean up percpu_add_return_op()
> x86/percpu: Clean up percpu_xchg_op()
> x86/percpu: Clean up percpu_cmpxchg_op()
> x86/percpu: Clean up percpu_stable_op()
> x86/percpu: Remove unused PER_CPU() macro
>
> arch/x86/include/asm/percpu.h | 510 ++++++++++++----------------------
> 1 file changed, 172 insertions(+), 338 deletions(-)

Nice!

Acked-by: Peter Zijlstra (Intel) <[email protected]>

2020-07-09 10:32:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op()

On Sat, May 30, 2020 at 06:11:19PM -0400, Brian Gerst wrote:
> + if (0) { \
> + typeof(_var) pto_tmp__; \
> + pto_tmp__ = (_val); \
> + (void)pto_tmp__; \
> + } \

This is repeated at least once more; and it looks very similar to
__typecheck() and typecheck() but is yet another variant afaict.

2020-07-09 20:06:44

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 00/10] x86: Clean up percpu operations

On Wed, Jul 8, 2020 at 12:36 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Mon, Jun 1, 2020 at 2:00 PM Nick Desaulniers <[email protected]> wrote:
> >
> > On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
> > >
> > > The core percpu operations already have a switch on the width of the
> > > data type, which resulted in an extra amount of dead code being
> > > generated with the x86 operations having another switch. This patch set
> > > rewrites the x86 ops to remove the switch. Additional cleanups are to
> > > use named assembly operands, and to cast variables to the width used in
> > > the assembly to make Clang happy.
> >
> > Thanks for all of the work that went into this series. I think I've
> > reviewed all of them.
> > With this series plus this hunk:
> > https://github.com/ClangBuiltLinux/continuous-integration/blob/master/patches/llvm-all/linux-next/x86/x86-support-i386-with-Clang.patch#L219-L237
> > I can build and boot i386_defconfig with Clang! So for the series:
> >
> > Reviewed-by: Nick Desaulniers <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
>
> tglx, Ingo, Boris, Linus,
> Do you all have thoughts on this series? I can understand "let
> sleeping dogs lie" but some Android folks are really interested in
> i386 testing, and randconfigs/allnoconfigs are doing i386 builds which
> are currently broken w/ Clang. This series gets us closer to having
> test coverage of this ISA with another toolchain, FWIW.

Oh, was https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6ec4476ac825
alluding to this, perchance?

--
Thanks,
~Nick Desaulniers

2020-07-10 04:42:13

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op()

On Thu, Jul 9, 2020 at 6:30 AM Peter Zijlstra <[email protected]> wrote:
>
> On Sat, May 30, 2020 at 06:11:19PM -0400, Brian Gerst wrote:
> > + if (0) { \
> > + typeof(_var) pto_tmp__; \
> > + pto_tmp__ = (_val); \
> > + (void)pto_tmp__; \
> > + } \
>
> This is repeated at least once more; and it looks very similar to
> __typecheck() and typecheck() but is yet another variant afaict.

The problem with typecheck() is that it will complain about a mismatch
between unsigned long and u64 (defined as unsigned long long) even
though both are 64-bits wide on x86-64. Cleaning that mess up is
beyond the scope of this series, so I kept the existing checks.
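
A minimal sketch of the difference in strictness (example_var here is just a
stand-in for the per-cpu variable, outside the real macros):

#include <linux/typecheck.h>
#include <linux/types.h>

static unsigned long example_var;	/* stand-in for a per-cpu variable */

static void check_types(u64 val)
{
	/* typecheck() compares the types exactly, so this warns on x86-64
	 * even though unsigned long and u64 are both 64 bits wide: */
	typecheck(typeof(example_var), val);

	/* The if (0) idiom only requires that val be assignable to the
	 * variable's type, and the dead branch compiles away entirely: */
	if (0) {
		typeof(example_var) pto_tmp__;
		pto_tmp__ = val;
		(void)pto_tmp__;
	}
}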

--
Brian Gerst

2020-07-10 08:56:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op()

On Fri, Jul 10, 2020 at 12:38:23AM -0400, Brian Gerst wrote:
> On Thu, Jul 9, 2020 at 6:30 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Sat, May 30, 2020 at 06:11:19PM -0400, Brian Gerst wrote:
> > > + if (0) { \
> > > + typeof(_var) pto_tmp__; \
> > > + pto_tmp__ = (_val); \
> > > + (void)pto_tmp__; \
> > > + } \
> >
> > This is repeated at least once more; and it looks very similar to
> > __typecheck() and typecheck() but is yet another variant afaict.
>
> The problem with typecheck() is that it will complain about a mismatch
> between unsigned long and u64 (defined as unsigned long long) even
> though both are 64-bits wide on x86-64. Cleaning that mess up is
> beyond the scope of this series, so I kept the existing checks.

Fair enough; thanks for explaining.

2020-07-10 16:57:01

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 02/10] x86/percpu: Clean up percpu_to_op()

On Fri, Jul 10, 2020 at 1:53 AM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Jul 10, 2020 at 12:38:23AM -0400, Brian Gerst wrote:
> > On Thu, Jul 9, 2020 at 6:30 AM Peter Zijlstra <[email protected]> wrote:
> > >
> > > On Sat, May 30, 2020 at 06:11:19PM -0400, Brian Gerst wrote:
> > > > + if (0) { \
> > > > + typeof(_var) pto_tmp__; \
> > > > + pto_tmp__ = (_val); \
> > > > + (void)pto_tmp__; \
> > > > + } \
> > >
> > > This is repeated at least once more; and it looks very similar to
> > > __typecheck() and typecheck() but is yet another variant afaict.
> >
> > The problem with typecheck() is that it will complain about a mismatch
> > between unsigned long and u64 (defined as unsigned long long) even
> > though both are 64-bits wide on x86-64. Cleaning that mess up is
> > beyond the scope of this series, so I kept the existing checks.
>
> Fair enough; thanks for explaining.

I brought up the same point in v1, for more context:
https://lore.kernel.org/lkml/CAKwvOdnCcpS_9A2y9tMqeiAg2NfcVx=gNeA2V=+zHknit7wGkg@mail.gmail.com/
--
Thanks,
~Nick Desaulniers

2020-07-13 22:25:52

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 00/10] x86: Clean up percpu operations

On Wed, Jul 8, 2020 at 12:36 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Mon, Jun 1, 2020 at 2:00 PM Nick Desaulniers <[email protected]> wrote:
> >
> > On Sat, May 30, 2020 at 3:11 PM Brian Gerst <[email protected]> wrote:
> > >
> > > The core percpu operations already have a switch on the width of the
> > > data type, which resulted in an extra amount of dead code being
> > > generated with the x86 operations having another switch. This patch set
> > > rewrites the x86 ops to remove the switch. Additional cleanups are to
> > > use named assembly operands, and to cast variables to the width used in
> > > the assembly to make Clang happy.
> >
> > Thanks for all of the work that went into this series. I think I've
> > reviewed all of them.
> > With this series plus this hunk:
> > https://github.com/ClangBuiltLinux/continuous-integration/blob/master/patches/llvm-all/linux-next/x86/x86-support-i386-with-Clang.patch#L219-L237
> > I can build and boot i386_defconfig with Clang! So for the series:
> >
> > Reviewed-by: Nick Desaulniers <[email protected]>
> > Tested-by: Nick Desaulniers <[email protected]>
>
> tglx, Ingo, Boris, Linus,
> Do you all have thoughts on this series? I can understand "let
> sleeping dogs lie" but some Android folks are really interested in
> i386 testing, and randconfigs/allnoconfigs are doing i386 builds which
> are currently broken w/ Clang. This series gets us closer to having
> test coverage of this ISA with another toolchain, FWIW.

I'm trying to organize an LLVM micro conference at plumbers. I think
a session on "clang and remaining i386 blockers" might be of interest,
which would cover why the existing code pattern is problematic for
compiler portability. This series in particular would be brought up.

Are you all planning on attending plumbers this year? Might such a
session be of interest?

Otherwise, is there any additional feedback on this series or is it good to go?
--
Thanks,
~Nick Desaulniers

2020-07-13 22:41:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 00/10] x86: Clean up percpu operations

On Mon, Jul 13, 2020 at 3:24 PM Nick Desaulniers
<[email protected]> wrote:
>
> Otherwise, is there any additional feedback on this series or is it good to go?

I've lost sight of the series. I'm sure it is fine, but maybe you can
resend it to me (in private, if it's already been going out on the
mailing lists and everybody else is completely fed up with it).

And no, pointing to the "plus this hunk" with a web link isn't what I
was looking for ;)

Linus

2020-07-13 22:59:34

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 00/10] x86: Clean up percpu operations

On Mon, Jul 13, 2020 at 3:40 PM Linus Torvalds
<[email protected]> wrote:
>
> On Mon, Jul 13, 2020 at 3:24 PM Nick Desaulniers
> <[email protected]> wrote:
> >
> > Otherwise, is there any additional feedback on this series or is it good to go?
>
> I've lost sight of the series. I'm sure it is fine, but maybe you can
> resend it to me (in private, if it's already been going out on the
> mailing lists and everybody else is completely fed up with it).

Is there a fast way for maintainers to amend ACKs to each commit in a series?

>
> And no, pointing to the "plus this hunk" with a web link isn't what I
> was looking for ;)

So you're not accepting pull requests yet on github? I jest.
--
Thanks,
~Nick Desaulniers

2020-07-14 00:32:56

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 00/10] x86: Clean up percpu operations

On Mon, Jul 13, 2020 at 3:58 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Mon, Jul 13, 2020 at 3:40 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Mon, Jul 13, 2020 at 3:24 PM Nick Desaulniers
> > <[email protected]> wrote:
> > >
> > > Otherwise, is there any additional feedback on this series or is it good to go?
> >
> > I've lost sight of the series. I'm sure it is fine, but maybe you can
> > resend it to me (in private, if it's already been going out on the
> > mailing lists and everybody else is completely fed up with it).
>
> Is there a fast way for maintainers to amend ACKs to each commit in a series?

For future travelers (more so myself, since I don't sync my shell
history between machines, and I'm a big fan of aggressively sharing
knowledge. See also the section "Information as Power" and the
anecdote about tank manuals:
https://www.meforum.org/441/why-arabs-lose-wars). `b4` has a pretty
cool feature. When I was fetching this series, it was warning:
```
NOTE: Some trailers were sent to the cover letter:
Tested-by: Nick Desaulniers <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
NOTE: Rerun with -t to apply them to all patches
```
So I did:
```
$ b4 am -t https://lore.kernel.org/lkml/CAKwvOdmsap8iB+H5JXiHYwSJFrtQ_krjNH7eQCGe7p-LjK7ftA@mail.gmail.com/T/\#t
-o - | git am
$ git filter-branch -f --msg-filter 'cat - && echo "Signed-off-by:
Nick Desaulniers <[email protected]>"" $@";' HEAD~10..HEAD
```

>
> >
> > And no, pointing to the "plus this hunk" with a web link isn't what I
> > was looking for ;)
>
> So you're not accepting pull requests yet on github? I jest.

Actually, looks like a lot of merged PRs come from github! Grizzly
Adams *did* have a beard! https://www.youtube.com/watch?v=pdwJC9HvKLU

Sent as a series of emails via:
$ git format-patch -o linus_i386 HEAD~11
$ git send-email --to="Linus Torvalds <[email protected]>"
--suppress-cc=all linus_i386

Though, I'm sure a pull request would have been more formal.
--
Thanks,
~Nick Desaulniers

2020-07-14 01:33:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 00/10] x86: Clean up percpu operations

On Mon, Jul 13, 2020 at 5:31 PM Nick Desaulniers
<[email protected]> wrote:
>
> So I did:

Ack, I have the series, and it looks fine to me.

> Actually, looks like a lot of merged PRs come from github! Grizzly
> Adams *did* have a beard!

Oh, I think github is an _excellent_ hosting service.

I just can't stand their web workflow for creating commits and merging code.

At least at some point it was really easy to generate totally horrible
commit messages using the web interface without the proper header line
etc. I _think_ that got fixed. But the merge codeflow still doesn't
work at all for the kernel.

(To be fair, I've used it for _other_ projects, and there it can work
really really well, together with all the test infrastructure etc).

They also have a very spammy email system where people can add me to
their projects and "invite" me, and I get email about it.

But again - as a hosting site for kernel pulls, it's one of the better ones.

So you definitely don't need a kernel.org account to send me pull
requests, github is in fact much better than some others (infradead is
very slow to pull for me, for example).

A kernel.org pull doesn't strictly need a signed tag - I do kernel.org
pulls over ssh, and I trust the user security there. Any other hosting
service I _do_ require that the pull requests are proper signed tags,
though.

But even that difference is largely gone: for kernel.org pulls I've
very heavily favored signed tags, and I think almost all of them are
using them by now.

Linus