2018-05-17 23:33:28

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 0/6] Macrofying inline assembly for better compilation

This patch-set deals with an interesting yet stupid problem: kernel code
that does not get inlined despite its simplicity. There are several
causes for this behavior: "cold" attribute on __init, different function
optimization levels; conditional constant computations based on
__builtin_constant_p(); and finally large inline assembly blocks.

This patch-set deals with the inline assembly problem. I separated these
patches from the others (that were sent in the RFC) for easier
inclusion.

The problem with inline assembly is that inline assembly is often used
by the kernel for things that are other than code - for example,
assembly directives and data. GCC however is oblivious to the content of
the blocks and assumes their cost in space and time is proportional to
the number of the perceived assembly "instruction", according to the
number of newlines and semicolons. Alternatives, paravirt and other
mechanisms are affected, causing code not to be inlined, and degrading
compilation quality in general.

The solution that this patch-set carries for this problem is to create
an assembly macro, and then call it from the inline assembly block. As
a result, the compiler sees a single "instruction" and assigns the more
appropriate cost to the code. In addition, this patch-set removes
unneeded new-lines from common x86 inline asm's, which "confuse" GCC
heuristics.

Overall this patch-set slightly increases the kernel size (my build was
done using my localmodconfig + localyesconfig for the record):

text data bss dec hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18148888 10064016 2936832 31149736 1db4ea8 ./vmlinux after (+0.06%)

The patch-set eliminates many of the static text symbols:
Before: 40033
After: 39650 (-1%)

A microbenchmark with a loop of page-fault and MADV_DONTNEED show 2%
performance improvement with this patch-set (when PTI is off).

Changes from RFC:
- Better formatting [Jan]
- i386 build problems [0-day]
- Inline comments
- Separating __builtin_constant_p() into a different future patch-set

Cc: Alok Kataria <[email protected]>
Cc: Christopher Li <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]

Nadav Amit (6):
x86: objtool: use asm macro for better compiler decisions
x86: bug: prevent gcc distortions
x86: alternative: macrofy locks for better inlining
x86: prevent inline distortion by paravirt ops
x86: refcount: prevent gcc distortions
x86: removing unneeded new-lines

arch/x86/include/asm/alternative.h | 34 +++++++++++----
arch/x86/include/asm/asm.h | 4 +-
arch/x86/include/asm/bug.h | 56 ++++++++++++++++--------
arch/x86/include/asm/cmpxchg.h | 10 ++---
arch/x86/include/asm/paravirt_types.h | 63 +++++++++++++++++----------
arch/x86/include/asm/refcount.h | 62 ++++++++++++++++----------
arch/x86/include/asm/special_insns.h | 12 ++---
include/linux/compiler.h | 37 ++++++++++++----
8 files changed, 183 insertions(+), 95 deletions(-)

--
2.17.0



2018-05-17 23:29:31

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 4/6] x86: prevent inline distortion by paravirt ops

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlined
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

The effect of the patch is a more aggressive inlining, which also
causes a size increase of kernel.

text data bss dec hex filename
18131468 10068488 2936832 31136788 1db1c14 ./vmlinux before
18146418 10064100 2936832 31147350 1db4556 ./vmlinux after (+10562)

Static text symbols:
Before: 39788
After: 39673 (-115)

Cc: Juergen Gross <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/paravirt_types.h | 63 +++++++++++++++++----------
1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 180bc0bff0fb..ea62204c2ee6 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -346,20 +346,45 @@ extern struct pv_lock_ops pv_lock_ops;
/*
* Generate some code, and mark it as patchable by the
* apply_paravirt() alternate instruction patcher.
+ *
+ * This generates an indirect call based on the operation type number.
+ * The type number, computed in PARAVIRT_PATCH, is derived from the
+ * offset into the paravirt_patch_template structure, and can therefore be
+ * freely converted back into a structure offset.
+ *
+ * The paravirtual alternative logic and data are encapsulated within an
+ * assembly macro, which is then called on each use. This hack is necessary to
+ * prevent GCC from considering the inline assembly blocks as costly in time and
+ * space, which can prevent function inlining and lead to other bad compilation
+ * decisions. GCC computes inline assembly cost according to the number of
+ * perceived number of assembly instruction, based on the number of new-lines
+ * and semicolons in the assembly block. The macro will eventually be compiled
+ * into a single instruction (and some data). This scheme allows GCC to better
+ * understand the inline asm cost.
*/
-#define _paravirt_alt(insn_string, type, clobber) \
- "771:\n\t" insn_string "\n" "772:\n" \
- ".pushsection .parainstructions,\"a\"\n" \
- _ASM_ALIGN "\n" \
- _ASM_PTR " 771b\n" \
- " .byte " type "\n" \
- " .byte 772b-771b\n" \
- " .short " clobber "\n" \
- ".popsection\n"
+asm(".macro __paravirt_alt type:req clobber:req pv_opptr:req\n"
+ "771:\n\t"
+ ANNOTATE_RETPOLINE_SAFE "\n\t"
+ "call *\\pv_opptr\n"
+ "772:\n\t"
+ ".pushsection .parainstructions,\"a\"\n\t"
+ _ASM_ALIGN "\n\t"
+ _ASM_PTR " 771b\n\t"
+ ".byte \\type\n\t"
+ ".byte 772b-771b\n\t"
+ ".short \\clobber\n\t"
+ ".popsection\n\t"
+ ".endm");
+
+#define _paravirt_alt(type, clobber, pv_opptr) \
+ "__paravirt_alt type=" __stringify(type) \
+ " clobber=" __stringify(clobber) \
+ " pv_opptr=" __stringify(pv_opptr) "\n\t"

/* Generate patchable code, with the default asm parameters. */
-#define paravirt_alt(insn_string) \
- _paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")
+#define paravirt_alt \
+ _paravirt_alt("%c[paravirt_typenum]", "%c[paravirt_clobber]", \
+ "%c[paravirt_opptr]")

/* Simple instruction patching code. */
#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
@@ -387,16 +412,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,

int paravirt_disable_iospace(void);

-/*
- * This generates an indirect call based on the operation type number.
- * The type number, computed in PARAVIRT_PATCH, is derived from the
- * offset into the paravirt_patch_template structure, and can therefore be
- * freely converted back into a structure offset.
- */
-#define PARAVIRT_CALL \
- ANNOTATE_RETPOLINE_SAFE \
- "call *%c[paravirt_opptr];"
-
/*
* These macros are intended to wrap calls through one of the paravirt
* ops structs, so that they can be later identified and patched at
@@ -534,7 +549,7 @@ int paravirt_disable_iospace(void);
/* since this condition will never hold */ \
if (sizeof(rettype) > sizeof(unsigned long)) { \
asm volatile(pre \
- paravirt_alt(PARAVIRT_CALL) \
+ paravirt_alt \
post \
: call_clbr, ASM_CALL_CONSTRAINT \
: paravirt_type(op), \
@@ -544,7 +559,7 @@ int paravirt_disable_iospace(void);
__ret = (rettype)((((u64)__edx) << 32) | __eax); \
} else { \
asm volatile(pre \
- paravirt_alt(PARAVIRT_CALL) \
+ paravirt_alt \
post \
: call_clbr, ASM_CALL_CONSTRAINT \
: paravirt_type(op), \
@@ -571,7 +586,7 @@ int paravirt_disable_iospace(void);
PVOP_VCALL_ARGS; \
PVOP_TEST_NULL(op); \
asm volatile(pre \
- paravirt_alt(PARAVIRT_CALL) \
+ paravirt_alt \
post \
: call_clbr, ASM_CALL_CONSTRAINT \
: paravirt_type(op), \
--
2.17.0


2018-05-17 23:29:57

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 2/6] x86: bug: prevent gcc distortions

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlinedv
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch increases the kernel size:

text data bss dec hex filename
18126824 10067268 2936832 31130924 1db052c ./vmlinux before
18127205 10068388 2936832 31132425 1db0b09 ./vmlinux after (+1501)

But enables more aggressive inlining (and probably branch decisions).
The number of static text symbols in vmlinux is lower.

Before: 40015
After: 39860 (-155)

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Josh Poimboeuf <[email protected]>

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/bug.h | 56 +++++++++++++++++++++++++-------------
1 file changed, 37 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index 6804d6642767..1167e4822a34 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -30,33 +30,51 @@

#ifdef CONFIG_DEBUG_BUGVERBOSE

-#define _BUG_FLAGS(ins, flags) \
+/*
+ * Saving the bug data is encapsulated within an assembly macro, which is then
+ * called on each use. This hack is necessary to prevent GCC from considering
+ * the inline assembly blocks as costly in time and space, which can prevent
+ * function inlining and lead to other bad compilation decisions. GCC computes
+ * inline assembly cost according to the number of perceived number of assembly
+ * instruction, based on the number of new-lines and semicolons in the assembly
+ * block. The macro will eventually be compiled into a single instruction (and
+ * some data). This scheme allows GCC to better understand the inline asm cost.
+ */
+asm(".macro __BUG_FLAGS ins:req file:req line:req flags:req size:req\n"
+ "1:\t \\ins\n\t"
+ ".pushsection __bug_table,\"aw\"\n"
+ "2:\t "__BUG_REL(1b) "\t# bug_entry::bug_addr\n\t"
+ __BUG_REL(\\file) "\t# bug_entry::file\n\t"
+ ".word \\line" "\t# bug_entry::line\n\t"
+ ".word \\flags" "\t# bug_entry::flags\n\t"
+ ".org 2b+\\size\n\t"
+ ".popsection\n\t"
+ ".endm");
+
+#define _BUG_FLAGS(ins, flags) \
do { \
- asm volatile("1:\t" ins "\n" \
- ".pushsection __bug_table,\"aw\"\n" \
- "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n" \
- "\t" __BUG_REL(%c0) "\t# bug_entry::file\n" \
- "\t.word %c1" "\t# bug_entry::line\n" \
- "\t.word %c2" "\t# bug_entry::flags\n" \
- "\t.org 2b+%c3\n" \
- ".popsection" \
- : : "i" (__FILE__), "i" (__LINE__), \
- "i" (flags), \
+ asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1 %c2 %c3" \
+ : : "i" (__FILE__), "i" (__LINE__), \
+ "i" (flags), \
"i" (sizeof(struct bug_entry))); \
} while (0)

#else /* !CONFIG_DEBUG_BUGVERBOSE */

+asm(".macro __BUG_FLAGS ins:req flags:req size:req\n"
+ "1:\t\\ins\n\t"
+ ".pushsection __bug_table,\"aw\"\n"
+ "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n\t"
+ ".word \\flags" "\t# bug_entry::flags\n\t"
+ ".org 2b+\\size\n\t"
+ ".popsection\n\t"
+ ".endm");
+
#define _BUG_FLAGS(ins, flags) \
do { \
- asm volatile("1:\t" ins "\n" \
- ".pushsection __bug_table,\"aw\"\n" \
- "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n" \
- "\t.word %c0" "\t# bug_entry::flags\n" \
- "\t.org 2b+%c1\n" \
- ".popsection" \
- : : "i" (flags), \
- "i" (sizeof(struct bug_entry))); \
+ asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1" \
+ : : "i" (flags), \
+ "i" (sizeof(struct bug_entry))); \
} while (0)

#endif /* CONFIG_DEBUG_BUGVERBOSE */
--
2.17.0


2018-05-17 23:30:12

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 5/6] x86: refcount: prevent gcc distortions

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlined
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch allows to inline functions such as __get_seccomp_filter().
The effect of the patch is as follows on the kernel size:

text data bss dec hex filename
18146418 10064100 2936832 31147350 1db4556 ./vmlinux before
18148228 10063968 2936832 31149028 1db4be4 ./vmlinux after (+1678)

Static text symbols:
Before: 39673
After: 39649 (-24)

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Kees Cook <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Josh Poimboeuf <[email protected]>

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/refcount.h | 62 +++++++++++++++++++++------------
1 file changed, 39 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
index 4cf11d88d3b3..3126141f2c80 100644
--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -13,35 +13,49 @@
* the exception handler to use (in mm/extable.c), and then triggers the
* central refcount exception. The fixup address for the exception points
* back to the regular execution flow in .text.
+ *
+ * The logic and data are encapsulated within an assembly macro, which is then
+ * called on each use. This hack is necessary to prevent GCC from considering
+ * the inline assembly blocks as costly in time and space, which can prevent
+ * function inlining and lead to other bad compilation decisions. GCC computes
+ * inline assembly cost according to the number of perceived number of assembly
+ * instruction, based on the number of new-lines and semicolons in the assembly
+ * block. The macros will eventually be compiled into a single instruction that
+ * will be actually executed (unless an exception happens). This scheme allows
+ * GCC to better understand the inline asm cost.
*/
-#define _REFCOUNT_EXCEPTION \
- ".pushsection .text..refcount\n" \
- "111:\tlea %[counter], %%" _ASM_CX "\n" \
- "112:\t" ASM_UD2 "\n" \
- ASM_UNREACHABLE \
- ".popsection\n" \
- "113:\n" \
- _ASM_EXTABLE_REFCOUNT(112b, 113b)
+asm(".macro __REFCOUNT_EXCEPTION counter:req\n\t"
+ ".pushsection .text..refcount\n"
+ "111:\tlea \\counter, %" _ASM_CX "\n"
+ "112:\t" ASM_UD2 "\n\t"
+ ASM_UNREACHABLE
+ ".popsection\n"
+ "113:\n\t"
+ _ASM_EXTABLE_REFCOUNT(112b, 113b) "\n\t"
+ ".endm");

/* Trigger refcount exception if refcount result is negative. */
-#define REFCOUNT_CHECK_LT_ZERO \
- "js 111f\n\t" \
- _REFCOUNT_EXCEPTION
+asm(".macro __REFCOUNT_CHECK_LT_ZERO counter:req\n\t"
+ "js 111f\n\t"
+ "__REFCOUNT_EXCEPTION \\counter\n\t"
+ ".endm");

/* Trigger refcount exception if refcount result is zero or negative. */
-#define REFCOUNT_CHECK_LE_ZERO \
- "jz 111f\n\t" \
- REFCOUNT_CHECK_LT_ZERO
+asm(".macro __REFCOUNT_CHECK_LE_ZERO counter:req\n\t"
+ "jz 111f\n\t"
+ "__REFCOUNT_CHECK_LT_ZERO counter=\\counter\n\t"
+ ".endm");

/* Trigger refcount exception unconditionally. */
-#define REFCOUNT_ERROR \
- "jmp 111f\n\t" \
- _REFCOUNT_EXCEPTION
+asm(".macro __REFCOUNT_ERROR counter:req\n\t"
+ "jmp 111f\n\t"
+ "__REFCOUNT_EXCEPTION counter=\\counter\n\t"
+ ".endm");

static __always_inline void refcount_add(unsigned int i, refcount_t *r)
{
asm volatile(LOCK_PREFIX "addl %1,%0\n\t"
- REFCOUNT_CHECK_LT_ZERO
+ "__REFCOUNT_CHECK_LT_ZERO %[counter]"
: [counter] "+m" (r->refs.counter)
: "ir" (i)
: "cc", "cx");
@@ -50,7 +64,7 @@ static __always_inline void refcount_add(unsigned int i, refcount_t *r)
static __always_inline void refcount_inc(refcount_t *r)
{
asm volatile(LOCK_PREFIX "incl %0\n\t"
- REFCOUNT_CHECK_LT_ZERO
+ "__REFCOUNT_CHECK_LT_ZERO %[counter]"
: [counter] "+m" (r->refs.counter)
: : "cc", "cx");
}
@@ -58,7 +72,7 @@ static __always_inline void refcount_inc(refcount_t *r)
static __always_inline void refcount_dec(refcount_t *r)
{
asm volatile(LOCK_PREFIX "decl %0\n\t"
- REFCOUNT_CHECK_LE_ZERO
+ "__REFCOUNT_CHECK_LE_ZERO %[counter]"
: [counter] "+m" (r->refs.counter)
: : "cc", "cx");
}
@@ -66,13 +80,15 @@ static __always_inline void refcount_dec(refcount_t *r)
static __always_inline __must_check
bool refcount_sub_and_test(unsigned int i, refcount_t *r)
{
- GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl", REFCOUNT_CHECK_LT_ZERO,
+ GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
+ "__REFCOUNT_CHECK_LT_ZERO %[counter]",
r->refs.counter, "er", i, "%0", e, "cx");
}

static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
{
- GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl", REFCOUNT_CHECK_LT_ZERO,
+ GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
+ "__REFCOUNT_CHECK_LT_ZERO %[counter]",
r->refs.counter, "%0", e, "cx");
}

@@ -90,7 +106,7 @@ bool refcount_add_not_zero(unsigned int i, refcount_t *r)

/* Did we try to increment from/to an undesirable state? */
if (unlikely(c < 0 || c == INT_MAX || result < c)) {
- asm volatile(REFCOUNT_ERROR
+ asm volatile("__REFCOUNT_ERROR %[counter]"
: : [counter] "m" (r->refs.counter)
: "cc", "cx");
break;
--
2.17.0


2018-05-17 23:30:49

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 6/6] x86: removing unneeded new-lines

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

This patch removes unneeded new-lines and semicolons to prevent such
distortion.

Functions such as nfs_io_completion_put() get inlined. Its overall
effect is not shown in the absolute numbers, but it seems to enable
slightly better inlining:

text data bss dec hex filename
18148228 10063968 2936832 31149028 1db4be4 ./vmlinux before
18148888 10064016 2936832 31149736 1db4ea8 ./vmlinux after (+708)

Static text symbols:
Before: 39649
After: 39650 (+1)

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Josh Poimboeuf <[email protected]>

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/asm.h | 4 ++--
arch/x86/include/asm/cmpxchg.h | 10 +++++-----
arch/x86/include/asm/special_insns.h | 12 ++++++------
3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 219faaec51df..571ceec97976 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -51,10 +51,10 @@
* The output operand must be type "bool".
*/
#ifdef __GCC_ASM_FLAG_OUTPUTS__
-# define CC_SET(c) "\n\t/* output condition code " #c "*/\n"
+# define CC_SET(c) "\n\t/* output condition code " #c "*/"
# define CC_OUT(c) "=@cc" #c
#else
-# define CC_SET(c) "\n\tset" #c " %[_cc_" #c "]\n"
+# define CC_SET(c) "\n\tset" #c " %[_cc_" #c "]"
# define CC_OUT(c) [_cc_ ## c] "=qm"
#endif

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index e3efd8a06066..2be9582fcb2e 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -44,22 +44,22 @@ extern void __add_wrong_size(void)
__typeof__ (*(ptr)) __ret = (arg); \
switch (sizeof(*(ptr))) { \
case __X86_CASE_B: \
- asm volatile (lock #op "b %b0, %1\n" \
+ asm volatile (lock #op "b %b0, %1" \
: "+q" (__ret), "+m" (*(ptr)) \
: : "memory", "cc"); \
break; \
case __X86_CASE_W: \
- asm volatile (lock #op "w %w0, %1\n" \
+ asm volatile (lock #op "w %w0, %1" \
: "+r" (__ret), "+m" (*(ptr)) \
: : "memory", "cc"); \
break; \
case __X86_CASE_L: \
- asm volatile (lock #op "l %0, %1\n" \
+ asm volatile (lock #op "l %0, %1" \
: "+r" (__ret), "+m" (*(ptr)) \
: : "memory", "cc"); \
break; \
case __X86_CASE_Q: \
- asm volatile (lock #op "q %q0, %1\n" \
+ asm volatile (lock #op "q %q0, %1" \
: "+r" (__ret), "+m" (*(ptr)) \
: : "memory", "cc"); \
break; \
@@ -134,7 +134,7 @@ extern void __add_wrong_size(void)
__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)

#define __sync_cmpxchg(ptr, old, new, size) \
- __raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
+ __raw_cmpxchg((ptr), (old), (new), (size), "lock ")

#define __cmpxchg_local(ptr, old, new, size) \
__raw_cmpxchg((ptr), (old), (new), (size), "")
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 317fc59b512c..9c56059aaf24 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -19,7 +19,7 @@ extern unsigned long __force_order;
static inline unsigned long native_read_cr0(void)
{
unsigned long val;
- asm volatile("mov %%cr0,%0\n\t" : "=r" (val), "=m" (__force_order));
+ asm volatile("mov %%cr0,%0" : "=r" (val), "=m" (__force_order));
return val;
}

@@ -31,7 +31,7 @@ static inline void native_write_cr0(unsigned long val)
static inline unsigned long native_read_cr2(void)
{
unsigned long val;
- asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_order));
+ asm volatile("mov %%cr2,%0" : "=r" (val), "=m" (__force_order));
return val;
}

@@ -43,7 +43,7 @@ static inline void native_write_cr2(unsigned long val)
static inline unsigned long __native_read_cr3(void)
{
unsigned long val;
- asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
+ asm volatile("mov %%cr3,%0" : "=r" (val), "=m" (__force_order));
return val;
}

@@ -67,7 +67,7 @@ static inline unsigned long native_read_cr4(void)
: "=r" (val), "=m" (__force_order) : "0" (0));
#else
/* CR4 always exists on x86_64. */
- asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
+ asm volatile("mov %%cr4,%0" : "=r" (val), "=m" (__force_order));
#endif
return val;
}
@@ -101,7 +101,7 @@ static inline u32 __read_pkru(void)
* "rdpkru" instruction. Places PKRU contents in to EAX,
* clears EDX and requires that ecx=0.
*/
- asm volatile(".byte 0x0f,0x01,0xee\n\t"
+ asm volatile(".byte 0x0f,0x01,0xee"
: "=a" (pkru), "=d" (edx)
: "c" (ecx));
return pkru;
@@ -115,7 +115,7 @@ static inline void __write_pkru(u32 pkru)
* "wrpkru" instruction. Loads contents in EAX to PKRU,
* requires that ecx = edx = 0.
*/
- asm volatile(".byte 0x0f,0x01,0xef\n\t"
+ asm volatile(".byte 0x0f,0x01,0xef"
: : "a" (pkru), "c"(ecx), "d"(edx));
}
#else
--
2.17.0


2018-05-17 23:31:01

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 3/6] x86: alternative: macrofy locks for better inlining

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

The solution is to set an assembly macro and call it from the inlined
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch handles the LOCK prefix, allowing more aggresive inlining.

text data bss dec hex filename
18127205 10068388 2936832 31132425 1db0b09 ./vmlinux before
18131468 10068488 2936832 31136788 1db1c14 ./vmlinux after (+4363)

Static text symbols:
Before: 39860
After: 39788 (-72)

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Josh Poimboeuf <[email protected]>

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/alternative.h | 34 +++++++++++++++++++++++-------
1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4cd6a3b71824..1dc47c9fd480 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -28,17 +28,35 @@
* The very common lock prefix is handled as special case in a
* separate table which is a pure address list without replacement ptr
* and size information. That keeps the table sizes small.
+ *
+ * Saving the lock data is encapsulated within an assembly macro, which is then
+ * called on each use. This hack is necessary to prevent GCC from considering
+ * the inline assembly blocks as costly in time and space, which can prevent
+ * function inlining and lead to other bad compilation decisions. GCC computes
+ * inline assembly cost according to the number of perceived number of assembly
+ * instruction, based on the number of new-lines and semicolons in the assembly
+ * block. The macro will eventually be compiled into a single instruction (and
+ * some data). This scheme allows GCC to better understand the inline asm cost.
*/

#ifdef CONFIG_SMP
-#define LOCK_PREFIX_HERE \
- ".pushsection .smp_locks,\"a\"\n" \
- ".balign 4\n" \
- ".long 671f - .\n" /* offset */ \
- ".popsection\n" \
- "671:"
-
-#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
+
+asm(".macro __LOCK_PREFIX_HERE\n\t"
+ ".pushsection .smp_locks,\"a\"\n\t"
+ ".balign 4\n\t"
+ ".long 671f - .\n\t" /* offset */
+ ".popsection\n"
+ "671:\n\t"
+ ".endm");
+
+#define LOCK_PREFIX_HERE "__LOCK_PREFIX_HERE\n\t"
+
+asm(".macro __LOCK_PREFIX ins:vararg\n\t"
+ "__LOCK_PREFIX_HERE\n\t"
+ "lock; \\ins\n\t"
+ ".endm");
+
+#define LOCK_PREFIX "__LOCK_PREFIX "

#else /* ! CONFIG_SMP */
#define LOCK_PREFIX_HERE ""
--
2.17.0


2018-05-17 23:31:07

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 1/6] x86: objtool: use asm macro for better compiler decisions

GCC considers the number of statements in inlined assembly blocks,
according to new-lines and semicolons, as an indication to the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

In the case of objtool, this distortion is extreme, since anyhow the
annotations of objtool are discarded during linkage.

The solution is to set an assembly macro and call it from the inlinedv
assembly block. As a result GCC considers the inline assembly block as
a single instruction.

This patch slightly increases the kernel size.

text data bss dec hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18126824 10067268 2936832 31130924 1db052c ./vmlinux after (+665)

But allows more aggressive inlining. Static text symbols:
Before: 40033
After: 40015 (-18)

Cc: Christopher Li <[email protected]>
Cc: [email protected]

Signed-off-by: Nadav Amit <[email protected]>
---
include/linux/compiler.h | 37 +++++++++++++++++++++++++++++--------
1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ab4711c63601..6cbabc6b195a 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -97,19 +97,40 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
* These macros help objtool understand GCC code flow for unreachable code.
* The __COUNTER__ based labels are a hack to make each instance of the macros
* unique, to convince GCC not to merge duplicate inline asm statements.
+ *
+ * The annotation logic is encapsulated within assembly macros, which are then
+ * called on each annotation. This hack is necessary to prevent GCC from
+ * considering the inline assembly blocks as costly in time and space, which can
+ * prevent function inlining and lead to other bad compilation decisions. GCC
+ * computes inline assembly cost according to the number of perceived number of
+ * assembly instruction, based on the number of new-lines and semicolons in the
+ * assembly block. Since the annotations will be discarded during linkage, the
+ * macros make the annotations to be considered "cheap" and let GCC to emit
+ * better code.
*/
+asm(".macro __annotate_reachable counter:req\n"
+ "\\counter:\n\t"
+ ".pushsection .discard.reachable\n\t"
+ ".long \\counter\\()b -.\n\t"
+ ".popsection\n\t"
+ ".endm");
+
#define annotate_reachable() ({ \
- asm volatile("%c0:\n\t" \
- ".pushsection .discard.reachable\n\t" \
- ".long %c0b - .\n\t" \
- ".popsection\n\t" : : "i" (__COUNTER__)); \
+ asm volatile("__annotate_reachable %c0" : : "i" (__COUNTER__)); \
})
+
+asm(".macro __annotate_unreachable counter:req\n"
+ "\\counter:\n\t"
+ ".pushsection .discard.unreachable\n\t"
+ ".long \\counter\\()b -.\n\t"
+ ".popsection\n\t"
+ ".endm");
+
#define annotate_unreachable() ({ \
- asm volatile("%c0:\n\t" \
- ".pushsection .discard.unreachable\n\t" \
- ".long %c0b - .\n\t" \
- ".popsection\n\t" : : "i" (__COUNTER__)); \
+ asm volatile("__annotate_unreachable %c0" : : \
+ "i" (__COUNTER__)); \
})
+
#define ASM_UNREACHABLE \
"999:\n\t" \
".pushsection .discard.unreachable\n\t" \
--
2.17.0


2018-05-18 07:59:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On Thu, May 17, 2018 at 09:13:58AM -0700, Nadav Amit wrote:
> +asm(".macro __BUG_FLAGS ins:req file:req line:req flags:req size:req\n"
> + "1:\t \\ins\n\t"
> + ".pushsection __bug_table,\"aw\"\n"
> + "2:\t "__BUG_REL(1b) "\t# bug_entry::bug_addr\n\t"
> + __BUG_REL(\\file) "\t# bug_entry::file\n\t"
> + ".word \\line" "\t# bug_entry::line\n\t"
> + ".word \\flags" "\t# bug_entry::flags\n\t"
> + ".org 2b+\\size\n\t"
> + ".popsection\n\t"
> + ".endm");
> +
> +#define _BUG_FLAGS(ins, flags) \
> do { \
> + asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1 %c2 %c3" \
> + : : "i" (__FILE__), "i" (__LINE__), \
> + "i" (flags), \
> "i" (sizeof(struct bug_entry))); \
> } while (0)

This is an awesome hack, but is there really nothing we can do to make
it more readable? Esp, that global asm doing the macro definition is a
pain to read.

Also, can we pretty please used named operands in 'new' code?

2018-05-18 08:14:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions


* Peter Zijlstra <[email protected]> wrote:

> On Thu, May 17, 2018 at 09:13:58AM -0700, Nadav Amit wrote:
> > +asm(".macro __BUG_FLAGS ins:req file:req line:req flags:req size:req\n"
> > + "1:\t \\ins\n\t"
> > + ".pushsection __bug_table,\"aw\"\n"
> > + "2:\t "__BUG_REL(1b) "\t# bug_entry::bug_addr\n\t"
> > + __BUG_REL(\\file) "\t# bug_entry::file\n\t"
> > + ".word \\line" "\t# bug_entry::line\n\t"
> > + ".word \\flags" "\t# bug_entry::flags\n\t"
> > + ".org 2b+\\size\n\t"
> > + ".popsection\n\t"
> > + ".endm");
> > +
> > +#define _BUG_FLAGS(ins, flags) \
> > do { \
> > + asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1 %c2 %c3" \
> > + : : "i" (__FILE__), "i" (__LINE__), \
> > + "i" (flags), \
> > "i" (sizeof(struct bug_entry))); \
> > } while (0)
>
> This is an awesome hack, but is there really nothing we can do to make
> it more readable? Esp, that global asm doing the macro definition is a
> pain to read.
>
> Also, can we pretty please used named operands in 'new' code?

Yes, that's my main worry too about all these inlining changes:
the very, very marked reduction in the readability of assembly code.

It's bad to an extent that I'm questioning the wisdom of pandering to a compiler
limitation to begin with?

How about asking GCC for an attribute where we can specify the inlined size of an
asm() function? Even if we'll just approximate it due to some vagaries of actual
code generation related to how arguments interact with GCC, an explicit byte value
will do a heck of a better job of it than some sort of implied, vague 'number of
newlines' heuristics ...

Thanks,

Ingo

2018-05-18 09:20:31

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 0/6] Macrofying inline assembly for better compilation

From: Nadav Amit
> Sent: 17 May 2018 17:14
> This patch-set deals with an interesting yet stupid problem: kernel code
> that does not get inlined despite its simplicity. There are several
> causes for this behavior: "cold" attribute on __init, different function
> optimization levels; conditional constant computations based on
> __builtin_constant_p(); and finally large inline assembly blocks.
>
> This patch-set deals with the inline assembly problem. I separated these
> patches from the others (that were sent in the RFC) for easier
> inclusion.
>
> The problem with inline assembly is that inline assembly is often used
> by the kernel for things that are other than code - for example,
> assembly directives and data. GCC however is oblivious to the content of
> the blocks and assumes their cost in space and time is proportional to
> the number of the perceived assembly "instruction", according to the
> number of newlines and semicolons. Alternatives, paravirt and other
> mechanisms are affected, causing code not to be inlined, and degrading
> compilation quality in general.
>
> The solution that this patch-set carries for this problem is to create
> an assembly macro, and then call it from the inline assembly block. As
> a result, the compiler sees a single "instruction" and assigns the more
> appropriate cost to the code. In addition, this patch-set removes
> unneeded new-lines from common x86 inline asm's, which "confuse" GCC
> heuristics.

Can't you get the same effect by using always_inline ?

David


2018-05-18 10:13:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On Fri, May 18, 2018 at 10:13:54AM +0200, Ingo Molnar wrote:
> Yes, that's my main worry too about all these inlining changes:
> the very, very marked reduction in the readability of assembly code.

Same reaction here: the small improvements this brings is simply not
enough to sacrifice readability so much IMO.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-05-18 14:16:44

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 0/6] Macrofying inline assembly for better compilation

David Laight <[email protected]> wrote:

> From: Nadav Amit
>> Sent: 17 May 2018 17:14
>> This patch-set deals with an interesting yet stupid problem: kernel code
>> that does not get inlined despite its simplicity. There are several
>> causes for this behavior: "cold" attribute on __init, different function
>> optimization levels; conditional constant computations based on
>> __builtin_constant_p(); and finally large inline assembly blocks.
>>
>> This patch-set deals with the inline assembly problem. I separated these
>> patches from the others (that were sent in the RFC) for easier
>> inclusion.
>>
>> The problem with inline assembly is that inline assembly is often used
>> by the kernel for things that are other than code - for example,
>> assembly directives and data. GCC however is oblivious to the content of
>> the blocks and assumes their cost in space and time is proportional to
>> the number of the perceived assembly "instruction", according to the
>> number of newlines and semicolons. Alternatives, paravirt and other
>> mechanisms are affected, causing code not to be inlined, and degrading
>> compilation quality in general.
>>
>> The solution that this patch-set carries for this problem is to create
>> an assembly macro, and then call it from the inline assembly block. As
>> a result, the compiler sees a single "instruction" and assigns the more
>> appropriate cost to the code. In addition, this patch-set removes
>> unneeded new-lines from common x86 inline asm's, which "confuse" GCC
>> heuristics.
>
> Can't you get the same effect by using always_inline ?

I wanted and forgot to mention in the cover-letter why always_inline is not
a proper solution:

1. It is not easy to go over 400 functions and mark them as __always_inline.
Maintaining it afterwards (i.e., removing the __always_inline if the
function is changed and becomes “heavier") is even harder.

2. The kernel can be configured in a many ways, which would make
functions more “cheaper” or more “expensive”, so you cannot always
predetermine whether a function should be inlined.

3. If you mark a function __always_inline you can just cause the calling
function not to be inlined (when it should be inlined as well). It becomes
a whack-a-mole.

4. It is not only about inlining. The compiler also makes branch decisions
based on the perceived cost of the code, including inlined function.

Regards,
Nadav

2018-05-18 14:23:10

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

Peter Zijlstra <[email protected]> wrote:

> On Thu, May 17, 2018 at 09:13:58AM -0700, Nadav Amit wrote:
>> +asm(".macro __BUG_FLAGS ins:req file:req line:req flags:req size:req\n"
>> + "1:\t \\ins\n\t"
>> + ".pushsection __bug_table,\"aw\"\n"
>> + "2:\t "__BUG_REL(1b) "\t# bug_entry::bug_addr\n\t"
>> + __BUG_REL(\\file) "\t# bug_entry::file\n\t"
>> + ".word \\line" "\t# bug_entry::line\n\t"
>> + ".word \\flags" "\t# bug_entry::flags\n\t"
>> + ".org 2b+\\size\n\t"
>> + ".popsection\n\t"
>> + ".endm");
>> +
>> +#define _BUG_FLAGS(ins, flags) \
>> do { \
>> + asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1 %c2 %c3" \
>> + : : "i" (__FILE__), "i" (__LINE__), \
>> + "i" (flags), \
>> "i" (sizeof(struct bug_entry))); \
>> } while (0)
>
> This is an awesome hack, but is there really nothing we can do to make
> it more readable? Esp, that global asm doing the macro definition is a
> pain to read.
>
> Also, can we pretty please used named operands in 'new' code?

It is hard to make the code readable in C, readable in the generated asm,
and to follow the coding style imposed by checkpatch (e.g., no space between
the newline and the asm argument before it).

I considered wrapping the asm macro in a C macro, but AFAIK C macros cannot
emit backslashes.

I thought of suggesting to change “ins” into a vararg and removing the
escaped double-quotes in the C macro, but you ask to use named operands.

So I am out of ideas. Do you have anything else in mind?

Thanks,
Nadav

2018-05-18 14:32:33

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

Ingo Molnar <[email protected]> wrote:

>
> * Peter Zijlstra <[email protected]> wrote:
>
>> On Thu, May 17, 2018 at 09:13:58AM -0700, Nadav Amit wrote:
>>> +asm(".macro __BUG_FLAGS ins:req file:req line:req flags:req size:req\n"
>>> + "1:\t \\ins\n\t"
>>> + ".pushsection __bug_table,\"aw\"\n"
>>> + "2:\t "__BUG_REL(1b) "\t# bug_entry::bug_addr\n\t"
>>> + __BUG_REL(\\file) "\t# bug_entry::file\n\t"
>>> + ".word \\line" "\t# bug_entry::line\n\t"
>>> + ".word \\flags" "\t# bug_entry::flags\n\t"
>>> + ".org 2b+\\size\n\t"
>>> + ".popsection\n\t"
>>> + ".endm");
>>> +
>>> +#define _BUG_FLAGS(ins, flags) \
>>> do { \
>>> + asm volatile("__BUG_FLAGS \"" ins "\" %c0 %c1 %c2 %c3" \
>>> + : : "i" (__FILE__), "i" (__LINE__), \
>>> + "i" (flags), \
>>> "i" (sizeof(struct bug_entry))); \
>>> } while (0)
>>
>> This is an awesome hack, but is there really nothing we can do to make
>> it more readable? Esp, that global asm doing the macro definition is a
>> pain to read.
>>
>> Also, can we pretty please used named operands in 'new' code?
>
> Yes, that's my main worry too about all these inlining changes:
> the very, very marked reduction in the readability of assembly code.
>
> It's bad to an extent that I'm questioning the wisdom of pandering to a compiler
> limitation to begin with?
>
> How about asking GCC for an attribute where we can specify the inlined size of an
> asm() function? Even if we'll just approximate it due to some vagaries of actual
> code generation related to how arguments interact with GCC, an explicit byte value
> will do a heck of a better job of it than some sort of implied, vague 'number of
> newlines' heuristics ...

If it were to become a GCC feature, I think it is best to be a builtin that
says: consider the enclosed expression as “free”. The problem of poor
inlining decisions is not specific to inline asm. As I mentioned in the RFC,
when there are two code paths for constants and variables based on
__builtin_constant_p(), you can get the “cost” of the constant path for
variables.

It is not hard to add such a feature to GCC, but I don’t know how easy it is
to get new features into the compiler.

2018-05-18 14:36:50

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

Borislav Petkov <[email protected]> wrote:

> On Fri, May 18, 2018 at 10:13:54AM +0200, Ingo Molnar wrote:
>> Yes, that's my main worry too about all these inlining changes:
>> the very, very marked reduction in the readability of assembly code.
>
> Same reaction here: the small improvements this brings is simply not
> enough to sacrifice readability so much IMO.

I didn’t try too hard to find more affected (micro)benchmarks, but I am
pretty sure there are: pretty much all the paravirt ops are currently not
inlined, which can affect kernel operations that bang on the page-tables.

Notice that this is not an x86-specific issue. Other architectures, with
not-as-good OOOE for instance, might incur higher overheads. So I don’t
think that this issue should be ignored.

2018-05-18 15:43:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On Fri, May 18, 2018 at 02:36:21PM +0000, Nadav Amit wrote:
> I didn’t try too hard to find more affected (micro)benchmarks, but I am
> pretty sure there are:

So you being pretty sure there are, doesn't make me go, oh, ok, then,
this is an uglification we should try to live with. It is still ugly
with no proven benefit from it.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-05-18 15:48:32

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

Borislav Petkov <[email protected]> wrote:

> On Fri, May 18, 2018 at 02:36:21PM +0000, Nadav Amit wrote:
>> I didn’t try too hard to find more affected (micro)benchmarks, but I am
>> pretty sure there are:
>
> So you being pretty sure there are, doesn't make me go, oh, ok, then,
> this is an uglification we should try to live with. It is still ugly
> with no proven benefit from it.

In case you didn’t read the cover-letter: the patch-set does give a 2%
performance improvement for #PF-MADV_DONTNEED microbenchmark loop.

2018-05-18 15:56:35

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On Fri, May 18, 2018 at 03:46:33PM +0000, Nadav Amit wrote:
> In case you didn’t read the cover-letter: the patch-set does give a 2%
> performance improvement for #PF-MADV_DONTNEED microbenchmark loop.

I saw it but *micro*-benchmark doesn't tell me a whole lot. If that
"improvement" is not visible in real workloads/benchmarks, then I'm just
as unimpressed.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-05-18 16:26:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On Fri, May 18, 2018 at 12:59 AM Peter Zijlstra <[email protected]>
wrote:

> This is an awesome hack, but is there really nothing we can do to make
> it more readable? Esp, that global asm doing the macro definition is a
> pain to read.

I actually find that macro to be *more* legible than what we do now,
although I'm not enamored with the pseudo-operation name ("__BUG_FLAGS").

That said, the C header code itself I don't love.

I wonder if we should just introduce a new assembler header file, and get
it included when processing compiler-generated asm. We already do that for
our _real_ *.S files, with a number of our header files having constants
and code for the asm case too, not just C.

But we could have an <asm/asm-macro.h> header file that has these kinds of
macros (or "pseudo-instructions") for assembly language cases, and then we
could just rely on them in inline asm.

Because if you want to see illegible, look at what we currently generate:

# kernel/exit.c:1761: BUG();
#APP
# 1761 "kernel/exit.c" 1
1: .byte 0x0f, 0x0b
.pushsection __bug_table,"aw"
2: .long 1b - 2b # bug_entry::bug_addr
.long .LC0 - 2b # bug_entry::file #
.word 1761 # bug_entry::line #
.word 0 # bug_entry::flags #
.org 2b+12 #
.popsection
# 0 "" 2
# 1761 "kernel/exit.c" 1
180: #
.pushsection .discard.unreachable
.long 180b - . #
.popsection

# 0 "" 2
#NO_APP

and tell me that's legible.. Of course, I'm probably one of the few people
who actually look at the generated asm fairly regularly.

So a few macros that we can use in inline asm definitely wouldn't hurt
legibility. And if we actually can put them in a header file as legible
code - instead of having to wrap them in a global "asm()" macro in C code,
they'd probably be legible at a source level too.

It's not just the bug_flags cases. It's things like jump labels too:

# ./arch/x86/include/asm/jump_label.h:36: asm_volatile_goto("1:"
#APP
# 36 "./arch/x86/include/asm/jump_label.h" 1
1:.byte 0x0f,0x1f,0x44,0x00,0
.pushsection __jump_table, "aw"
.balign 8
.quad 1b, .L71, __tracepoint_sched_process_free+8 + 0 #,,
.popsection

# 0 "" 2
#NO_APP

and atomics:

# ./arch/x86/include/asm/atomic.h:122: GEN_UNARY_RMWcc(LOCK_PREFIX
"decl", v->counter, "%0", e);
#APP
# 122 "./arch/x86/include/asm/atomic.h" 1
.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
lock; decl -2336(%rbp) # _7->counter
/* output condition code e*/

# 0 "" 2
# ./include/linux/sched/task.h:95: if (atomic_dec_and_test(&t->usage))
#NO_APP

where I suspect we could hide the whole "lock" magic in a macro, and make
this much more legible.

Maybe? I think it might be worth trying. It's possible that the macro games
themselves would just cause enough pain to make any gains go away.

Linus

2018-05-18 16:32:25

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

Borislav Petkov <[email protected]> wrote:

> On Fri, May 18, 2018 at 03:46:33PM +0000, Nadav Amit wrote:
>> In case you didn’t read the cover-letter: the patch-set does give a 2%
>> performance improvement for #PF-MADV_DONTNEED microbenchmark loop.
>
> I saw it but *micro*-benchmark doesn't tell me a whole lot. If that
> "improvement" is not visible in real workloads/benchmarks, then I'm just
> as unimpressed.

Funny. I found in my mailbox that you once wrote me: "It is a dumb idea, it
doesn't bring us anything besides some superficial readability which you
don't really need.”

To the point, I think you exaggerate with the effect of the patch on the
code readability: it is not much uglier than it was before, and the change
is in a very specific point in the code.

2018-05-18 17:26:31

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

Linus Torvalds <[email protected]> wrote:

> On Fri, May 18, 2018 at 12:59 AM Peter Zijlstra <[email protected]>
> wrote:
>
>> This is an awesome hack, but is there really nothing we can do to make
>> it more readable? Esp, that global asm doing the macro definition is a
>> pain to read.
>
> I actually find that macro to be *more* legible than what we do now,
> although I'm not enamored with the pseudo-operation name ("__BUG_FLAGS").
>
> That said, the C header code itself I don't love.
>
> I wonder if we should just introduce a new assembler header file, and get
> it included when processing compiler-generated asm. We already do that for
> our _real_ *.S files, with a number of our header files having constants
> and code for the asm case too, not just C.
>
> But we could have an <asm/asm-macro.h> header file that has these kinds of
> macros (or "pseudo-instructions") for assembly language cases, and then we
> could just rely on them in inline asm.

Will it be ok just to use a global inline asm to set an “.include” directive
that gas would later process? (I can probably wrap it in a C macro so it
won’t be too disgusting)

2018-05-18 17:42:04

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On May 18, 2018 6:29:59 PM GMT+02:00, Nadav Amit <[email protected]> wrote:
>Funny. I found in my mailbox that you once wrote me: "It is a dumb
>idea, it
>doesn't bring us anything besides some superficial readability which
>you
>don't really need.”

How about a proper quotation with the Message-id you're referring to?


--
Sent from a small device: formatting sux and brevity is inevitable.

2018-05-18 17:54:33

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On Fri, 2018-05-18 at 14:22 +0000, Nadav Amit wrote:
> It is hard to make the code readable in C, readable in the generated asm,
> and to follow the coding style imposed by checkpatch (e.g., no space between
> the newline and the asm argument before it).

Ignore checkpatch when you know better.


2018-05-18 18:26:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On Fri, May 18, 2018 at 10:24 AM Nadav Amit <[email protected]> wrote:

> Will it be ok just to use a global inline asm to set an “.include”
directive
> that gas would later process? (I can probably wrap it in a C macro so it
> won’t be too disgusting)

Maybe. I'd almost prefer it to be the automatic kind of thing that we
already do for C files using "-include".

Linus

2018-05-18 18:34:50

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <[email protected]> wrote:
>On Fri, May 18, 2018 at 10:24 AM Nadav Amit <[email protected]> wrote:
>
>> Will it be ok just to use a global inline asm to set an “.include”
>directive
>> that gas would later process? (I can probably wrap it in a C macro so
>it
>> won’t be too disgusting)
>
>Maybe. I'd almost prefer it to be the automatic kind of thing that we
>already do for C files using "-include".
>
> Linus

Unfortunately gcc doesn't guarantee that global assembly inlines will appear at the top of the file.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-18 18:50:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On Fri, May 18, 2018 at 11:34 AM <[email protected]> wrote:

> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
[email protected]> wrote:

> Unfortunately gcc doesn't guarantee that global assembly inlines will
appear at the top of the file.

Yeah. It really would be better to do the "asm version of -inline".

We already do something like that for real *.S files on some architectures
(because their assembly really wants it, eg

arch/arm/Makefile:
KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y) -include
asm/unified.h -msoft-float

but I do want to point out that KBUILD_AFLAGS is *not* used for
compiler-generated assembly, only for actual *.S files.

Sadly, I don't actually know any way to make gcc call the 'as' phase with
particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
assembler, but there is no "-include" option for GNU as afaik.

Can you perhaps define a macro symbol for "--defsym"? Probably not.

Linus

2018-05-18 18:54:19

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On May 18, 2018 11:50:12 AM PDT, Linus Torvalds <[email protected]> wrote:
>On Fri, May 18, 2018 at 11:34 AM <[email protected]> wrote:
>
>> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>[email protected]> wrote:
>
>> Unfortunately gcc doesn't guarantee that global assembly inlines will
>appear at the top of the file.
>
>Yeah. It really would be better to do the "asm version of -inline".
>
>We already do something like that for real *.S files on some
>architectures
>(because their assembly really wants it, eg
>
> arch/arm/Makefile:
>KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>-include
>asm/unified.h -msoft-float
>
>but I do want to point out that KBUILD_AFLAGS is *not* used for
>compiler-generated assembly, only for actual *.S files.
>
>Sadly, I don't actually know any way to make gcc call the 'as' phase
>with
>particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>assembler, but there is no "-include" option for GNU as afaik.
>
>Can you perhaps define a macro symbol for "--defsym"? Probably not.
>
> Linus

I looked at this thing a long time ago; it's not there, and the best would probably be to get global asm() working properly in gcc.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-18 19:05:24

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

[email protected] wrote:

> On May 18, 2018 11:50:12 AM PDT, Linus Torvalds <[email protected]> wrote:
>> On Fri, May 18, 2018 at 11:34 AM <[email protected]> wrote:
>>
>>> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>> [email protected]> wrote:
>>
>>> Unfortunately gcc doesn't guarantee that global assembly inlines will
>> appear at the top of the file.
>>
>> Yeah. It really would be better to do the "asm version of -inline".
>>
>> We already do something like that for real *.S files on some
>> architectures
>> (because their assembly really wants it, eg
>>
>> arch/arm/Makefile:
>> KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>> -include
>> asm/unified.h -msoft-float
>>
>> but I do want to point out that KBUILD_AFLAGS is *not* used for
>> compiler-generated assembly, only for actual *.S files.
>>
>> Sadly, I don't actually know any way to make gcc call the 'as' phase
>> with
>> particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>> assembler, but there is no "-include" option for GNU as afaik.
>>
>> Can you perhaps define a macro symbol for "--defsym"? Probably not.
>>
>> Linus
>
> I looked at this thing a long time ago; it's not there, and the best would probably be to get global asm() working properly in gcc.

I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.


2018-05-18 19:06:24

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On May 18, 2018 12:02:50 PM PDT, Nadav Amit <[email protected]> wrote:
>[email protected] wrote:
>
>> On May 18, 2018 11:50:12 AM PDT, Linus Torvalds
><[email protected]> wrote:
>>> On Fri, May 18, 2018 at 11:34 AM <[email protected]> wrote:
>>>
>>>> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>>> [email protected]> wrote:
>>>
>>>> Unfortunately gcc doesn't guarantee that global assembly inlines
>will
>>> appear at the top of the file.
>>>
>>> Yeah. It really would be better to do the "asm version of -inline".
>>>
>>> We already do something like that for real *.S files on some
>>> architectures
>>> (because their assembly really wants it, eg
>>>
>>> arch/arm/Makefile:
>>> KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>>> -include
>>> asm/unified.h -msoft-float
>>>
>>> but I do want to point out that KBUILD_AFLAGS is *not* used for
>>> compiler-generated assembly, only for actual *.S files.
>>>
>>> Sadly, I don't actually know any way to make gcc call the 'as' phase
>>> with
>>> particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>>> assembler, but there is no "-include" option for GNU as afaik.
>>>
>>> Can you perhaps define a macro symbol for "--defsym"? Probably not.
>>>
>>> Linus
>>
>> I looked at this thing a long time ago; it's not there, and the best
>would probably be to get global asm() working properly in gcc.
>
>I can add a -Wa,[filename.s] switch. It works, but sort of
>undocumented.

Ah, I thought that would have assembled each file separately. We can request it be documented, that's usually much easier than getting a new feature implemented.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-18 19:12:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On Fri, May 18, 2018 at 12:02 PM Nadav Amit <[email protected]> wrote:

> I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.

Oh, if it assembles things together, then that sounds optimal.

And yes, like hpa says, we should make sure that behavior is acknowledged
by the GNU as people, so that they then don't come back and say "hey, now
we assemble things as separate units".

That said, the "separate units" model really doesn't make much sense now
that I think about it, because 'as' supports only one output file. So I
guess the as people wouldn't have any issues with just accepting the
"concatenate all input files" syntax.

Linus

2018-05-18 19:18:40

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

Linus Torvalds <[email protected]> wrote:

> On Fri, May 18, 2018 at 12:02 PM Nadav Amit <[email protected]> wrote:
>
>> I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.
>
> Oh, if it assembles things together, then that sounds optimal.
>
> And yes, like hpa says, we should make sure that behavior is acknowledged
> by the GNU as people, so that they then don't come back and say "hey, now
> we assemble things as separate units".
>
> That said, the "separate units" model really doesn't make much sense now
> that I think about it, because 'as' supports only one output file. So I
> guess the as people wouldn't have any issues with just accepting the
> "concatenate all input files" syntax.

Gnu ASM manual says: "Each time you run as it assembles exactly one source
program. The source program is made up of one or more files.”

2018-05-18 19:21:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On Fri, May 18, 2018 at 12:18 PM Nadav Amit <[email protected]> wrote:

> Gnu ASM manual says: "Each time you run as it assembles exactly one source
> program. The source program is made up of one or more files.”

Ok, that counts as documentation, although it's confusing as hell. Is it
one source program or multiple files again? I see what they are trying to
say, but it really could be phrased more clearly ;)

Linus

2018-05-18 19:25:01

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On May 18, 2018 12:21:00 PM PDT, Linus Torvalds <[email protected]> wrote:
>On Fri, May 18, 2018 at 12:18 PM Nadav Amit <[email protected]> wrote:
>
>> Gnu ASM manual says: "Each time you run as it assembles exactly one
>source
>> program. The source program is made up of one or more files.”
>
>Ok, that counts as documentation, although it's confusing as hell. Is
>it
>one source program or multiple files again? I see what they are trying
>to
>say, but it really could be phrased more clearly ;)
>
> Linus

At least we have a solution!
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-18 19:36:48

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

[email protected] wrote:

> On May 18, 2018 12:21:00 PM PDT, Linus Torvalds <[email protected]> wrote:
>> On Fri, May 18, 2018 at 12:18 PM Nadav Amit <[email protected]> wrote:
>>
>>> Gnu ASM manual says: "Each time you run as it assembles exactly one
>> source
>>> program. The source program is made up of one or more files.”
>>
>> Ok, that counts as documentation, although it's confusing as hell. Is
>> it
>> one source program or multiple files again? I see what they are trying
>> to
>> say, but it really could be phrased more clearly ;)
>>
>> Linus
>
> At least we have a solution!

Thanks for all your help. I will try to do it over over the weekend.

Funny, I see that this “trick” (-Wa,[filename.s]) is already being used to
include arch/x86/boot/code16gcc.h so I feel “safer” to use it.

Regards,
Nadav

2018-05-18 19:42:34

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2/6] x86: bug: prevent gcc distortions

On May 18, 2018 12:36:20 PM PDT, Nadav Amit <[email protected]> wrote:
>[email protected] wrote:
>
>> On May 18, 2018 12:21:00 PM PDT, Linus Torvalds
><[email protected]> wrote:
>>> On Fri, May 18, 2018 at 12:18 PM Nadav Amit <[email protected]>
>wrote:
>>>
>>>> Gnu ASM manual says: "Each time you run as it assembles exactly one
>>> source
>>>> program. The source program is made up of one or more files.”
>>>
>>> Ok, that counts as documentation, although it's confusing as hell.
>Is
>>> it
>>> one source program or multiple files again? I see what they are
>trying
>>> to
>>> say, but it really could be phrased more clearly ;)
>>>
>>> Linus
>>
>> At least we have a solution!
>
>Thanks for all your help. I will try to do it over over the weekend.
>
>Funny, I see that this “trick” (-Wa,[filename.s]) is already being used
>to
>include arch/x86/boot/code16gcc.h so I feel “safer” to use it.
>
>Regards,
>Nadav

Oh. I don't remember when we did that... might even be my doing and then I just forgot about it :)
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2018-05-19 04:30:41

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 5/6] x86: refcount: prevent gcc distortions

Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.17-rc5 next-20180517]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-objtool-use-asm-macro-for-better-compiler-decisions/20180519-045439
config: x86_64-allyesdebian (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All errors (new ones prefixed by >>):

arch/x86/include/asm/refcount.h: Assembler messages:
>> arch/x86/include/asm/refcount.h:67: Error: too many positional arguments

vim +67 arch/x86/include/asm/refcount.h

63
64 static __always_inline void refcount_inc(refcount_t *r)
65 {
66 asm volatile(LOCK_PREFIX "incl %0\n\t"
> 67 "__REFCOUNT_CHECK_LT_ZERO %[counter]"
68 : [counter] "+m" (r->refs.counter)
69 : : "cc", "cx");
70 }
71

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (1.25 kB)
.config.gz (38.50 kB)
Download all attachments