2018-06-04 18:37:22

by Nadav Amit

Subject: [PATCH v2 0/9] x86: macrofying inline asm for better compilation

This patch-set deals with an interesting yet stupid problem: kernel code
that does not get inlined despite its simplicity. There are several
causes for this behavior: the "cold" attribute on __init; different
function optimization levels; conditional constant computations based on
__builtin_constant_p(); and finally, large inline assembly blocks.

This patch-set deals with the inline assembly problem. I separated these
patches from the others (that were sent in the RFC) for easier
inclusion. I also separated the removal of unnecessary new-lines, which
will be sent separately.

The problem with inline assembly is that the kernel often uses it for
things other than code - for example, assembly directives and data. GCC,
however, is oblivious to the content of the blocks and assumes their
cost in space and time is proportional to the number of perceived
assembly "instructions", which it estimates from the number of newlines
and semicolons. Alternatives, paravirt and other mechanisms are
affected, causing code not to be inlined and degrading compilation
quality in general.

The solution that this patch-set offers for this problem is to define
an assembly macro, and then call it from the inline assembly block. As
a result, the compiler sees a single "instruction" and assigns a more
appropriate cost to the code.
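
As a toy illustration of the pattern (the names below are invented for
this example and do not appear in the patches), an annotation that used
to be emitted directly in the inline assembly string:

        asm volatile("1:\n\t"
                     ".pushsection .discard.example,\"aw\"\n\t"
                     ".long 1b - .\n\t"
                     ".popsection\n\t");

is counted by GCC as roughly four instructions. Once the directives are
moved into an assembler macro:

        .macro EXAMPLE_ANNOTATE
        1:
                .pushsection .discard.example, "aw"
                .long 1b - .
                .popsection
        .endm

the C side is reduced to a single perceived instruction:

        asm volatile("EXAMPLE_ANNOTATE");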

To avoid uglification of the code, as many noted, the macros are first
precompiled into an assembly file, which is later assembled together
with the C files. This also avoids the duplicate implementations that
previously existed for the asm and C code, as can be seen in the
exception-table changes.

Overall this patch-set slightly increases the kernel size (my build was
done using my Ubuntu 18.04 config + localyesconfig for the record):

text data bss dec hex filename
18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+0.1%)

The number of static functions in the image is reduced by 379, but the
actual inlining improvement is even better, which does not always show
in these numbers: a function may be inlined, causing the calling
function not to be inlined.

The Makefile stuff may not be too clean. Ideas for improvements are
welcome.

v1->v2: * Compiling the macros into a separate .s file, improving
readability (Linus)
* Improving assembly formatting, applying most of the comments
according to my judgment (Jan)
* Adding exception-table, cpufeature and jump-labels
* Removing new-line cleanup; to be submitted separately

Cc: Alok Kataria <[email protected]>
Cc: Christopher Li <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Philippe Ombredanne <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: [email protected]

Nadav Amit (9):
Makefile: Prepare for using macros for inline asm
x86: objtool: use asm macro for better compiler decisions
x86: refcount: prevent gcc distortions
x86: alternatives: macrofy locks for better inlining
x86: bug: prevent gcc distortions
x86: prevent inline distortion by paravirt ops
x86: extable: use macros instead of inline assembly
x86: cpufeature: use macros instead of inline assembly
x86: jump-labels: use macros instead of inline assembly

Makefile | 9 ++-
arch/x86/Makefile | 11 ++-
arch/x86/include/asm/alternative-asm.h | 20 ++++--
arch/x86/include/asm/alternative.h | 16 +----
arch/x86/include/asm/asm.h | 61 +++++++---------
arch/x86/include/asm/bug.h | 98 +++++++++++++++-----------
arch/x86/include/asm/cpufeature.h | 82 ++++++++++++---------
arch/x86/include/asm/jump_label.h | 65 ++++++++++-------
arch/x86/include/asm/paravirt_types.h | 54 +++++++-------
arch/x86/include/asm/refcount.h | 73 +++++++++++--------
arch/x86/kernel/Makefile | 6 ++
arch/x86/kernel/macros.S | 16 +++++
include/linux/compiler.h | 60 ++++++++++++----
scripts/Kbuild.include | 4 +-
14 files changed, 346 insertions(+), 229 deletions(-)
create mode 100644 arch/x86/kernel/macros.S

--
2.17.0



2018-06-04 18:37:50

by Nadav Amit

Subject: [PATCH v2 5/9] x86: bug: prevent gcc distortions

GCC considers the number of statements in inline assembly blocks,
estimated from the number of new-lines and semicolons, as an indication
of the cost of the block in time and space. This estimate is distorted
by the kernel code, which puts information in alternative sections. As
a result, the compiler may perform incorrect inlining and branch
optimizations.

The solution is to define an assembly macro and call it from the inline
assembly block. As a result, GCC considers the inline assembly block to
be a single instruction.
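
For instance, after this change a BUG() site reaches the assembler as a
single line, roughly of the form (the operand values below are invented
for illustration):

        ASM_BUG ins="ud2" file=.LC0 line=42 flags=0 size=12

which the ASM_BUG assembler macro then expands into the trapping
instruction and the __bug_table entry.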

This patch increases the kernel size:

text data bss dec hex filename
18146889 10225380 2957312 31329581 1de0d2d ./vmlinux before
18147336 10226688 2957312 31331336 1de1408 ./vmlinux after (+1755)

But it enables more aggressive inlining (and probably better branch
decisions). The number of static text symbols in vmlinux is lower.

Static text symbols:
Before: 40218
After: 40053 (-165)

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Josh Poimboeuf <[email protected]>

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/bug.h | 98 ++++++++++++++++++++++----------------
arch/x86/kernel/macros.S | 1 +
2 files changed, 57 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index 6804d6642767..5090035e6d16 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -4,6 +4,8 @@

#include <linux/stringify.h>

+#ifndef __ASSEMBLY__
+
/*
* Despite that some emulators terminate on UD2, we use it for WARN().
*
@@ -20,53 +22,15 @@

#define LEN_UD2 2

-#ifdef CONFIG_GENERIC_BUG
-
-#ifdef CONFIG_X86_32
-# define __BUG_REL(val) ".long " __stringify(val)
-#else
-# define __BUG_REL(val) ".long " __stringify(val) " - 2b"
-#endif
-
-#ifdef CONFIG_DEBUG_BUGVERBOSE
-
-#define _BUG_FLAGS(ins, flags) \
-do { \
- asm volatile("1:\t" ins "\n" \
- ".pushsection __bug_table,\"aw\"\n" \
- "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n" \
- "\t" __BUG_REL(%c0) "\t# bug_entry::file\n" \
- "\t.word %c1" "\t# bug_entry::line\n" \
- "\t.word %c2" "\t# bug_entry::flags\n" \
- "\t.org 2b+%c3\n" \
- ".popsection" \
- : : "i" (__FILE__), "i" (__LINE__), \
- "i" (flags), \
- "i" (sizeof(struct bug_entry))); \
-} while (0)
-
-#else /* !CONFIG_DEBUG_BUGVERBOSE */
-
#define _BUG_FLAGS(ins, flags) \
do { \
- asm volatile("1:\t" ins "\n" \
- ".pushsection __bug_table,\"aw\"\n" \
- "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n" \
- "\t.word %c0" "\t# bug_entry::flags\n" \
- "\t.org 2b+%c1\n" \
- ".popsection" \
- : : "i" (flags), \
+ asm volatile("ASM_BUG ins=\"" ins "\" file=%c0 line=%c1 " \
+ "flags=%c2 size=%c3" \
+ : : "i" (__FILE__), "i" (__LINE__), \
+ "i" (flags), \
"i" (sizeof(struct bug_entry))); \
} while (0)

-#endif /* CONFIG_DEBUG_BUGVERBOSE */
-
-#else
-
-#define _BUG_FLAGS(ins, flags) asm volatile(ins)
-
-#endif /* CONFIG_GENERIC_BUG */
-
#define HAVE_ARCH_BUG
#define BUG() \
do { \
@@ -82,4 +46,54 @@ do { \

#include <asm-generic/bug.h>

+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_GENERIC_BUG
+
+#ifdef CONFIG_X86_32
+.macro __BUG_REL val:req
+ .long \val
+.endm
+#else
+.macro __BUG_REL val:req
+ .long \val - 2b
+.endm
+#endif
+
+#ifdef CONFIG_DEBUG_BUGVERBOSE
+
+.macro ASM_BUG ins:req file:req line:req flags:req size:req
+1: \ins
+ .pushsection __bug_table,"aw"
+2: __BUG_REL val=1b # bug_entry::bug_addr
+ __BUG_REL val=\file # bug_entry::file
+ .word \line # bug_entry::line
+ .word \flags # bug_entry::flags
+ .org 2b+\size
+ .popsection
+.endm
+
+#else /* !CONFIG_DEBUG_BUGVERBOSE */
+
+.macro ASM_BUG ins:req file:req line:req flags:req size:req
+1: \ins
+ .pushsection __bug_table,"aw"
+2: __BUG_REL val=1b # bug_entry::bug_addr
+ .word \flags # bug_entry::flags
+ .org 2b+\size
+ .popsection
+.endm
+
+#endif /* CONFIG_DEBUG_BUGVERBOSE */
+
+#else /* CONFIG_GENERIC_BUG */
+
+.macro ASM_BUG ins:req file:req line:req flags:req size:req
+ \ins
+.endm
+
+#endif /* CONFIG_GENERIC_BUG */
+
+#endif /* __ASSEMBLY__ */
+
#endif /* _ASM_X86_BUG_H */
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index 852487a9fc56..66ccb8e823b1 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -9,3 +9,4 @@
#include <linux/compiler.h>
#include <asm/refcount.h>
#include <asm/alternative-asm.h>
+#include <asm/bug.h>
--
2.17.0


2018-06-04 18:38:30

by Nadav Amit

Subject: [PATCH v2 9/9] x86: jump-labels: use macros instead of inline assembly

Use assembly macros for jump-labels and call them from inline assembly.
This not only makes the code more readable, but also improves
compilation decisions, specifically inlining decisions, which GCC bases
on the number of new lines in inline assembly.

As a result, the code size is slightly increased.

text data bss dec hex filename
18163528 10226300 2957312 31347140 1de51c4 ./vmlinux before
18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+1128)

And functions such as intel_pstate_adjust_policy_max(),
kvm_cpu_accept_dm_intr(), kvm_register_read() are inlined.
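
As with the other patches, the asm goto body now reaches the assembler
as one line, roughly of the form (label and operand values invented for
illustration):

        STATIC_BRANCH_GOTO l_yes=".L123" key="my_key" branch="0"

which the assembler macro expands into the NOP and the __jump_table
entry.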

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Philippe Ombredanne <[email protected]>

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/jump_label.h | 65 ++++++++++++++++++-------------
arch/x86/kernel/macros.S | 1 +
2 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 8c0de4282659..ea0633a41122 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -2,19 +2,6 @@
#ifndef _ASM_X86_JUMP_LABEL_H
#define _ASM_X86_JUMP_LABEL_H

-#ifndef HAVE_JUMP_LABEL
-/*
- * For better or for worse, if jump labels (the gcc extension) are missing,
- * then the entire static branch patching infrastructure is compiled out.
- * If that happens, the code in here will malfunction. Raise a compiler
- * error instead.
- *
- * In theory, jump labels and the static branch patching infrastructure
- * could be decoupled to fix this.
- */
-#error asm/jump_label.h included on a non-jump-label kernel
-#endif
-
#define JUMP_LABEL_NOP_SIZE 5

#ifdef CONFIG_X86_64
@@ -28,18 +15,27 @@

#ifndef __ASSEMBLY__

+#ifndef HAVE_JUMP_LABEL
+/*
+ * For better or for worse, if jump labels (the gcc extension) are missing,
+ * then the entire static branch patching infrastructure is compiled out.
+ * If that happens, the code in here will malfunction. Raise a compiler
+ * error instead.
+ *
+ * In theory, jump labels and the static branch patching infrastructure
+ * could be decoupled to fix this.
+ */
+#error asm/jump_label.h included on a non-jump-label kernel
+#endif
+
#include <linux/stringify.h>
#include <linux/types.h>

static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:"
- ".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
- ".pushsection __jump_table, \"aw\" \n\t"
- _ASM_ALIGN "\n\t"
- _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
- ".popsection \n\t"
- : : "i" (key), "i" (branch) : : l_yes);
+ asm_volatile_goto("STATIC_BRANCH_GOTO l_yes=\"%l[l_yes]\" key=\"%c0\" "
+ "branch=\"%c1\""
+ : : "i" (key), "i" (branch) : : l_yes);

return false;
l_yes:
@@ -48,13 +44,8 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran

static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:"
- ".byte 0xe9\n\t .long %l[l_yes] - 2f\n\t"
- "2:\n\t"
- ".pushsection __jump_table, \"aw\" \n\t"
- _ASM_ALIGN "\n\t"
- _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
- ".popsection \n\t"
+ asm_volatile_goto("STATIC_BRANCH_JUMP_GOTO l_yes=\"%l[l_yes]\" key=\"%c0\" "
+ "branch=\"%c1\""
: : "i" (key), "i" (branch) : : l_yes);

return false;
@@ -108,6 +99,26 @@ struct jump_entry {
.popsection
.endm

+.macro STATIC_BRANCH_GOTO l_yes:req key:req branch:req
+1:
+ .byte STATIC_KEY_INIT_NOP
+ .pushsection __jump_table, "aw"
+ _ASM_ALIGN
+ _ASM_PTR 1b, \l_yes, \key + \branch
+ .popsection
+.endm
+
+.macro STATIC_BRANCH_JUMP_GOTO l_yes:req key:req branch:req
+1:
+ .byte 0xe9
+ .long \l_yes - 2f
+2:
+ .pushsection __jump_table, "aw"
+ _ASM_ALIGN
+ _ASM_PTR 1b, \l_yes, \key + \branch
+ .popsection
+.endm
+
#endif /* __ASSEMBLY__ */

#endif
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index bf8b9c93e255..161c95059044 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -13,3 +13,4 @@
#include <asm/paravirt.h>
#include <asm/asm.h>
#include <asm/cpufeature.h>
+#include <asm/jump_label.h>
--
2.17.0


2018-06-04 18:38:34

by Nadav Amit

Subject: [PATCH v2 8/9] x86: cpufeature: use macros instead of inline assembly

Use assembly macros for static_cpu_has() and call them from inline
assembly. This not only makes the code more readable, but also improves
compilation decisions, specifically inlining decisions, which GCC bases
on the number of new lines in inline assembly.

The patch slightly increases the kernel size:

text data bss dec hex filename
18162879 10226256 2957312 31346447 1de4f0f ./vmlinux before
18163528 10226300 2957312 31347140 1de51c4 ./vmlinux after (+693)

And it enables the inlining of functions such as free_ldt_pgtables().

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 82 ++++++++++++++++++-------------
arch/x86/kernel/macros.S | 1 +
2 files changed, 48 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index b27da9602a6d..33e45dfb211a 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -2,10 +2,10 @@
#ifndef _ASM_X86_CPUFEATURE_H
#define _ASM_X86_CPUFEATURE_H

-#include <asm/processor.h>
-
-#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
+#ifdef __KERNEL__
+#ifndef __ASSEMBLY__

+#include <asm/processor.h>
#include <asm/asm.h>
#include <linux/bitops.h>

@@ -147,37 +147,10 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
*/
static __always_inline __pure bool _static_cpu_has(u16 bit)
{
- asm_volatile_goto("1: jmp 6f\n"
- "2:\n"
- ".skip -(((5f-4f) - (2b-1b)) > 0) * "
- "((5f-4f) - (2b-1b)),0x90\n"
- "3:\n"
- ".section .altinstructions,\"a\"\n"
- " .long 1b - .\n" /* src offset */
- " .long 4f - .\n" /* repl offset */
- " .word %P[always]\n" /* always replace */
- " .byte 3b - 1b\n" /* src len */
- " .byte 5f - 4f\n" /* repl len */
- " .byte 3b - 2b\n" /* pad len */
- ".previous\n"
- ".section .altinstr_replacement,\"ax\"\n"
- "4: jmp %l[t_no]\n"
- "5:\n"
- ".previous\n"
- ".section .altinstructions,\"a\"\n"
- " .long 1b - .\n" /* src offset */
- " .long 0\n" /* no replacement */
- " .word %P[feature]\n" /* feature bit */
- " .byte 3b - 1b\n" /* src len */
- " .byte 0\n" /* repl len */
- " .byte 0\n" /* pad len */
- ".previous\n"
- ".section .altinstr_aux,\"ax\"\n"
- "6:\n"
- " testb %[bitnum],%[cap_byte]\n"
- " jnz %l[t_yes]\n"
- " jmp %l[t_no]\n"
- ".previous\n"
+ asm_volatile_goto("STATIC_CPU_HAS bitnum=%[bitnum] "
+ "cap_byte=\"%[cap_byte]\" "
+ "feature=%P[feature] t_yes=%l[t_yes] "
+ "t_no=%l[t_no] always=%P[always]"
: : [feature] "i" (bit),
[always] "i" (X86_FEATURE_ALWAYS),
[bitnum] "i" (1 << (bit & 7)),
@@ -211,5 +184,44 @@ static __always_inline __pure bool _static_cpu_has(u16 bit)
#define CPU_FEATURE_TYPEVAL boot_cpu_data.x86_vendor, boot_cpu_data.x86, \
boot_cpu_data.x86_model

-#endif /* defined(__KERNEL__) && !defined(__ASSEMBLY__) */
+#else /* __ASSEMBLY__ */
+
+.macro STATIC_CPU_HAS bitnum:req cap_byte:req feature:req t_yes:req t_no:req always:req
+1:
+ jmp 6f
+2:
+ .skip -(((5f-4f) - (2b-1b)) > 0) * ((5f-4f) - (2b-1b)),0x90
+3:
+ .section .altinstructions,"a"
+ .long 1b - . /* src offset */
+ .long 4f - . /* repl offset */
+ .word \always /* always replace */
+ .byte 3b - 1b /* src len */
+ .byte 5f - 4f /* repl len */
+ .byte 3b - 2b /* pad len */
+ .previous
+ .section .altinstr_replacement,"ax"
+4:
+ jmp \t_no
+5:
+ .previous
+ .section .altinstructions,"a"
+ .long 1b - . /* src offset */
+ .long 0 /* no replacement */
+ .word \feature /* feature bit */
+ .byte 3b - 1b /* src len */
+ .byte 0 /* repl len */
+ .byte 0 /* pad len */
+ .previous
+ .section .altinstr_aux,"ax"
+6:
+ testb \bitnum,\cap_byte
+ jnz \t_yes
+ jmp \t_no
+ .previous
+.endm
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __KERNEL__ */
#endif /* _ASM_X86_CPUFEATURE_H */
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index 7baa40d5bf16..bf8b9c93e255 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -12,3 +12,4 @@
#include <asm/bug.h>
#include <asm/paravirt.h>
#include <asm/asm.h>
+#include <asm/cpufeature.h>
--
2.17.0


2018-06-04 18:39:06

by Nadav Amit

Subject: [PATCH v2 6/9] x86: prevent inline distortion by paravirt ops

GCC considers the number of statements in inline assembly blocks,
estimated from the number of new-lines and semicolons, as an indication
of the cost of the block in time and space. This estimate is distorted
by the kernel code, which puts information in alternative sections. As
a result, the compiler may perform incorrect inlining and branch
optimizations.

The solution is to define an assembly macro and call it from the inline
assembly block. As a result, GCC considers the inline assembly block to
be a single instruction.

The effect of the patch is more aggressive inlining, which also causes
a size increase of the kernel.

text data bss dec hex filename
18147336 10226688 2957312 31331336 1de1408 ./vmlinux before
18162555 10226288 2957312 31346155 1de4deb ./vmlinux after (+14819)

Static text symbols:
Before: 40053
After: 39942 (-111)

Cc: Juergen Gross <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/paravirt_types.h | 54 +++++++++++++++------------
arch/x86/kernel/macros.S | 1 +
2 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 180bc0bff0fb..2a9c53f64f1a 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -347,19 +347,15 @@ extern struct pv_lock_ops pv_lock_ops;
* Generate some code, and mark it as patchable by the
* apply_paravirt() alternate instruction patcher.
*/
-#define _paravirt_alt(insn_string, type, clobber) \
- "771:\n\t" insn_string "\n" "772:\n" \
- ".pushsection .parainstructions,\"a\"\n" \
- _ASM_ALIGN "\n" \
- _ASM_PTR " 771b\n" \
- " .byte " type "\n" \
- " .byte 772b-771b\n" \
- " .short " clobber "\n" \
- ".popsection\n"
+#define _paravirt_alt(type, clobber, pv_opptr) \
+ "PARAVIRT_ALT type=" __stringify(type) \
+ " clobber=" __stringify(clobber) \
+ " pv_opptr=" __stringify(pv_opptr) "\n\t"

/* Generate patchable code, with the default asm parameters. */
-#define paravirt_alt(insn_string) \
- _paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")
+#define paravirt_alt \
+ _paravirt_alt("%c[paravirt_typenum]", "%c[paravirt_clobber]", \
+ "%c[paravirt_opptr]")

/* Simple instruction patching code. */
#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
@@ -387,16 +383,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,

int paravirt_disable_iospace(void);

-/*
- * This generates an indirect call based on the operation type number.
- * The type number, computed in PARAVIRT_PATCH, is derived from the
- * offset into the paravirt_patch_template structure, and can therefore be
- * freely converted back into a structure offset.
- */
-#define PARAVIRT_CALL \
- ANNOTATE_RETPOLINE_SAFE \
- "call *%c[paravirt_opptr];"
-
/*
* These macros are intended to wrap calls through one of the paravirt
* ops structs, so that they can be later identified and patched at
@@ -534,7 +520,7 @@ int paravirt_disable_iospace(void);
/* since this condition will never hold */ \
if (sizeof(rettype) > sizeof(unsigned long)) { \
asm volatile(pre \
- paravirt_alt(PARAVIRT_CALL) \
+ paravirt_alt \
post \
: call_clbr, ASM_CALL_CONSTRAINT \
: paravirt_type(op), \
@@ -544,7 +530,7 @@ int paravirt_disable_iospace(void);
__ret = (rettype)((((u64)__edx) << 32) | __eax); \
} else { \
asm volatile(pre \
- paravirt_alt(PARAVIRT_CALL) \
+ paravirt_alt \
post \
: call_clbr, ASM_CALL_CONSTRAINT \
: paravirt_type(op), \
@@ -571,7 +557,7 @@ int paravirt_disable_iospace(void);
PVOP_VCALL_ARGS; \
PVOP_TEST_NULL(op); \
asm volatile(pre \
- paravirt_alt(PARAVIRT_CALL) \
+ paravirt_alt \
post \
: call_clbr, ASM_CALL_CONSTRAINT \
: paravirt_type(op), \
@@ -691,6 +677,26 @@ struct paravirt_patch_site {
extern struct paravirt_patch_site __parainstructions[],
__parainstructions_end[];

+#else /* __ASSEMBLY__ */
+
+/*
+ * This generates an indirect call based on the operation type number.
+ * The type number, computed in PARAVIRT_PATCH, is derived from the
+ * offset into the paravirt_patch_template structure, and can therefore be
+ * freely converted back into a structure offset.
+ */
+.macro PARAVIRT_ALT type:req clobber:req pv_opptr:req
+771: ANNOTATE_RETPOLINE_SAFE
+ call *\pv_opptr
+772: .pushsection .parainstructions,"a"
+ _ASM_ALIGN
+ _ASM_PTR 771b
+ .byte \type
+ .byte 772b-771b
+ .short \clobber
+ .popsection
+.endm
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_PARAVIRT_TYPES_H */
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index 66ccb8e823b1..71d8b716b111 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -10,3 +10,4 @@
#include <asm/refcount.h>
#include <asm/alternative-asm.h>
#include <asm/bug.h>
+#include <asm/paravirt.h>
--
2.17.0


2018-06-04 18:39:07

by Nadav Amit

Subject: [PATCH v2 4/9] x86: alternatives: macrofy locks for better inlining

GCC considers the number of statements in inline assembly blocks,
estimated from the number of new-lines and semicolons, as an indication
of the cost of the block in time and space. This estimate is distorted
by the kernel code, which puts information in alternative sections. As
a result, the compiler may perform incorrect inlining and branch
optimizations.

The solution is to define an assembly macro and call it from the inline
assembly block. As a result, GCC considers the inline assembly block to
be a single instruction.

This patch handles the LOCK prefix, allowing more aggressive inlining.

text data bss dec hex filename
18140140 10225284 2957312 31322736 1ddf270 ./vmlinux before
18146889 10225380 2957312 31329581 1de0d2d ./vmlinux after (+6845)

Static text symbols:
Before: 40286
After: 40218 (-68)
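
To sketch how the new definitions compose (the addl example is borrowed
from the refcount code converted elsewhere in this series): the C-level
define

        #define LOCK_PREFIX "LOCK_PREFIX "

turns a use such as

        asm volatile(LOCK_PREFIX "addl %1,%0" : ...);

into a single assembler line of the form "LOCK_PREFIX addl ..." (with
the operands already substituted by GCC), which invokes the vararg
assembler macro that emits the lock prefix, the instruction and the
.smp_locks entry together.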

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Josh Poimboeuf <[email protected]>

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/alternative-asm.h | 20 ++++++++++++++------
arch/x86/include/asm/alternative.h | 16 ++--------------
arch/x86/kernel/macros.S | 1 +
3 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index 31b627b43a8e..8e4ea39e55d0 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -7,16 +7,24 @@
#include <asm/asm.h>

#ifdef CONFIG_SMP
- .macro LOCK_PREFIX
-672: lock
+.macro LOCK_PREFIX_HERE
.pushsection .smp_locks,"a"
.balign 4
- .long 672b - .
+ .long 671f - . # offset
.popsection
- .endm
+671:
+.endm
+
+.macro LOCK_PREFIX insn:vararg
+ LOCK_PREFIX_HERE
+ lock \insn
+.endm
#else
- .macro LOCK_PREFIX
- .endm
+.macro LOCK_PREFIX_HERE
+.endm
+
+.macro LOCK_PREFIX insn:vararg
+.endm
#endif

/*
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4cd6a3b71824..c1a3d7c76151 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -30,20 +30,8 @@
* and size information. That keeps the table sizes small.
*/

-#ifdef CONFIG_SMP
-#define LOCK_PREFIX_HERE \
- ".pushsection .smp_locks,\"a\"\n" \
- ".balign 4\n" \
- ".long 671f - .\n" /* offset */ \
- ".popsection\n" \
- "671:"
-
-#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
-
-#else /* ! CONFIG_SMP */
-#define LOCK_PREFIX_HERE ""
-#define LOCK_PREFIX ""
-#endif
+#define LOCK_PREFIX_HERE "LOCK_PREFIX_HERE\n\t"
+#define LOCK_PREFIX "LOCK_PREFIX "

struct alt_instr {
s32 instr_offset; /* original instruction */
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index f1fe1d570365..852487a9fc56 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -8,3 +8,4 @@

#include <linux/compiler.h>
#include <asm/refcount.h>
+#include <asm/alternative-asm.h>
--
2.17.0


2018-06-04 18:39:34

by Nadav Amit

Subject: [PATCH v2 7/9] x86: extable: use macros instead of inline assembly

Use assembly macros for exception-tables and call them from inline
assembly. This not only makes the code more readable and avoids
duplicating the implementation, but also improves compilation
decisions, specifically inlining decisions, which GCC bases on the
number of new lines in inline assembly.

text data bss dec hex filename
18162555 10226288 2957312 31346155 1de4deb ./vmlinux before
18162879 10226256 2957312 31346447 1de4f0f ./vmlinux after (+292)

This allows inlining functions such as nested_vmx_exit_reflected(),
set_segment_reg() and __copy_xstate_to_user().

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Josh Poimboeuf <[email protected]>

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/asm.h | 61 +++++++++++++++++---------------------
arch/x86/kernel/macros.S | 1 +
2 files changed, 28 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 219faaec51df..30bc1b0058ef 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -58,28 +58,44 @@
# define CC_OUT(c) [_cc_ ## c] "=qm"
#endif

-/* Exception table entry */
#ifdef __ASSEMBLY__
# define _ASM_EXTABLE_HANDLE(from, to, handler) \
- .pushsection "__ex_table","a" ; \
- .balign 4 ; \
- .long (from) - . ; \
- .long (to) - . ; \
- .long (handler) - . ; \
- .popsection
+ ASM_EXTABLE_HANDLE from to handler
+
+#else /* __ASSEMBLY__ */
+
+# define _ASM_EXTABLE_HANDLE(from, to, handler) \
+ "ASM_EXTABLE_HANDLE from=" #from " to=" #to \
+ " handler=\"" #handler "\"\n\t"
+
+/* For C file, we already have NOKPROBE_SYMBOL macro */
+
+#endif /* __ASSEMBLY__ */

-# define _ASM_EXTABLE(from, to) \
+#define _ASM_EXTABLE(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_default)

-# define _ASM_EXTABLE_FAULT(from, to) \
+#define _ASM_EXTABLE_FAULT(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)

-# define _ASM_EXTABLE_EX(from, to) \
+#define _ASM_EXTABLE_EX(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_ext)

-# define _ASM_EXTABLE_REFCOUNT(from, to) \
+#define _ASM_EXTABLE_REFCOUNT(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount)

+/* Exception table entry */
+#ifdef __ASSEMBLY__
+
+.macro ASM_EXTABLE_HANDLE from:req to:req handler:req
+ .pushsection "__ex_table","a"
+ .balign 4
+ .long (\from) - .
+ .long (\to) - .
+ .long (\handler) - .
+ .popsection
+.endm
+
# define _ASM_NOKPROBE(entry) \
.pushsection "_kprobe_blacklist","aw" ; \
_ASM_ALIGN ; \
@@ -110,29 +126,6 @@
_ASM_EXTABLE(101b,103b)
.endm

-#else
-# define _EXPAND_EXTABLE_HANDLE(x) #x
-# define _ASM_EXTABLE_HANDLE(from, to, handler) \
- " .pushsection \"__ex_table\",\"a\"\n" \
- " .balign 4\n" \
- " .long (" #from ") - .\n" \
- " .long (" #to ") - .\n" \
- " .long (" _EXPAND_EXTABLE_HANDLE(handler) ") - .\n" \
- " .popsection\n"
-
-# define _ASM_EXTABLE(from, to) \
- _ASM_EXTABLE_HANDLE(from, to, ex_handler_default)
-
-# define _ASM_EXTABLE_FAULT(from, to) \
- _ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
-
-# define _ASM_EXTABLE_EX(from, to) \
- _ASM_EXTABLE_HANDLE(from, to, ex_handler_ext)
-
-# define _ASM_EXTABLE_REFCOUNT(from, to) \
- _ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount)
-
-/* For C file, we already have NOKPROBE_SYMBOL macro */
#endif

#ifndef __ASSEMBLY__
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index 71d8b716b111..7baa40d5bf16 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -11,3 +11,4 @@
#include <asm/alternative-asm.h>
#include <asm/bug.h>
#include <asm/paravirt.h>
+#include <asm/asm.h>
--
2.17.0


2018-06-04 18:39:42

by Nadav Amit

Subject: [PATCH v2 1/9] Makefile: Prepare for using macros for inline asm

Using macros for inline assembly improves both readability and
compilation decisions that are distorted by big assembly blocks that
use alternative sections. Compile macros.S and use the resulting
assembly file when assembling all C files. Currently, only x86 will use
it.
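
Concretely, the preprocessed macros end up in arch/x86/kernel/macros.s,
and -Wa,arch/x86/kernel/macros.s in KBUILD_CFLAGS hands that file to
the assembler ahead of each unit's generated assembly, so every C file
is effectively built as (illustrative command line):

        gcc ... -Wa,arch/x86/kernel/macros.s -c foo.c

and the assembler reads the macro definitions before the code that uses
them; this is also why -pipe can no longer be used.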

Signed-off-by: Nadav Amit <[email protected]>
---
Makefile | 9 +++++++--
arch/x86/Makefile | 11 +++++++++--
arch/x86/kernel/Makefile | 6 ++++++
arch/x86/kernel/macros.S | 7 +++++++
scripts/Kbuild.include | 4 +++-
5 files changed, 32 insertions(+), 5 deletions(-)
create mode 100644 arch/x86/kernel/macros.S

diff --git a/Makefile b/Makefile
index 619a85ad716b..c255ed4841d5 100644
--- a/Makefile
+++ b/Makefile
@@ -1082,7 +1082,7 @@ scripts: scripts_basic include/config/auto.conf include/config/tristate.conf \
# version.h and scripts_basic is processed / created.

# Listed in dependency order
-PHONY += prepare archprepare prepare0 prepare1 prepare2 prepare3
+PHONY += prepare archprepare macroprepare prepare0 prepare1 prepare2 prepare3

# prepare3 is used to check if we are building in a separate output directory,
# and if so do:
@@ -1106,7 +1106,9 @@ prepare1: prepare2 $(version_h) $(autoksyms_h) include/generated/utsrelease.h \
include/config/auto.conf
$(cmd_crmodverdir)

-archprepare: archheaders archscripts prepare1 scripts_basic
+macroprepare: prepare1 archmacros
+
+archprepare: archheaders archscripts macroprepare scripts_basic

prepare0: archprepare gcc-plugins
$(Q)$(MAKE) $(build)=.
@@ -1211,6 +1213,9 @@ archheaders:
PHONY += archscripts
archscripts:

+PHONY += archmacros
+archmacros:
+
PHONY += __headers
__headers: $(version_h) scripts_basic uapi-asm-generic archheaders archscripts
$(Q)$(MAKE) $(build)=scripts build_unifdef
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 60135cbd905c..6b82314776fd 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -235,8 +235,8 @@ ifdef CONFIG_X86_64
LDFLAGS += $(call ld-option, -z max-page-size=0x200000)
endif

-# Speed up the build
-KBUILD_CFLAGS += -pipe
+# We cannot use the -pipe flag since we give an additional .s file to the compiler
+#KBUILD_CFLAGS += -pipe
# Workaround for a gcc prelease that unfortunately was shipped in a suse release
KBUILD_CFLAGS += -Wno-sign-compare
#
@@ -258,11 +258,18 @@ archscripts: scripts_basic
archheaders:
$(Q)$(MAKE) $(build)=arch/x86/entry/syscalls all

+archmacros:
+ $(Q)$(MAKE) $(build)=arch/x86/kernel macros
+
archprepare:
ifeq ($(CONFIG_KEXEC_FILE),y)
$(Q)$(MAKE) $(build)=arch/x86/purgatory arch/x86/purgatory/kexec-purgatory.c
endif

+ASM_MACRO_FLAGS = -Wa,arch/x86/kernel/macros.s
+export ASM_MACRO_FLAGS
+KBUILD_CFLAGS += $(ASM_MACRO_FLAGS)
+
###
# Kernel objects

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 02d6f5cf4e70..fdb6c5b2a922 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -9,6 +9,12 @@ extra-y += ebda.o
extra-y += platform-quirks.o
extra-y += vmlinux.lds

+$(obj)/macros.s: $(obj)/macros.S FORCE
+ $(call if_changed_dep,cpp_s_S)
+
+macros: $(obj)/macros.s
+ @:
+
CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)

ifdef CONFIG_FUNCTION_TRACER
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
new file mode 100644
index 000000000000..cfc1c7d1a6eb
--- /dev/null
+++ b/arch/x86/kernel/macros.S
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * This file includes headers whose assembly part includes macros which are
+ * commonly used. The macros are precompiled into an assembly file which is later
+ * assembled together with each compiled file.
+ */
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 50cee534fd64..ad2c02062aa4 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -189,7 +189,9 @@ __cc-option = $(call try-run-cached,\

# Do not attempt to build with gcc plugins during cc-option tests.
# (And this uses delayed resolution so the flags will be up to date.)
-CC_OPTION_CFLAGS = $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS))
+# In addition, do not include the asm macros which are built later.
+CC_OPTION_FILTERED = $(GCC_PLUGINS_CFLAGS) $(ASM_MACRO_FLAGS)
+CC_OPTION_CFLAGS = $(filter-out $(CC_OPTION_FILTERED),$(KBUILD_CFLAGS))

# cc-option
# Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586)
--
2.17.0


2018-06-04 18:40:16

by Nadav Amit

Subject: [PATCH v2 3/9] x86: refcount: prevent gcc distortions

GCC considers the number of statements in inline assembly blocks,
estimated from the number of new-lines and semicolons, as an indication
of the cost of the block in time and space. This estimate is distorted
by the kernel code, which puts information in alternative sections. As
a result, the compiler may perform incorrect inlining and branch
optimizations.

The solution is to define an assembly macro and call it from the inline
assembly block. As a result, GCC considers the inline assembly block to
be a single instruction.

This patch allows inlining functions such as __get_seccomp_filter().
Interestingly, it allows more aggressive inlining while reducing the
kernel size.

text data bss dec hex filename
18140970 10225412 2957312 31323694 1ddf62e ./vmlinux before
18140140 10225284 2957312 31322736 1ddf270 ./vmlinux after (-958)

Static text symbols:
Before: 40302
After: 40286 (-16)

Functions such as kref_get(), free_user(), fuse_file_get() now get
inlined.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Kees Cook <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Josh Poimboeuf <[email protected]>

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/include/asm/refcount.h | 73 ++++++++++++++++++++-------------
arch/x86/kernel/macros.S | 1 +
2 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
index 4cf11d88d3b3..53462f32b58e 100644
--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -4,6 +4,9 @@
* x86-specific implementation of refcount_t. Based on PAX_REFCOUNT from
* PaX/grsecurity.
*/
+
+#ifndef __ASSEMBLY__
+
#include <linux/refcount.h>

/*
@@ -14,34 +17,11 @@
* central refcount exception. The fixup address for the exception points
* back to the regular execution flow in .text.
*/
-#define _REFCOUNT_EXCEPTION \
- ".pushsection .text..refcount\n" \
- "111:\tlea %[counter], %%" _ASM_CX "\n" \
- "112:\t" ASM_UD2 "\n" \
- ASM_UNREACHABLE \
- ".popsection\n" \
- "113:\n" \
- _ASM_EXTABLE_REFCOUNT(112b, 113b)
-
-/* Trigger refcount exception if refcount result is negative. */
-#define REFCOUNT_CHECK_LT_ZERO \
- "js 111f\n\t" \
- _REFCOUNT_EXCEPTION
-
-/* Trigger refcount exception if refcount result is zero or negative. */
-#define REFCOUNT_CHECK_LE_ZERO \
- "jz 111f\n\t" \
- REFCOUNT_CHECK_LT_ZERO
-
-/* Trigger refcount exception unconditionally. */
-#define REFCOUNT_ERROR \
- "jmp 111f\n\t" \
- _REFCOUNT_EXCEPTION

static __always_inline void refcount_add(unsigned int i, refcount_t *r)
{
asm volatile(LOCK_PREFIX "addl %1,%0\n\t"
- REFCOUNT_CHECK_LT_ZERO
+ "REFCOUNT_CHECK_LT_ZERO counter=\"%[counter]\""
: [counter] "+m" (r->refs.counter)
: "ir" (i)
: "cc", "cx");
@@ -50,7 +30,7 @@ static __always_inline void refcount_add(unsigned int i, refcount_t *r)
static __always_inline void refcount_inc(refcount_t *r)
{
asm volatile(LOCK_PREFIX "incl %0\n\t"
- REFCOUNT_CHECK_LT_ZERO
+ "REFCOUNT_CHECK_LT_ZERO counter=\"%[counter]\""
: [counter] "+m" (r->refs.counter)
: : "cc", "cx");
}
@@ -58,7 +38,7 @@ static __always_inline void refcount_inc(refcount_t *r)
static __always_inline void refcount_dec(refcount_t *r)
{
asm volatile(LOCK_PREFIX "decl %0\n\t"
- REFCOUNT_CHECK_LE_ZERO
+ "REFCOUNT_CHECK_LE_ZERO counter=\"%[counter]\""
: [counter] "+m" (r->refs.counter)
: : "cc", "cx");
}
@@ -66,13 +46,15 @@ static __always_inline void refcount_dec(refcount_t *r)
static __always_inline __must_check
bool refcount_sub_and_test(unsigned int i, refcount_t *r)
{
- GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl", REFCOUNT_CHECK_LT_ZERO,
+ GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
+ "REFCOUNT_CHECK_LT_ZERO counter=\"%0\"",
r->refs.counter, "er", i, "%0", e, "cx");
}

static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
{
- GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl", REFCOUNT_CHECK_LT_ZERO,
+ GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
+ "REFCOUNT_CHECK_LT_ZERO counter=\"%0\"",
r->refs.counter, "%0", e, "cx");
}

@@ -90,7 +72,7 @@ bool refcount_add_not_zero(unsigned int i, refcount_t *r)

/* Did we try to increment from/to an undesirable state? */
if (unlikely(c < 0 || c == INT_MAX || result < c)) {
- asm volatile(REFCOUNT_ERROR
+ asm volatile("REFCOUNT_ERROR counter=\"%[counter]\""
: : [counter] "m" (r->refs.counter)
: "cc", "cx");
break;
@@ -106,4 +88,37 @@ static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
return refcount_add_not_zero(1, r);
}

+#else /* __ASSEMBLY__ */
+#include <asm/asm.h>
+#include <asm/bug.h>
+
+.macro REFCOUNT_EXCEPTION counter:req
+ .pushsection .text..refcount
+111: lea \counter, %_ASM_CX
+112: ud2
+ ASM_UNREACHABLE
+ .popsection
+113: _ASM_EXTABLE_REFCOUNT(112b, 113b)
+.endm
+
+/* Trigger refcount exception if refcount result is negative. */
+.macro REFCOUNT_CHECK_LT_ZERO counter:req
+ js 111f
+ REFCOUNT_EXCEPTION \counter
+.endm
+
+/* Trigger refcount exception if refcount result is zero or negative. */
+.macro REFCOUNT_CHECK_LE_ZERO counter:req
+ jz 111f
+ REFCOUNT_CHECK_LT_ZERO counter=\counter
+.endm
+
+/* Trigger refcount exception unconditionally. */
+.macro REFCOUNT_ERROR counter:req
+ jmp 111f
+ REFCOUNT_EXCEPTION counter=\counter
+.endm
+
+#endif /* __ASSEMBLY__ */
+
#endif
diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index cee28c3246dc..f1fe1d570365 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -7,3 +7,4 @@
*/

#include <linux/compiler.h>
+#include <asm/refcount.h>
--
2.17.0


2018-06-04 18:41:34

by Nadav Amit

Subject: [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions

GCC considers the number of statements in inline assembly blocks,
estimated from the number of new-lines and semicolons, as an indication
of the cost of the block in time and space. This estimate is distorted
by the kernel code, which puts information in alternative sections. As
a result, the compiler may perform incorrect inlining and branch
optimizations.

In the case of objtool, this distortion is extreme, since the objtool
annotations are discarded during linkage anyhow.

The solution is to define an assembly macro and call it from the inline
assembly block. As a result, GCC considers the inline assembly block to
be a single instruction.

This patch slightly increases the kernel size.

text data bss dec hex filename
18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
18140970 10225412 2957312 31323694 1ddf62e ./vmlinux after (+829)

Static text symbols:
Before: 40321
After: 40302 (-19)

Cc: Christopher Li <[email protected]>
Cc: [email protected]

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/kernel/macros.S | 2 ++
include/linux/compiler.h | 60 +++++++++++++++++++++++++++++++---------
2 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index cfc1c7d1a6eb..cee28c3246dc 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -5,3 +5,5 @@
* commonly used. The macros are precompiled into an assembly file which is later
* assembled together with each compiled file.
*/
+
+#include <linux/compiler.h>
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ab4711c63601..d10e752036c4 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -91,6 +91,10 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
# define barrier_before_unreachable() do { } while (0)
#endif

+/* A wrapper to clearly document when a macro is used */
+#define __ASM_MACRO(name, ...) __stringify(name) __stringify(__VA_ARGS__)
+#define ASM_MACRO(name, ...) __ASM_MACRO(name, __VA_ARGS__) "\n\t"
+
/* Unreachable code */
#ifdef CONFIG_STACK_VALIDATION
/*
@@ -99,22 +103,13 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
* unique, to convince GCC not to merge duplicate inline asm statements.
*/
#define annotate_reachable() ({ \
- asm volatile("%c0:\n\t" \
- ".pushsection .discard.reachable\n\t" \
- ".long %c0b - .\n\t" \
- ".popsection\n\t" : : "i" (__COUNTER__)); \
+ asm volatile("ANNOTATE_REACHABLE counter=%c0" \
+ : : "i" (__COUNTER__)); \
})
#define annotate_unreachable() ({ \
- asm volatile("%c0:\n\t" \
- ".pushsection .discard.unreachable\n\t" \
- ".long %c0b - .\n\t" \
- ".popsection\n\t" : : "i" (__COUNTER__)); \
+ asm volatile("ANNOTATE_UNREACHABLE counter=%c0" \
+ : : "i" (__COUNTER__)); \
})
-#define ASM_UNREACHABLE \
- "999:\n\t" \
- ".pushsection .discard.unreachable\n\t" \
- ".long 999b - .\n\t" \
- ".popsection\n\t"
#else
#define annotate_reachable()
#define annotate_unreachable()
@@ -280,6 +275,45 @@ unsigned long read_word_at_a_time(const void *addr)

#endif /* __KERNEL__ */

+#else /* __ASSEMBLY__ */
+
+#ifdef __KERNEL__
+#ifndef LINKER_SCRIPT
+
+#ifdef CONFIG_STACK_VALIDATION
+.macro ANNOTATE_UNREACHABLE counter:req
+\counter:
+ .pushsection .discard.unreachable
+ .long \counter\()b -.
+ .popsection
+.endm
+
+.macro ANNOTATE_REACHABLE counter:req
+\counter:
+ .pushsection .discard.reachable
+ .long \counter\()b -.
+ .popsection
+.endm
+
+.macro ASM_UNREACHABLE
+999:
+ .pushsection .discard.unreachable
+ .long 999b - .
+ .popsection
+.endm
+#else /* CONFIG_STACK_VALIDATION */
+.macro ANNOTATE_UNREACHABLE counter:req
+.endm
+
+.macro ANNOTATE_REACHABLE counter:req
+.endm
+
+.macro ASM_UNREACHABLE
+.endm /* CONFIG_STACK_VALIDATION */
+#endif
+
+#endif /* LINKER_SCRIPT */
+#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */

#ifndef __optimize
--
2.17.0


2018-06-04 19:05:57

by Josh Poimboeuf

Subject: Re: [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions

On Mon, Jun 04, 2018 at 04:21:24AM -0700, Nadav Amit wrote:
> +#ifdef CONFIG_STACK_VALIDATION
> +.macro ANNOTATE_UNREACHABLE counter:req
> +\counter:
> + .pushsection .discard.unreachable
> + .long \counter\()b -.
> + .popsection
> +.endm
> +
> +.macro ANNOTATE_REACHABLE counter:req
> +\counter:
> + .pushsection .discard.reachable
> + .long \counter\()b -.
> + .popsection
> +.endm
> +
> +.macro ASM_UNREACHABLE
> +999:
> + .pushsection .discard.unreachable
> + .long 999b - .
> + .popsection
> +.endm
> +#else /* CONFIG_STACK_VALIDATION */
> +.macro ANNOTATE_UNREACHABLE counter:req
> +.endm
> +
> +.macro ANNOTATE_REACHABLE counter:req
> +.endm
> +
> +.macro ASM_UNREACHABLE
> +.endm /* CONFIG_STACK_VALIDATION */
> +#endif

The '/* CONFIG_STACK_VALIDATION */' comment is on the wrong line.

Otherwise:

Reviewed-by: Josh Poimboeuf <[email protected]>

--
Josh

2018-06-04 19:07:10

by Josh Poimboeuf

Subject: Re: [PATCH v2 0/9] x86: macrofying inline asm for better compilation

On Mon, Jun 04, 2018 at 04:21:22AM -0700, Nadav Amit wrote:
> This patch-set deals with an interesting yet stupid problem: kernel code
> that does not get inlined despite its simplicity. There are several
> causes for this behavior: the "cold" attribute on __init; different
> function optimization levels; conditional constant computations based on
> __builtin_constant_p(); and finally, large inline assembly blocks.
>
> This patch-set deals with the inline assembly problem. I separated these
> patches from the others (that were sent in the RFC) for easier
> inclusion. I also separated the removal of unnecessary new-lines, which
> will be sent separately.
>
> The problem with inline assembly is that the kernel often uses it for
> things other than code - for example, assembly directives and data. GCC,
> however, is oblivious to the content of the blocks and assumes their
> cost in space and time is proportional to the number of perceived
> assembly "instructions", which it estimates from the number of newlines
> and semicolons. Alternatives, paravirt and other mechanisms are
> affected, causing code not to be inlined and degrading compilation
> quality in general.
>
> The solution that this patch-set offers for this problem is to define
> an assembly macro, and then call it from the inline assembly block. As
> a result, the compiler sees a single "instruction" and assigns a more
> appropriate cost to the code.
>
> To avoid uglification of the code, as many noted, the macros are first
> precompiled into an assembly file, which is later assembled together
> with the C files. This also avoids the duplicate implementations that
> previously existed for the asm and C code, as can be seen in the
> exception-table changes.
>
> Overall this patch-set slightly increases the kernel size (my build was
> done using my Ubuntu 18.04 config + localyesconfig for the record):
>
> text data bss dec hex filename
> 18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
> 18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+0.1%)
>
> The number of static functions in the image is reduced by 379, but the
> actual inlining improvement is even better, which does not always show
> in these numbers: a function may be inlined, causing the calling
> function not to be inlined.
>
> The Makefile stuff may not be too clean. Ideas for improvements are
> welcome.
>
> v1->v2: * Compiling the macros into a separate .s file, improving
> readability (Linus)
> * Improving assembly formatting, applying most of the comments
> according to my judgment (Jan)
> * Adding exception-table, cpufeature and jump-labels
> * Removing new-line cleanup; to be submitted separately

How did you find these issues? Is there some way to find them
automatically in the future? Perhaps with a GCC plugin?

--
Josh

2018-06-04 19:56:51

by Nadav Amit

Subject: Re: [PATCH v2 0/9] x86: macrofying inline asm for better compilation

Josh Poimboeuf <[email protected]> wrote:

> On Mon, Jun 04, 2018 at 04:21:22AM -0700, Nadav Amit wrote:
>> This patch-set deals with an interesting yet stupid problem: kernel code
>> that does not get inlined despite its simplicity. There are several
>> causes for this behavior: the "cold" attribute on __init; different
>> function optimization levels; conditional constant computations based on
>> __builtin_constant_p(); and finally, large inline assembly blocks.
>>
>> This patch-set deals with the inline assembly problem. I separated these
>> patches from the others (that were sent in the RFC) for easier
>> inclusion. I also separated the removal of unnecessary new-lines, which
>> will be sent separately.
>>
>> The problem with inline assembly is that the kernel often uses it for
>> things other than code - for example, assembly directives and data. GCC,
>> however, is oblivious to the content of the blocks and assumes their
>> cost in space and time is proportional to the number of perceived
>> assembly "instructions", which it estimates from the number of newlines
>> and semicolons. Alternatives, paravirt and other mechanisms are
>> affected, causing code not to be inlined and degrading compilation
>> quality in general.
>>
>> The solution that this patch-set offers for this problem is to define
>> an assembly macro, and then call it from the inline assembly block. As
>> a result, the compiler sees a single "instruction" and assigns a more
>> appropriate cost to the code.
>>
>> To avoid uglification of the code, as many noted, the macros are first
>> precompiled into an assembly file, which is later assembled together
>> with the C files. This also avoids the duplicate implementations that
>> previously existed for the asm and C code, as can be seen in the
>> exception-table changes.
>>
>> Overall this patch-set slightly increases the kernel size (my build was
>> done using my Ubuntu 18.04 config + localyesconfig for the record):
>>
>> text data bss dec hex filename
>> 18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
>> 18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+0.1%)
>>
>> The number of static functions in the image is reduced by 379, but the
>> actual inlining improvement is even better, which does not always show
>> in these numbers: a function may be inlined, causing the calling
>> function not to be inlined.
>>
>> The Makefile stuff may not be too clean. Ideas for improvements are
>> welcome.
>>
>> v1->v2: * Compiling the macros into a separate .s file, improving
>> readability (Linus)
>> * Improving assembly formatting, applying most of the comments
>> according to my judgment (Jan)
>> * Adding exception-table, cpufeature and jump-labels
>> * Removing new-line cleanup; to be submitted separately
>
> How did you find these issues? Is there some way to find them
> automatically in the future? Perhaps with a GCC plugin?

Initially I found it while developing something unrelated and seeing the
disassembly going crazy for no good reason.

One way to see problematic functions is finding duplicate static functions,
which mostly happens when an inline function in a header is not inlined:

nm ./vmlinux | grep ' t ' | cut -d' ' -f3 | uniq -c | sort | \
grep -v ' 1'

But due to all kinds of reasons (duplicate function names, inlined functions
that are used as function pointers), it still requires manual work to
filter out the false-positives.

Another way is to look at small functions, doing something like:
nm --print-size ./vmlinux | grep ' t ' | cut -d' ' -f2- | sort | \
head -n 10000

But again, there are many false-positives, so I only looked at functions
that I know, or considered only those that are marked as “inline”.

I don’t know how this process can be fully automated.

Regards,
Nadav

2018-06-04 22:07:53

by Kees Cook

Subject: Re: [PATCH v2 3/9] x86: refcount: prevent gcc distortions

On Mon, Jun 4, 2018 at 4:21 AM, Nadav Amit <[email protected]> wrote:
> GCC considers the number of statements in inline assembly blocks,
> estimated from the number of new-lines and semicolons, as an indication
> of the cost of the block in time and space. This estimate is distorted
> by the kernel code, which puts information in alternative sections. As
> a result, the compiler may perform incorrect inlining and branch
> optimizations.
>
> The solution is to define an assembly macro and call it from the inline
> assembly block. As a result, GCC considers the inline assembly block to
> be a single instruction.
>
> This patch allows inlining functions such as __get_seccomp_filter().
> Interestingly, it allows more aggressive inlining while reducing the
> kernel size.
>
> text data bss dec hex filename
> 18140970 10225412 2957312 31323694 1ddf62e ./vmlinux before
> 18140140 10225284 2957312 31322736 1ddf270 ./vmlinux after (-958)
>
> Static text symbols:
> Before: 40302
> After: 40286 (-16)
>
> Functions such as kref_get(), free_user(), fuse_file_get() now get
> inlined.
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
> Cc: Kees Cook <[email protected]>
> Cc: Jan Beulich <[email protected]>
> Cc: Josh Poimboeuf <[email protected]>
>
> Signed-off-by: Nadav Amit <[email protected]>
> ---
> arch/x86/include/asm/refcount.h | 73 ++++++++++++++++++++-------------
> arch/x86/kernel/macros.S | 1 +
> 2 files changed, 45 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> index 4cf11d88d3b3..53462f32b58e 100644
> --- a/arch/x86/include/asm/refcount.h
> +++ b/arch/x86/include/asm/refcount.h
> @@ -4,6 +4,9 @@
> * x86-specific implementation of refcount_t. Based on PAX_REFCOUNT from
> * PaX/grsecurity.
> */
> +
> +#ifndef __ASSEMBLY__

Can you swap the order here, so that the asm macros are visible first
in the file?

#ifdef __ASSEMBLY__
...macros
#else
....C
#endif

-Kees

> +
> #include <linux/refcount.h>
>
> /*
> @@ -14,34 +17,11 @@
> * central refcount exception. The fixup address for the exception points
> * back to the regular execution flow in .text.
> */
> -#define _REFCOUNT_EXCEPTION \
> - ".pushsection .text..refcount\n" \
> - "111:\tlea %[counter], %%" _ASM_CX "\n" \
> - "112:\t" ASM_UD2 "\n" \
> - ASM_UNREACHABLE \
> - ".popsection\n" \
> - "113:\n" \
> - _ASM_EXTABLE_REFCOUNT(112b, 113b)
> -
> -/* Trigger refcount exception if refcount result is negative. */
> -#define REFCOUNT_CHECK_LT_ZERO \
> - "js 111f\n\t" \
> - _REFCOUNT_EXCEPTION
> -
> -/* Trigger refcount exception if refcount result is zero or negative. */
> -#define REFCOUNT_CHECK_LE_ZERO \
> - "jz 111f\n\t" \
> - REFCOUNT_CHECK_LT_ZERO
> -
> -/* Trigger refcount exception unconditionally. */
> -#define REFCOUNT_ERROR \
> - "jmp 111f\n\t" \
> - _REFCOUNT_EXCEPTION
>
> static __always_inline void refcount_add(unsigned int i, refcount_t *r)
> {
> asm volatile(LOCK_PREFIX "addl %1,%0\n\t"
> - REFCOUNT_CHECK_LT_ZERO
> + "REFCOUNT_CHECK_LT_ZERO counter=\"%[counter]\""
> : [counter] "+m" (r->refs.counter)
> : "ir" (i)
> : "cc", "cx");
> @@ -50,7 +30,7 @@ static __always_inline void refcount_add(unsigned int i, refcount_t *r)
> static __always_inline void refcount_inc(refcount_t *r)
> {
> asm volatile(LOCK_PREFIX "incl %0\n\t"
> - REFCOUNT_CHECK_LT_ZERO
> + "REFCOUNT_CHECK_LT_ZERO counter=\"%[counter]\""
> : [counter] "+m" (r->refs.counter)
> : : "cc", "cx");
> }
> @@ -58,7 +38,7 @@ static __always_inline void refcount_inc(refcount_t *r)
> static __always_inline void refcount_dec(refcount_t *r)
> {
> asm volatile(LOCK_PREFIX "decl %0\n\t"
> - REFCOUNT_CHECK_LE_ZERO
> + "REFCOUNT_CHECK_LE_ZERO counter=\"%[counter]\""
> : [counter] "+m" (r->refs.counter)
> : : "cc", "cx");
> }
> @@ -66,13 +46,15 @@ static __always_inline void refcount_dec(refcount_t *r)
> static __always_inline __must_check
> bool refcount_sub_and_test(unsigned int i, refcount_t *r)
> {
> - GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl", REFCOUNT_CHECK_LT_ZERO,
> + GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> + "REFCOUNT_CHECK_LT_ZERO counter=\"%0\"",
> r->refs.counter, "er", i, "%0", e, "cx");
> }
>
> static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
> {
> - GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl", REFCOUNT_CHECK_LT_ZERO,
> + GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> + "REFCOUNT_CHECK_LT_ZERO counter=\"%0\"",
> r->refs.counter, "%0", e, "cx");
> }
>
> @@ -90,7 +72,7 @@ bool refcount_add_not_zero(unsigned int i, refcount_t *r)
>
> /* Did we try to increment from/to an undesirable state? */
> if (unlikely(c < 0 || c == INT_MAX || result < c)) {
> - asm volatile(REFCOUNT_ERROR
> + asm volatile("REFCOUNT_ERROR counter=\"%[counter]\""
> : : [counter] "m" (r->refs.counter)
> : "cc", "cx");
> break;
> @@ -106,4 +88,37 @@ static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
> return refcount_add_not_zero(1, r);
> }
>
> +#else /* __ASSEMBLY__ */
> +#include <asm/asm.h>
> +#include <asm/bug.h>
> +
> +.macro REFCOUNT_EXCEPTION counter:req
> + .pushsection .text..refcount
> +111: lea \counter, %_ASM_CX
> +112: ud2
> + ASM_UNREACHABLE
> + .popsection
> +113: _ASM_EXTABLE_REFCOUNT(112b, 113b)
> +.endm
> +
> +/* Trigger refcount exception if refcount result is negative. */
> +.macro REFCOUNT_CHECK_LT_ZERO counter:req
> + js 111f
> + REFCOUNT_EXCEPTION \counter
> +.endm
> +
> +/* Trigger refcount exception if refcount result is zero or negative. */
> +.macro REFCOUNT_CHECK_LE_ZERO counter:req
> + jz 111f
> + REFCOUNT_CHECK_LT_ZERO counter=\counter
> +.endm
> +
> +/* Trigger refcount exception unconditionally. */
> +.macro REFCOUNT_ERROR counter:req
> + jmp 111f
> + REFCOUNT_EXCEPTION counter=\counter
> +.endm
> +
> +#endif /* __ASSEMBLY__ */
> +
> #endif
> diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
> index cee28c3246dc..f1fe1d570365 100644
> --- a/arch/x86/kernel/macros.S
> +++ b/arch/x86/kernel/macros.S
> @@ -7,3 +7,4 @@
> */
>
> #include <linux/compiler.h>
> +#include <asm/refcount.h>
> --
> 2.17.0
>



--
Kees Cook
Pixel Security
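
To see why the conversion above helps, compare what GCC is handed in each
case. A simplified before/after sketch of refcount_inc() (illustrative
only; the real exception block also emits ASM_UNREACHABLE and the extable
entry, as in the hunks above):

/* Before: the expansion spans several newline-separated lines, so GCC's
 * newline/semicolon heuristic counts many "instructions" and
 * overestimates the size of the containing function. */
asm volatile(LOCK_PREFIX "incl %0\n\t"
	     "js 111f\n\t"
	     ".pushsection .text..refcount\n"
	     "111:\tlea %[counter], %%" _ASM_CX "\n"
	     "112:\t" ASM_UD2 "\n"
	     ".popsection\n"
	     "113:\n"
	     : [counter] "+m" (r->refs.counter) : : "cc", "cx");

/* After: a single macro invocation on one line; the expansion now happens
 * in the assembler, so GCC's size estimate is a single "instruction". */
asm volatile(LOCK_PREFIX "incl %0\n\t"
	     "REFCOUNT_CHECK_LT_ZERO counter=\"%[counter]\""
	     : [counter] "+m" (r->refs.counter) : : "cc", "cx");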

2018-06-04 22:21:31

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] x86: refcount: prevent gcc distortions

Kees Cook <[email protected]> wrote:

> On Mon, Jun 4, 2018 at 4:21 AM, Nadav Amit <[email protected]> wrote:
>> GCC treats the number of statements in inline assembly blocks,
>> counted by new-lines and semicolons, as an indication of the cost of
>> the block in time and space. This estimate is distorted by kernel code,
>> which puts information in alternative sections. As a result, the
>> compiler may perform incorrect inlining and branch optimizations.
>>
>> The solution is to define an assembly macro and call it from the inline
>> assembly block. As a result, GCC treats the inline assembly block as
>> a single instruction.
>>
>> This patch allows functions such as __get_seccomp_filter() to be inlined.
>> Interestingly, this allows more aggressive inlining while reducing the
>> kernel size.
>>
>> text data bss dec hex filename
>> 18140970 10225412 2957312 31323694 1ddf62e ./vmlinux before
>> 18140140 10225284 2957312 31322736 1ddf270 ./vmlinux after (-958)
>>
>> Static text symbols:
>> Before: 40302
>> After: 40286 (-16)
>>
>> Functions such as kref_get(), free_user(), fuse_file_get() now get
>> inlined.
>>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: [email protected]
>> Cc: Kees Cook <[email protected]>
>> Cc: Jan Beulich <[email protected]>
>> Cc: Josh Poimboeuf <[email protected]>
>>
>> Signed-off-by: Nadav Amit <[email protected]>
>> ---
>> arch/x86/include/asm/refcount.h | 73 ++++++++++++++++++++-------------
>> arch/x86/kernel/macros.S | 1 +
>> 2 files changed, 45 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
>> index 4cf11d88d3b3..53462f32b58e 100644
>> --- a/arch/x86/include/asm/refcount.h
>> +++ b/arch/x86/include/asm/refcount.h
>> @@ -4,6 +4,9 @@
>> * x86-specific implementation of refcount_t. Based on PAX_REFCOUNT from
>> * PaX/grsecurity.
>> */
>> +
>> +#ifndef __ASSEMBLY__
>
> Can you swap the order here, so that the asm macros are visible first
> in the file?
>
> #ifdef __ASSEMBLY__
> ...macros
> #else
> ....C
> #endif

Done. I also noticed that in one instance (REFCOUNT_CHECK_LT_ZERO) I forgot
to pass the parameter explicitly by name ("counter=\counter"), so I'll fix
that as well in the next version.
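
For reference, the reordered header would look roughly like this (sketch
only, abridged from the hunks above; the include-guard name is
illustrative, not taken from the patch):

#ifndef _ASM_X86_REFCOUNT_H
#define _ASM_X86_REFCOUNT_H

#ifdef __ASSEMBLY__

#include <asm/asm.h>
#include <asm/bug.h>

.macro REFCOUNT_EXCEPTION counter:req
	/* ... exception body as in the patch ... */
.endm

/* Trigger refcount exception if refcount result is negative. */
.macro REFCOUNT_CHECK_LT_ZERO counter:req
	js 111f
	/* pass the argument by name, per the fix noted above */
	REFCOUNT_EXCEPTION counter=\counter
.endm

/* ... REFCOUNT_CHECK_LE_ZERO and REFCOUNT_ERROR likewise ... */

#else /* __ASSEMBLY__ */

#include <linux/refcount.h>

/* ... the C inline functions as in the patch ... */

#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_REFCOUNT_H */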

2018-06-05 05:43:22

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions

Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: c6x-evmc6678_defconfig (attached as .config)
compiler: c6x-elf-gcc (GCC) 8.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=c6x

All errors (new ones prefixed by >>):

include/linux/compiler.h: Assembler messages:
>> include/linux/compiler.h:308: Error: Macro `annotate_unreachable' was already defined

vim +308 include/linux/compiler.h

307
> 308 .macro ANNOTATE_UNREACHABLE counter:req
309 .endm
310
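
A note on the message itself: gas records macro names case-insensitively
(hence the lower-case `annotate_unreachable' in the error), and "was
already defined" means the same name reaches the assembler twice on this
architecture. A defensive sketch (hypothetical, not part of the series)
would guard the asm-macro definition the way C headers are guarded:

/* Hypothetical guard name; shown only to illustrate the idea. */
#ifndef __COMPILER_H_ASM_MACROS
#define __COMPILER_H_ASM_MACROS

.macro ANNOTATE_UNREACHABLE counter:req
.endm

#endif /* __COMPILER_H_ASM_MACROS */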

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2018-06-05 05:45:23

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] x86: alternatives: macrofy locks for better inlining

Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: um-i386_defconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=um SUBARCH=i386

All errors (new ones prefixed by >>):

arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:220: Error: no such instruction: `lock_prefix btsl $0,once.63562'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 16(%esi)'
--
arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl contig_page_data+500(%edx)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl vm_zone_stat+32'
--
arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:267: Error: no such instruction: `lock_prefix btrl $8,4(%eax)'
--
arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $2,4(%eax)'
--
arch/x86/include/asm/bitops.h: Assembler messages:
arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $1,64(%eax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,64(%eax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,64(%edx)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,64(%edx)'
arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $1,64(%eax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,64(%eax)'
--
arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 8(%esi)'
arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix decl 8(%esi)'
--
arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl host_sleep_count'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl host_sleep_count'
--
arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:55: Error: no such instruction: `lock_prefix addl %edx,contig_page_data+504(%eax)'
>> arch/x86/include/asm/atomic.h:55: Error: no such instruction: `lock_prefix addl %edx,vm_zone_stat+36'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 36(%eax)'
>> arch/x86/include/asm/atomic.h:197: Error: no such instruction: `lock_prefix cmpxchgl %ecx,28(%edx)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 40(%eax)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl (%esi)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 36(%eax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 28(%eax)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 36(%ebx)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 40(%ebx)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl -444(%ebx)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl (%ebx)'
arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-5,4(%eax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl (%eax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl (%eax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 36(%eax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 28(%eax)'
>> arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix decl 184(%ecx)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 12(%edi)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl (%esi)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 4(%esi)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 8(%esi)'
arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,4(%eax)'
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $1,404(%ebx)'
arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $2,5(%eax)'
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $2,4(%eax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 4(%eax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 4(%eax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl (%eax)'
arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix decl 4(%eax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 8(%esi)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 8(%esi)'
--
arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:81: Error: no such instruction: `lock_prefix btsl %eax,tainted_mask'
>> arch/x86/include/asm/atomic.h:191: Error: no such instruction: `lock_prefix cmpxchgl %edx,panic_cpu'
>> arch/x86/include/asm/atomic.h:191: Error: no such instruction: `lock_prefix cmpxchgl %edx,panic_cpu'
--
arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $1,__cpu_online_mask'
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $1,__cpu_active_mask'
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $1,__cpu_present_mask'
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $1,__cpu_possible_mask'
--
arch/x86/include/asm/atomic.h: Assembler messages:
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl -888(%eax)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl (%edi)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 40(%ebx)'
arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix decl 4(%eax)'
arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-3,4(%eax)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 4(%eax)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 16(%eax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 8(%edi)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 8(%edi)'
>> include/asm-generic/atomic-instrumented.h:362: Error: no such instruction: `lock_prefix cmpxchgl %ecx,372(%edi)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 8(%edi)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 8(%edi)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 8(%edi)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 8(%edi)'
..
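
All of these failures share one pattern: LOCK_PREFIX now expands to a bare
LOCK_PREFIX token that an assembler macro is supposed to resolve, but on
ARCH=um the x86 Makefile hook that exposes the precompiled macros to gas
is evidently not active, so the assembler rejects it as an unknown
mnemonic. The hook in question is along these lines (sketch only; the
series' Makefile change is not quoted in this thread, so the exact path
and placement are assumed):

# Feed the pre-assembled macro definitions to gas alongside every C
# translation unit, so macros invoked from inline asm (LOCK_PREFIX,
# REFCOUNT_*, ...) resolve at assembly time.
KBUILD_CFLAGS += -Wa,arch/x86/kernel/macros.s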

vim +220 arch/x86/include/asm/bitops.h

1a750e0cd include/asm-x86/bitops.h Linus Torvalds 2008-06-18 56
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 57 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 58 * set_bit - Atomically set a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 59 * @nr: the bit to set
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 60 * @addr: the address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 61 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 62 * This function is atomic and may not be reordered. See __set_bit()
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 63 * if you do not require the atomic guarantees.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 64 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 65 * Note: there are no guarantees that this function will not be reordered
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 66 * on non x86 architectures, so if you are writing portable code,
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 67 * make sure not to rely on its reordering guarantees.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 68 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 69 * Note that @nr may be almost arbitrarily large; this function is not
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 70 * restricted to acting on a single-word quantity.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 71 */
c8399943b arch/x86/include/asm/bitops.h Andi Kleen 2009-01-12 72 static __always_inline void
9b710506a arch/x86/include/asm/bitops.h H. Peter Anvin 2013-07-16 73 set_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 74 {
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 75 if (IS_IMMEDIATE(nr)) {
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 @76 asm volatile(LOCK_PREFIX "orb %1,%0"
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 77 : CONST_MASK_ADDR(nr, addr)
437a0a54e include/asm-x86/bitops.h Ingo Molnar 2008-06-20 78 : "iq" ((u8)CONST_MASK(nr))
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 79 : "memory");
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 80 } else {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 @81 asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0"
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 82 : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 83 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 84 }
1a750e0cd include/asm-x86/bitops.h Linus Torvalds 2008-06-18 85
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 86 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 87 * __set_bit - Set a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 88 * @nr: the bit to set
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 89 * @addr: the address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 90 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 91 * Unlike set_bit(), this function is non-atomic and may be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 92 * If it's called on the same region of memory simultaneously, the effect
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 93 * may be that only one operation succeeds.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 94 */
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 95 static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 96 {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 97 asm volatile(__ASM_SIZE(bts) " %1,%0" : ADDR : "Ir" (nr) : "memory");
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 98 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 99
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 100 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 101 * clear_bit - Clears a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 102 * @nr: Bit to clear
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 103 * @addr: Address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 104 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 105 * clear_bit() is atomic and may not be reordered. However, it does
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 106 * not contain a memory barrier, so if it is used for locking purposes,
d00a56928 arch/x86/include/asm/bitops.h Peter Zijlstra 2014-03-13 107 * you should call smp_mb__before_atomic() and/or smp_mb__after_atomic()
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 108 * in order to ensure changes are visible on other processors.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 109 */
c8399943b arch/x86/include/asm/bitops.h Andi Kleen 2009-01-12 110 static __always_inline void
9b710506a arch/x86/include/asm/bitops.h H. Peter Anvin 2013-07-16 111 clear_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 112 {
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 113 if (IS_IMMEDIATE(nr)) {
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 @114 asm volatile(LOCK_PREFIX "andb %1,%0"
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 115 : CONST_MASK_ADDR(nr, addr)
437a0a54e include/asm-x86/bitops.h Ingo Molnar 2008-06-20 116 : "iq" ((u8)~CONST_MASK(nr)));
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 117 } else {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 @118 asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0"
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 119 : BITOP_ADDR(addr)
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 120 : "Ir" (nr));
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 121 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 122 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 123
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 124 /*
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 125 * clear_bit_unlock - Clears a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 126 * @nr: Bit to clear
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 127 * @addr: Address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 128 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 129 * clear_bit() is atomic and implies release semantics before the memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 130 * operation. It can be used for an unlock.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 131 */
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 132 static __always_inline void clear_bit_unlock(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 133 {
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 134 barrier();
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 135 clear_bit(nr, addr);
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 136 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 137
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 138 static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 139 {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 140 asm volatile(__ASM_SIZE(btr) " %1,%0" : ADDR : "Ir" (nr));
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 141 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 142
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 143 static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 144 {
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 145 bool negative;
3c52b5c64 arch/x86/include/asm/bitops.h Uros Bizjak 2017-09-06 146 asm volatile(LOCK_PREFIX "andb %2,%1"
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 147 CC_SET(s)
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 148 : CC_OUT(s) (negative), ADDR
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 149 : "ir" ((char) ~(1 << nr)) : "memory");
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 150 return negative;
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 151 }
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 152
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 153 // Let everybody know we have it
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 154 #define clear_bit_unlock_is_negative_byte clear_bit_unlock_is_negative_byte
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 155
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 156 /*
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 157 * __clear_bit_unlock - Clears a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 158 * @nr: Bit to clear
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 159 * @addr: Address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 160 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 161 * __clear_bit() is non-atomic and implies release semantics before the memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 162 * operation. It can be used for an unlock if no other CPUs can concurrently
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 163 * modify other bits in the word.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 164 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 165 * No memory barrier is required here, because x86 cannot reorder stores past
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 166 * older loads. Same principle as spin_unlock.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 167 */
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 168 static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 169 {
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 170 barrier();
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 171 __clear_bit(nr, addr);
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 172 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 173
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 174 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 175 * __change_bit - Toggle a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 176 * @nr: the bit to change
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 177 * @addr: the address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 178 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 179 * Unlike change_bit(), this function is non-atomic and may be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 180 * If it's called on the same region of memory simultaneously, the effect
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 181 * may be that only one operation succeeds.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 182 */
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 183 static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 184 {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 185 asm volatile(__ASM_SIZE(btc) " %1,%0" : ADDR : "Ir" (nr));
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 186 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 187
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 188 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 189 * change_bit - Toggle a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 190 * @nr: Bit to change
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 191 * @addr: Address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 192 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 193 * change_bit() is atomic and may not be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 194 * Note that @nr may be almost arbitrarily large; this function is not
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 195 * restricted to acting on a single-word quantity.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 196 */
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 197 static __always_inline void change_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 198 {
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 199 if (IS_IMMEDIATE(nr)) {
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 200 asm volatile(LOCK_PREFIX "xorb %1,%0"
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 201 : CONST_MASK_ADDR(nr, addr)
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 202 : "iq" ((u8)CONST_MASK(nr)));
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 203 } else {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 204 asm volatile(LOCK_PREFIX __ASM_SIZE(btc) " %1,%0"
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 205 : BITOP_ADDR(addr)
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 206 : "Ir" (nr));
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 207 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 208 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 209
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 210 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 211 * test_and_set_bit - Set a bit and return its old value
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 212 * @nr: Bit to set
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 213 * @addr: Address to count from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 214 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 215 * This operation is atomic and cannot be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 216 * It also implies a memory barrier.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 217 */
117780eef arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 218 static __always_inline bool test_and_set_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 219 {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 @220 GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(bts),
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 221 *addr, "Ir", nr, "%0", c);
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 222 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 223
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 224 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 225 * test_and_set_bit_lock - Set a bit and return its old value for lock
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 226 * @nr: Bit to set
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 227 * @addr: Address to count from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 228 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 229 * This is the same as test_and_set_bit on x86.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 230 */
117780eef arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 231 static __always_inline bool
9b710506a arch/x86/include/asm/bitops.h H. Peter Anvin 2013-07-16 232 test_and_set_bit_lock(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 233 {
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 234 return test_and_set_bit(nr, addr);
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 235 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 236
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 237 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 238 * __test_and_set_bit - Set a bit and return its old value
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 239 * @nr: Bit to set
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 240 * @addr: Address to count from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 241 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 242 * This operation is non-atomic and can be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 243 * If two examples of this operation race, one can appear to succeed
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 244 * but actually fail. You must protect multiple accesses with a lock.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 245 */
117780eef arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 246 static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 247 {
117780eef arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 248 bool oldbit;
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 249
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 250 asm(__ASM_SIZE(bts) " %2,%1"
86b61240d arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 251 CC_SET(c)
86b61240d arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 252 : CC_OUT(c) (oldbit), ADDR
eb2b4e682 include/asm-x86/bitops.h Simon Holm Thøgersen 2008-05-05 253 : "Ir" (nr));
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 254 return oldbit;
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 255 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 256
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 257 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 258 * test_and_clear_bit - Clear a bit and return its old value
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 259 * @nr: Bit to clear
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 260 * @addr: Address to count from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 261 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 262 * This operation is atomic and cannot be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 263 * It also implies a memory barrier.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 264 */
117780eef arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 265 static __always_inline bool test_and_clear_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 266 {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 @267 GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(btr),
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 268 *addr, "Ir", nr, "%0", c);
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 269 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 270

:::::: The code at line 220 was first introduced by commit
:::::: 22636f8c9511245cb3c8412039f1dd95afb3aa59 x86/asm: Add instruction suffixes to bitops

:::::: TO: Jan Beulich <[email protected]>
:::::: CC: Thomas Gleixner <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2018-06-05 05:49:23

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] x86: alternatives: macrofy locks for better inlining

Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=um SUBARCH=x86_64

All errors (new ones prefixed by >>):

arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:220: Error: no such instruction: `lock_prefix btsq $0,(%rax)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 28(%r12)'
--
arch/x86/include/asm/atomic64_64.h: Assembler messages:
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: `lock_prefix incq 1000(%rcx,%rdx)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: `lock_prefix incq 64(%rdx)'
--
arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:267: Error: no such instruction: `lock_prefix btrq $8,8(%rax)'
--
arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $2,8(%rax)'
--
arch/x86/include/asm/bitops.h: Assembler messages:
arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $1,120(%rax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,120(%rax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,120(%rdx)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,120(%rdx)'
arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $1,120(%rax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,120(%rax)'
--
arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 16(%r12)'
arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix decl 16(%r12)'
--
arch/x86/include/asm/atomic.h: Assembler messages:
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl (%rdx)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl (%rdx)'
--
arch/x86/include/asm/atomic64_64.h: Assembler messages:
>> arch/x86/include/asm/atomic64_64.h:46: Error: no such instruction: `lock_prefix addq %rsi,1008(%rdx,%rax)'
>> arch/x86/include/asm/atomic64_64.h:46: Error: no such instruction: `lock_prefix addq %rsi,72(%rax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 72(%rax)'
>> arch/x86/include/asm/atomic64_64.h:183: Error: no such instruction: `lock_prefix cmpxchgq %rcx,56(%rdx)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 76(%rdi)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl (%r12)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 72(%rdi)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: `lock_prefix incq 56(%rsi)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 72(%rdi)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 76(%rbx)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl -868(%rbx)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl (%rdi)'
arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-5,8(%rax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl (%rdi)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl (%rax)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 72(%rbx)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: `lock_prefix incq 56(%rax)'
>> arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix decl 296(%rcx)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 24(%rdx)'
>> arch/x86/include/asm/atomic64_64.h:87: Error: no such instruction: `lock_prefix incq (%rbx)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 8(%rbx)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 12(%rbx)'
arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,8(%rax)'
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $1,600(%r15)'
arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $2,9(%rax)'
>> arch/x86/include/asm/bitops.h:76: Error: no such instruction: `lock_prefix orb $2,8(%rax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 4(%rax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 4(%rax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl (%rax)'
>> arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix decl 4(%rax)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 16(%rbx)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 16(%rbx)'
--
arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:81: Error: no such instruction: `lock_prefix btsq %rbx,(%rax)'
>> arch/x86/include/asm/atomic.h:191: Error: no such instruction: `lock_prefix cmpxchgl %ecx,(%rdx)'
>> arch/x86/include/asm/atomic.h:191: Error: no such instruction: `lock_prefix cmpxchgl %ecx,(%rdx)'
--
arch/x86/include/asm/atomic.h: Assembler messages:
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl -1408(%rdi)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl (%r15)'
arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 76(%rbx)'
>> arch/x86/include/asm/atomic.h:108: Error: no such instruction: `lock_prefix decl 4(%rax)'
>> arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-3,8(%rcx)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 4(%rax)'
arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 28(%rdi)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 16(%r14)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 16(%r14)'
>> include/asm-generic/atomic-instrumented.h:362: Error: no such instruction: `lock_prefix cmpxchgl %r8d,560(%r14)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 16(%r14)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 16(%r14)'
>> arch/x86/include/asm/atomic.h:96: Error: no such instruction: `lock_prefix incl 16(%r14)'
>> arch/x86/include/asm/atomic.h:122: Error: no such instruction: `lock_prefix decl 16(%r14)'
--
arch/x86/include/asm/bitops.h: Assembler messages:
>> arch/x86/include/asm/bitops.h:220: Error: no such instruction: `lock_prefix btsq $0,8(%rbx)'
arch/x86/include/asm/bitops.h:114: Error: no such instruction: `lock_prefix andb $-2,(%rax)'
>> arch/x86/include/asm/bitops.h:220: Error: no such instruction: `lock_prefix btsq $0,72(%rdi)'
>> arch/x86/include/asm/bitops.h:267: Error: no such instruction: `lock_prefix btrq $0,8(%r15)'
..

vim +220 arch/x86/include/asm/bitops.h

1a750e0cd include/asm-x86/bitops.h Linus Torvalds 2008-06-18 56
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 57 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 58 * set_bit - Atomically set a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 59 * @nr: the bit to set
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 60 * @addr: the address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 61 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 62 * This function is atomic and may not be reordered. See __set_bit()
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 63 * if you do not require the atomic guarantees.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 64 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 65 * Note: there are no guarantees that this function will not be reordered
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 66 * on non x86 architectures, so if you are writing portable code,
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 67 * make sure not to rely on its reordering guarantees.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 68 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 69 * Note that @nr may be almost arbitrarily large; this function is not
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 70 * restricted to acting on a single-word quantity.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 71 */
c8399943b arch/x86/include/asm/bitops.h Andi Kleen 2009-01-12 72 static __always_inline void
9b710506a arch/x86/include/asm/bitops.h H. Peter Anvin 2013-07-16 73 set_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 74 {
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 75 if (IS_IMMEDIATE(nr)) {
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 @76 asm volatile(LOCK_PREFIX "orb %1,%0"
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 77 : CONST_MASK_ADDR(nr, addr)
437a0a54e include/asm-x86/bitops.h Ingo Molnar 2008-06-20 78 : "iq" ((u8)CONST_MASK(nr))
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 79 : "memory");
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 80 } else {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 @81 asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0"
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 82 : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 83 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 84 }
1a750e0cd include/asm-x86/bitops.h Linus Torvalds 2008-06-18 85
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 86 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 87 * __set_bit - Set a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 88 * @nr: the bit to set
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 89 * @addr: the address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 90 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 91 * Unlike set_bit(), this function is non-atomic and may be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 92 * If it's called on the same region of memory simultaneously, the effect
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 93 * may be that only one operation succeeds.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 94 */
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 95 static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 96 {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 97 asm volatile(__ASM_SIZE(bts) " %1,%0" : ADDR : "Ir" (nr) : "memory");
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 98 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 99
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 100 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 101 * clear_bit - Clears a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 102 * @nr: Bit to clear
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 103 * @addr: Address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 104 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 105 * clear_bit() is atomic and may not be reordered. However, it does
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 106 * not contain a memory barrier, so if it is used for locking purposes,
d00a56928 arch/x86/include/asm/bitops.h Peter Zijlstra 2014-03-13 107 * you should call smp_mb__before_atomic() and/or smp_mb__after_atomic()
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 108 * in order to ensure changes are visible on other processors.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 109 */
c8399943b arch/x86/include/asm/bitops.h Andi Kleen 2009-01-12 110 static __always_inline void
9b710506a arch/x86/include/asm/bitops.h H. Peter Anvin 2013-07-16 111 clear_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 112 {
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 113 if (IS_IMMEDIATE(nr)) {
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 @114 asm volatile(LOCK_PREFIX "andb %1,%0"
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 115 : CONST_MASK_ADDR(nr, addr)
437a0a54e include/asm-x86/bitops.h Ingo Molnar 2008-06-20 116 : "iq" ((u8)~CONST_MASK(nr)));
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 117 } else {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 @118 asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0"
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 119 : BITOP_ADDR(addr)
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 120 : "Ir" (nr));
7dbceaf9b include/asm-x86/bitops.h Ingo Molnar 2008-06-20 121 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 122 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 123
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 124 /*
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 125 * clear_bit_unlock - Clears a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 126 * @nr: Bit to clear
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 127 * @addr: Address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 128 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 129 * clear_bit() is atomic and implies release semantics before the memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 130 * operation. It can be used for an unlock.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 131 */
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 132 static __always_inline void clear_bit_unlock(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 133 {
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 134 barrier();
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 135 clear_bit(nr, addr);
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 136 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 137
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 138 static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 139 {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 140 asm volatile(__ASM_SIZE(btr) " %1,%0" : ADDR : "Ir" (nr));
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 141 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 142
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 143 static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 144 {
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 145 bool negative;
3c52b5c64 arch/x86/include/asm/bitops.h Uros Bizjak 2017-09-06 @146 asm volatile(LOCK_PREFIX "andb %2,%1"
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 147 CC_SET(s)
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 148 : CC_OUT(s) (negative), ADDR
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 149 : "ir" ((char) ~(1 << nr)) : "memory");
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 150 return negative;
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 151 }
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 152
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 153 // Let everybody know we have it
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 154 #define clear_bit_unlock_is_negative_byte clear_bit_unlock_is_negative_byte
b91e1302a arch/x86/include/asm/bitops.h Linus Torvalds 2016-12-27 155
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 156 /*
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 157 * __clear_bit_unlock - Clears a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 158 * @nr: Bit to clear
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 159 * @addr: Address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 160 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 161 * __clear_bit() is non-atomic and implies release semantics before the memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 162 * operation. It can be used for an unlock if no other CPUs can concurrently
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 163 * modify other bits in the word.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 164 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 165 * No memory barrier is required here, because x86 cannot reorder stores past
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 166 * older loads. Same principle as spin_unlock.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 167 */
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 168 static __always_inline void __clear_bit_unlock(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 169 {
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 170 barrier();
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 171 __clear_bit(nr, addr);
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 172 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 173
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 174 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 175 * __change_bit - Toggle a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 176 * @nr: the bit to change
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 177 * @addr: the address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 178 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 179 * Unlike change_bit(), this function is non-atomic and may be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 180 * If it's called on the same region of memory simultaneously, the effect
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 181 * may be that only one operation succeeds.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 182 */
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 183 static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 184 {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 185 asm volatile(__ASM_SIZE(btc) " %1,%0" : ADDR : "Ir" (nr));
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 186 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 187
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 188 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 189 * change_bit - Toggle a bit in memory
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 190 * @nr: Bit to change
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 191 * @addr: Address to start counting from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 192 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 193 * change_bit() is atomic and may not be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 194 * Note that @nr may be almost arbitrarily large; this function is not
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 195 * restricted to acting on a single-word quantity.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 196 */
8dd5032d9 arch/x86/include/asm/bitops.h Denys Vlasenko 2016-02-07 197 static __always_inline void change_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 198 {
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 199 if (IS_IMMEDIATE(nr)) {
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 200 asm volatile(LOCK_PREFIX "xorb %1,%0"
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 201 : CONST_MASK_ADDR(nr, addr)
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 202 : "iq" ((u8)CONST_MASK(nr)));
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 203 } else {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 204 asm volatile(LOCK_PREFIX __ASM_SIZE(btc) " %1,%0"
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 205 : BITOP_ADDR(addr)
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 206 : "Ir" (nr));
838e8bb71 arch/x86/include/asm/bitops.h Uros Bizjak 2008-10-24 207 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 208 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 209
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 210 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 211 * test_and_set_bit - Set a bit and return its old value
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 212 * @nr: Bit to set
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 213 * @addr: Address to count from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 214 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 215 * This operation is atomic and cannot be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 216 * It also implies a memory barrier.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 217 */
117780eef arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 218 static __always_inline bool test_and_set_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 219 {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 @220 GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(bts),
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 221 *addr, "Ir", nr, "%0", c);
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 222 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 223
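GEN_BINARY_RMWcc() hides the entire asm statement; roughly (a hedged
reconstruction, not the verbatim rmwcc.h macro) the invocation above expands
to something like the following, with the carry flag coming back as the
return value through a flag-output operand:

    #include <stdbool.h>

    static inline bool test_and_set_bit_expanded(long nr, volatile unsigned long *addr)
    {
            bool c;

            /* The "r" constraint sidesteps the operand-size ambiguity
             * that __ASM_SIZE() resolves in the real header. */
            asm volatile("lock; bts %2,%0"
                         : "+m" (*addr), "=@ccc" (c)
                         : "r" (nr)
                         : "memory");
            return c;
    }
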
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 224 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 225 * test_and_set_bit_lock - Set a bit and return its old value for lock
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 226 * @nr: Bit to set
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 227 * @addr: Address to count from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 228 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 229 * This is the same as test_and_set_bit on x86.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 230 */
117780eef arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 231 static __always_inline bool
9b710506a arch/x86/include/asm/bitops.h H. Peter Anvin 2013-07-16 232 test_and_set_bit_lock(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 233 {
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 234 return test_and_set_bit(nr, addr);
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 235 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 236
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 237 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 238 * __test_and_set_bit - Set a bit and return its old value
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 239 * @nr: Bit to set
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 240 * @addr: Address to count from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 241 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 242 * This operation is non-atomic and can be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 243 * If two instances of this operation race, one can appear to succeed

1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 244 * but actually fail. You must protect multiple accesses with a lock.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 245 */
117780eef arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 246 static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 247 {
117780eef arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 248 bool oldbit;
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 249
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 250 asm(__ASM_SIZE(bts) " %2,%1"
86b61240d arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 251 CC_SET(c)
86b61240d arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 252 : CC_OUT(c) (oldbit), ADDR
eb2b4e682 include/asm-x86/bitops.h Simon Holm Thøgersen 2008-05-05 253 : "Ir" (nr));
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 254 return oldbit;
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 255 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 256
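CC_SET()/CC_OUT() above paper over two compiler generations: with
__GCC_ASM_FLAG_OUTPUTS__ the condition code is an asm output itself
("=@ccc", as in the sketch further up), while on older compilers CC_SET(c)
appends an explicit setc and CC_OUT(c) degrades to an ordinary "=qm" output.
A sketch of that older path, again with a made-up name:

    #include <stdbool.h>

    static inline bool __test_and_set_bit_noflags(long nr, volatile unsigned long *addr)
    {
            bool oldbit;

            /* Without flag outputs, the carry flag has to be
             * materialized explicitly. */
            asm("bts %2,%1\n\t"
                "setc %0"
                : "=qm" (oldbit), "+m" (*addr)
                : "r" (nr));
            return oldbit;
    }
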
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 257 /**
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 258 * test_and_clear_bit - Clear a bit and return its old value
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 259 * @nr: Bit to clear
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 260 * @addr: Address to count from
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 261 *
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 262 * This operation is atomic and cannot be reordered.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 263 * It also implies a memory barrier.
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 264 */
117780eef arch/x86/include/asm/bitops.h H. Peter Anvin 2016-06-08 265 static __always_inline bool test_and_clear_bit(long nr, volatile unsigned long *addr)
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 266 {
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 @267 GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(btr),
22636f8c9 arch/x86/include/asm/bitops.h Jan Beulich 2018-02-26 268 *addr, "Ir", nr, "%0", c);
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 269 }
1c54d7707 include/asm-x86/bitops.h Jeremy Fitzhardinge 2008-01-30 270

:::::: The code at line 220 was first introduced by commit
:::::: 22636f8c9511245cb3c8412039f1dd95afb3aa59 x86/asm: Add instruction suffixes to bitops

:::::: TO: Jan Beulich <[email protected]>
:::::: CC: Thomas Gleixner <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
.config.gz (7.19 kB)

2018-06-05 07:36:46

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] x86: bug: prevent gcc distortions

Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors/warnings (new ones prefixed by >>):

In file included from include/linux/bug.h:5:0,
from include/linux/crypto.h:23,
from arch/x86/kernel/asm-offsets.c:9:
include/linux/ktime.h: In function 'ktime_divns':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/asm-generic/bug.h:176:47: note: in expansion of macro 'BUG'
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)
^~~
>> include/linux/ktime.h:150:2: note: in expansion of macro 'BUG_ON'
BUG_ON(div < 0);
^~~~~~
include/linux/rhashtable.h: In function 'rhashtable_lookup_insert_fast':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/asm-generic/bug.h:176:47: note: in expansion of macro 'BUG'
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)
^~~
>> include/linux/rhashtable.h:936:2: note: in expansion of macro 'BUG_ON'
BUG_ON(ht->p.obj_hashfn);
^~~~~~
include/linux/rhashtable.h: In function 'rhashtable_lookup_get_insert_fast':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/asm-generic/bug.h:176:47: note: in expansion of macro 'BUG'
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)
^~~
include/linux/rhashtable.h:962:2: note: in expansion of macro 'BUG_ON'
BUG_ON(ht->p.obj_hashfn);
^~~~~~
include/linux/rhashtable.h: In function 'rhashtable_lookup_insert_key':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/asm-generic/bug.h:176:47: note: in expansion of macro 'BUG'
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)
^~~
include/linux/rhashtable.h:996:2: note: in expansion of macro 'BUG_ON'
BUG_ON(!ht->p.obj_hashfn || !key);
^~~~~~
include/linux/rhashtable.h: In function 'rhashtable_lookup_get_insert_key':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/asm-generic/bug.h:176:47: note: in expansion of macro 'BUG'
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)
^~~
include/linux/rhashtable.h:1020:2: note: in expansion of macro 'BUG_ON'
BUG_ON(!ht->p.obj_hashfn || !key);
^~~~~~
include/linux/crypto.h: In function 'crypto_blkcipher_cast':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/asm-generic/bug.h:176:47: note: in expansion of macro 'BUG'
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)
^~~
>> include/linux/crypto.h:1118:2: note: in expansion of macro 'BUG_ON'
BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_BLKCIPHER);
^~~~~~
include/linux/crypto.h: In function 'crypto_cipher_cast':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/asm-generic/bug.h:176:47: note: in expansion of macro 'BUG'
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)
^~~
include/linux/crypto.h:1438:2: note: in expansion of macro 'BUG_ON'
BUG_ON(crypto_tfm_alg_type(tfm) != CRYPTO_ALG_TYPE_CIPHER);
^~~~~~
include/linux/crypto.h: In function 'crypto_comp_cast':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/asm-generic/bug.h:176:47: note: in expansion of macro 'BUG'
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)
^~~
include/linux/crypto.h:1603:2: note: in expansion of macro 'BUG_ON'
BUG_ON((crypto_tfm_alg_type(tfm) ^ CRYPTO_ALG_TYPE_COMPRESS) &
^~~~~~
include/linux/quota.h: In function 'make_kqid':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
>> include/linux/quota.h:114:3: note: in expansion of macro 'BUG'
BUG();
^~~
include/linux/quota.h: In function 'make_kqid_invalid':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/linux/quota.h:141:3: note: in expansion of macro 'BUG'
BUG();
^~~
include/linux/fs.h: In function 'kill_block_super':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
>> include/linux/fs.h:2124:2: note: in expansion of macro 'BUG'
BUG();
^~~
include/linux/fs.h: In function 'break_deleg_wait':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/linux/fs.h:2371:2: note: in expansion of macro 'BUG'
BUG();
^~~
include/linux/seq_file.h: In function 'seq_get_buf':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/asm-generic/bug.h:176:47: note: in expansion of macro 'BUG'
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)
^~~
>> include/linux/seq_file.h:66:2: note: in expansion of macro 'BUG_ON'
BUG_ON(m->count > m->size);
^~~~~~
include/linux/seq_file.h: In function 'seq_commit':
>> arch/x86/include/asm/bug.h:31:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry'
"i" (sizeof(struct bug_entry))); \
^
arch/x86/include/asm/bug.h:37:2: note: in expansion of macro '_BUG_FLAGS'
_BUG_FLAGS(ASM_UD2, 0); \
^~~~~~~~~~
include/asm-generic/bug.h:176:47: note: in expansion of macro 'BUG'
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)
^~~
include/linux/seq_file.h:89:3: note: in expansion of macro 'BUG_ON'
BUG_ON(m->count + num > m->size);
^~~~~~
include/asm-generic/fixmap.h: In function 'virt_to_fix':

vim +31 arch/x86/include/asm/bug.h

9a93848f arch/x86/include/asm/bug.h Peter Zijlstra 2017-02-02 24
9a93848f arch/x86/include/asm/bug.h Peter Zijlstra 2017-02-02 25 #define _BUG_FLAGS(ins, flags) \
68fdc55c include/asm-x86/bug.h Thomas Gleixner 2007-10-17 26 do { \
6eca12b3 arch/x86/include/asm/bug.h Nadav Amit 2018-06-04 27 asm volatile("ASM_BUG ins=\"" ins "\" file=%c0 line=%c1 " \
6eca12b3 arch/x86/include/asm/bug.h Nadav Amit 2018-06-04 28 "flags=%c2 size=%c3" \
68fdc55c include/asm-x86/bug.h Thomas Gleixner 2007-10-17 29 : : "i" (__FILE__), "i" (__LINE__), \
9a93848f arch/x86/include/asm/bug.h Peter Zijlstra 2017-02-02 30 "i" (flags), \
68fdc55c include/asm-x86/bug.h Thomas Gleixner 2007-10-17 @31 "i" (sizeof(struct bug_entry))); \
68fdc55c include/asm-x86/bug.h Thomas Gleixner 2007-10-17 32 } while (0)
68fdc55c include/asm-x86/bug.h Thomas Gleixner 2007-10-17 33

:::::: The code at line 31 was first introduced by commit
:::::: 68fdc55c48fd2e8f4938a1e815216c25baf8a17e x86: unify include/asm/bug_32/64.h

:::::: TO: Thomas Gleixner <[email protected]>
:::::: CC: Thomas Gleixner <[email protected]>
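
The failures above are presumably all one error class: the macrofied
_BUG_FLAGS() now passes size=%c3, and therefore evaluates
sizeof(struct bug_entry), unconditionally, while an i386 tinyconfig build
(CONFIG_GENERIC_BUG unset) never defines that struct. A minimal sketch of
the problem and of one illustrative guard, which is not necessarily the fix
the series will adopt:

    struct bug_entry;       /* forward-declared, never defined when
                             * CONFIG_GENERIC_BUG is unset */

    #ifdef CONFIG_GENERIC_BUG
    # define BUG_ENTRY_SIZE sizeof(struct bug_entry)  /* type complete here */
    #else
    # define BUG_ENTRY_SIZE 0  /* sizeof() the incomplete type cannot compile */
    #endif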

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
.config.gz (6.15 kB)

2018-06-05 08:27:12

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] x86: refcount: prevent gcc distortions

Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: x86_64-fedora-25 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All errors (new ones prefixed by >>):

arch/x86/include/asm/refcount.h: Assembler messages:
arch/x86/include/asm/refcount.h:38: Error: too many positional arguments
>> /tmp/cc0rd6kn.s: Error: local label `"111" (instance number 3 of a fb label)' is not defined
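
Both messages point at GNU as macro-argument parsing: unquoted whitespace in
an operand splits it into extra positional arguments, and the follow-on
complaint about local label 111 is consistent with the macro body, which
would have defined that label, never being emitted. A hedged reproduction of
the first error with a made-up macro:

    /* A one-argument assembler macro, defined from file-scope asm;
     * .pushsection keeps the emitted word out of the code stream. */
    asm(".macro sketch_mac arg:req\n\t"
        ".pushsection .rodata\n\t"
        ".long \\arg\n\t"
        ".popsection\n\t"
        ".endm");

    void sketch(void)
    {
            /* gas splits macro arguments on unquoted whitespace, so
             *
             *     asm("sketch_mac 1 + 2");
             *
             * hands sketch_mac three arguments and dies with "too many
             * positional arguments". Quoting keeps it one argument: */
            asm("sketch_mac \"1 + 2\"");
    }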

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
.config.gz (46.71 kB)

2018-06-07 03:25:57

by kernel test robot

[permalink] [raw]
Subject: [lkp-robot] [x86] 1a39381d70: WARNING:at_kernel/locking/mutex.c:#__mutex_unlock_slowpath


FYI, we noticed the following commit (built with gcc-7):

commit: 1a39381d70000f0097ec6e2ceb75812d6c00b2f1 ("x86: alternatives: macrofy locks for better inlining")
url: https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313


in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu Haswell,+smep,+smap -m 512M

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+-----------------------------------------------------------------------------------------+------------+------------+
| | 2c1316126e | 1a39381d70 |
+-----------------------------------------------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 18 | 17 |
| WARNING:at_lib/debugobjects.c:#__debug_object_init | 18 | |
| RIP:__debug_object_init | 18 | |
| WARNING:at_kernel/locking/mutex.c:#__mutex_unlock_slowpath | 0 | 17 |
| RIP:__mutex_unlock_slowpath | 0 | 17 |
| WARNING:at_arch/x86/kernel/idt.c:#update_intr_gate | 0 | 17 |
| RIP:update_intr_gate | 0 | 17 |
| page_allocation_failure:order:#,mode:#(),nodemask=(null) | 0 | 17 |
| Mem-Info | 0 | 17 |
| Kernel_panic-not_syncing:kmem_cache_create:Failed_to_create_slab'radix_tree_node'.Error | 0 | 17 |
+-----------------------------------------------------------------------------------------+------------+------------+



[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/locking/mutex.c:1032 __mutex_unlock_slowpath+0x1ff/0x2a0
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-00004-g1a39381 #2
[ 0.000000] RIP: 0010:__mutex_unlock_slowpath+0x1ff/0x2a0
[ 0.000000] RSP: 0000:ffffffffb4a03df0 EFLAGS: 00010082 ORIG_RAX: 0000000000000000
[ 0.000000] RAX: 0000000000000033 RBX: 0000000000000000 RCX: ffffffffb4a797c0
[ 0.000000] RDX: ffffffffb36b41de RSI: 0000000000000001 RDI: 0000000000000046
[ 0.000000] RBP: ffffffffb4a03e30 R08: 0000000000000001 R09: 0000000000000000
[ 0.000000] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb4c789e0
[ 0.000000] R13: ffffffffb4a03df8 R14: ffffffffb48dcfb0 R15: 0000000000000000
[ 0.000000] FS: 0000000000000000(0000) GS:ffffffffb4a8e000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000] CR2: ffffa2951a312000 CR3: 0000000018a74000 CR4: 00000000000606b0
[ 0.000000] Call Trace:
[ 0.000000] mutex_unlock+0x12/0x20
[ 0.000000] __clocksource_register_scale+0xda/0x120
[ 0.000000] kvmclock_init+0x22d/0x23f
[ 0.000000] setup_arch+0xa20/0xaf6
[ 0.000000] start_kernel+0x6a/0x4dc
[ 0.000000] ? copy_bootdata+0x1f/0xb8
[ 0.000000] x86_64_start_reservations+0x24/0x26
[ 0.000000] x86_64_start_kernel+0x73/0x76
[ 0.000000] secondary_startup_64+0xa5/0xb0
[ 0.000000] Code: 0f 0b e9 ae fe ff ff e8 d0 15 a5 ff 85 c0 74 1d 44 8b 15 1d 0c 8f 01 45 85 d2 75 11 4c 89 f6 48 c7 c7 53 b4 8c b4 e8 81 3a 5c ff <0f> 0b 44 8b 0d c0 b2 7c 01 45 85 c9 0f 84 6f fe ff ff e9 73 fe
[ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x3f/0x60 with crng_init=0
[ 0.000000] ---[ end trace a9c261981ad74140 ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Xiaolong


Attachments:
config-4.17.0-00004-g1a39381 (122.56 kB)
job-script (3.88 kB)
dmesg.xz (5.18 kB)