2021-09-08 17:49:44

by H. Peter Anvin

Subject: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()

gcc will sometimes manifest the address of boot_cpu_data in a register
as part of constant propagation. When multiple static_cpu_has()
invocations are used, this may foul the mainline code with a register
load that is only used on the fallback path, which is unused after
initialization.

Explicitly force gcc to use immediate (rip-relative) addressing for
the fallback path, thus removing any possible register use from
static_cpu_has().

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 16a51e7288d5..ff18906b60d8 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -173,20 +173,25 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
* means that the boot_cpu_has() variant is already fast enough for the
* majority of cases and you should stick to using it as it is generally
* only two instructions: a RIP-relative MOV and a TEST.
+ *
+ * Do not use an "m" constraint for [cap_byte] here: gcc doesn't know
+ * that this is only used on a fallback path and will sometimes cause
+ * it to manifest the address of boot_cpu_data in a register, fouling
+ * the mainline (post-initialization) code.
*/
static __always_inline bool _static_cpu_has(u16 bit)
{
asm_volatile_goto(
ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]")
- ".section .altinstr_aux,\"ax\"\n"
+ ".pushsection .altinstr_aux,\"ax\"\n"
"6:\n"
- " testb %[bitnum],%[cap_byte]\n"
+ " testb %[bitnum],%P[cap_byte]\n"
" jnz %l[t_yes]\n"
" jmp %l[t_no]\n"
- ".previous\n"
+ ".popsection\n"
: : [feature] "i" (bit),
[bitnum] "i" (1 << (bit & 7)),
- [cap_byte] "m" (((const char *)boot_cpu_data.x86_capability)[bit >> 3])
+ [cap_byte] "i" (&((const char *)boot_cpu_data.x86_capability)[bit >> 3])
: : t_yes, t_no);
t_yes:
return true;
--
2.31.1


2021-09-09 17:04:16

by Borislav Petkov

Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()

On Wed, Sep 08, 2021 at 10:17:16AM -0700, H. Peter Anvin (Intel) wrote:

> Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()

"pessimize" huh? :)

Why not simply

"Do not waste registers in the pre-initialization case... "

?

> gcc will sometimes manifest the address of boot_cpu_data in a register
> as part of constant propagation. When multiple static_cpu_has() are
> used this may foul the mainline code with a register load which will
> only be used on the fallback path, which is unused after
> initialization.

So a before-after thing looks like this here:

before:

ffffffff89696517 <.altinstr_aux>:
ffffffff89696517: f6 05 cb 09 cb ff 80 testb $0x80,-0x34f635(%rip) # ffffffff89346ee9 <boot_cpu_data+0x69>
ffffffff8969651e: 0f 85 fc 3e fb ff jne ffffffff8964a420 <intel_pmu_init+0x14e7>
ffffffff89696524: e9 ee 3e fb ff jmp ffffffff8964a417 <intel_pmu_init+0x14de>
ffffffff89696529: f6 45 6a 08 testb $0x8,0x6a(%rbp)
ffffffff8969652d: 0f 85 45 b9 97 f7 jne ffffffff81011e78 <intel_pmu_lbr_filter+0x68>
ffffffff89696533: e9 95 b9 97 f7 jmp ffffffff81011ecd <intel_pmu_lbr_filter+0xbd>
ffffffff89696538: 41 f6 44 24 6a 08 testb $0x8,0x6a(%r12)
ffffffff8969653e: 0f 85 d3 bc 97 f7 jne ffffffff81012217 <intel_pmu_store_lbr+0x77>
ffffffff89696544: e9 d9 bc 97 f7 jmp ffffffff81012222 <intel_pmu_store_lbr+0x82>
ffffffff89696549: 41 f6 44 24 6a 08 testb $0x8,0x6a(%r12)

after:

ffffffff89696517 <.altinstr_aux>:
ffffffff89696517: f6 04 25 e9 6e 34 89 testb $0x80,0xffffffff89346ee9
ffffffff8969651e: 80
ffffffff8969651f: 0f 85 fb 3e fb ff jne ffffffff8964a420 <intel_pmu_init+0x14e7>
ffffffff89696525: e9 ed 3e fb ff jmp ffffffff8964a417 <intel_pmu_init+0x14de>
ffffffff8969652a: f6 04 25 ea 6e 34 89 testb $0x8,0xffffffff89346eea
ffffffff89696531: 08
ffffffff89696532: 0f 85 37 b9 97 f7 jne ffffffff81011e6f <intel_pmu_lbr_filter+0x5f>
ffffffff89696538: e9 89 b9 97 f7 jmp ffffffff81011ec6 <intel_pmu_lbr_filter+0xb6>
ffffffff8969653d: f6 04 25 ea 6e 34 89 testb $0x8,0xffffffff89346eea
ffffffff89696544: 08
ffffffff89696545: 0f 85 b5 bc 97 f7 jne ffffffff81012200 <intel_pmu_store_lbr+0x70>
ffffffff8969654b: e9 bb bc 97 f7 jmp ffffffff8101220b <intel_pmu_store_lbr+0x7b>
ffffffff89696550: f6 04 25 ea 6e 34 89 testb $0x8,0xffffffff89346eea

so you're basically forcing an immediate thing.

And you wanna get rid of the (%<reg>) relative addressing and force it
to be rip-relative.

> Explicitly force gcc to use immediate (rip-relative) addressing for

Right, the rip-relative addressing doesn't happen here:

--- /tmp/before 2021-09-09 18:18:28.693009433 +0200
+++ /tmp/after 2021-09-09 18:19:06.285009113 +0200
@@ -1,5 +1,5 @@
-# ./arch/x86/include/asm/cpufeature.h:179: asm_volatile_goto(
-# 179 "./arch/x86/include/asm/cpufeature.h" 1
+# ./arch/x86/include/asm/cpufeature.h:184: asm_volatile_goto(
+# 184 "./arch/x86/include/asm/cpufeature.h" 1
# ALT: oldinstr2
661:
jmp 6f
@@ -29,12 +29,12 @@

6652:
.popsection
-.section .altinstr_aux,"ax"
+.pushsection .altinstr_aux,"ax"
6:
- testb $1,boot_cpu_data+62(%rip) #, MEM[(const char *)&boot_cpu_data + 62B]
+ testb $1,boot_cpu_data+62 #,
jnz .L99 #
jmp .L100 #
-.previous
+.popsection

# 0 "" 2


.altinstr_aux even on an allyesconfig build uses solely immediate
addressing in the TEST insn.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-09-09 21:32:24

by H. Peter Anvin

Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()

On 9/9/21 10:01 AM, Borislav Petkov wrote:
> On Wed, Sep 08, 2021 at 10:17:16AM -0700, H. Peter Anvin (Intel) wrote:
>
>> Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()
>
> "pessimize" huh? :)
>
> Why not simply
>
> "Do not waste registers in the pre-initialization case..."
>

Because it is shorter and thus can fit more contents

> ?
>
>> gcc will sometimes manifest the address of boot_cpu_data in a register
>> as part of constant propagation. When multiple static_cpu_has() are
>> used this may foul the mainline code with a register load which will
>> only be used on the fallback path, which is unused after
>> initialization.
>
> So a before-after thing looks like this here:
>
> before:
>
> ffffffff89696517 <.altinstr_aux>:
> ffffffff89696517: f6 05 cb 09 cb ff 80 testb $0x80,-0x34f635(%rip) # ffffffff89346ee9 <boot_cpu_data+0x69>
> ffffffff8969651e: 0f 85 fc 3e fb ff jne ffffffff8964a420 <intel_pmu_init+0x14e7>
> ffffffff89696524: e9 ee 3e fb ff jmp ffffffff8964a417 <intel_pmu_init+0x14de>
> ffffffff89696529: f6 45 6a 08 testb $0x8,0x6a(%rbp)
> ffffffff8969652d: 0f 85 45 b9 97 f7 jne ffffffff81011e78 <intel_pmu_lbr_filter+0x68>
> ffffffff89696533: e9 95 b9 97 f7 jmp ffffffff81011ecd <intel_pmu_lbr_filter+0xbd>
> ffffffff89696538: 41 f6 44 24 6a 08 testb $0x8,0x6a(%r12)
> ffffffff8969653e: 0f 85 d3 bc 97 f7 jne ffffffff81012217 <intel_pmu_store_lbr+0x77>
> ffffffff89696544: e9 d9 bc 97 f7 jmp ffffffff81012222 <intel_pmu_store_lbr+0x82>
> ffffffff89696549: 41 f6 44 24 6a 08 testb $0x8,0x6a(%r12)
>
> after:
>
> ffffffff89696517 <.altinstr_aux>:
> ffffffff89696517: f6 04 25 e9 6e 34 89 testb $0x80,0xffffffff89346ee9
> ffffffff8969651e: 80
> ffffffff8969651f: 0f 85 fb 3e fb ff jne ffffffff8964a420 <intel_pmu_init+0x14e7>
> ffffffff89696525: e9 ed 3e fb ff jmp ffffffff8964a417 <intel_pmu_init+0x14de>
> ffffffff8969652a: f6 04 25 ea 6e 34 89 testb $0x8,0xffffffff89346eea
> ffffffff89696531: 08
> ffffffff89696532: 0f 85 37 b9 97 f7 jne ffffffff81011e6f <intel_pmu_lbr_filter+0x5f>
> ffffffff89696538: e9 89 b9 97 f7 jmp ffffffff81011ec6 <intel_pmu_lbr_filter+0xb6>
> ffffffff8969653d: f6 04 25 ea 6e 34 89 testb $0x8,0xffffffff89346eea
> ffffffff89696544: 08
> ffffffff89696545: 0f 85 b5 bc 97 f7 jne ffffffff81012200 <intel_pmu_store_lbr+0x70>
> ffffffff8969654b: e9 bb bc 97 f7 jmp ffffffff8101220b <intel_pmu_store_lbr+0x7b>
> ffffffff89696550: f6 04 25 ea 6e 34 89 testb $0x8,0xffffffff89346eea
>
> so you're basically forcing an immediate thing.
>
> And you wanna get rid of the (%<reg>) relative addressing and force it
> to be rip-relative.
>
>> Explicitly force gcc to use immediate (rip-relative) addressing for
>
> Right, the rip-relative addressing doesn't happen here:
>

Indeed it doesn't (egg on my face), nor does it turn out is there
currently a way to do so (just adding (%%rip) breaks i386, and there is
no equivalent to %{pP} which adds the suffix). Let me fix both; will
have a patchset shortly.

-hpa

2021-09-09 22:19:34

by H. Peter Anvin

Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()

On 9/9/21 2:28 PM, H. Peter Anvin wrote:
>
> Because it is shorter and thus can fit more contents
>

... into the visible part of the subject line, that was supposed to say.
This is rather important when browsing logs, e.g. using gitk.

-hpa

2021-09-09 22:26:19

by Borislav Petkov

Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()

On Thu, Sep 09, 2021 at 02:28:42PM -0700, H. Peter Anvin wrote:
> Because it is shorter and thus can fit more contents

Pessimize the pre-initialization case

vs

Do not waste registers in the pre-initialization case

I'll take 16 chars longer but a lot more understandable any day of the
week.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-09-09 22:31:41

by H. Peter Anvin

Subject: [PATCH v2 0/2] x86/asm: avoid register pressure from static_cpu_has()

gcc will sometimes manifest the address of boot_cpu_data in a register
as part of constant propagation. When multiple static_cpu_has()
invocations are used, this may foul the mainline code with a register
load that is only used on the fallback path, which is unused after
initialization.

Explicitly force gcc to use immediate (rip-relative) addressing for
the fallback path, thus removing any possible register use from
static_cpu_has().

However, currently there is no convenient way to make gcc generate a
%rip-relative immediate reference without splitting code into i386 and
x86-64 versions, so add a new macro to <asm/asm.h> for this purpose.

Changes in v2:
--------------
* Add new macro to <asm/asm.h>
* *Actually* generate the %rip-relative addressing mode.

2021-09-09 22:32:19

by H. Peter Anvin

Subject: [PATCH v2 2/2] x86/asm: pessimize the pre-initialization case in static_cpu_has()

gcc will sometimes manifest the address of boot_cpu_data in a register
as part of constant propagation. When multiple static_cpu_has() are
used this may foul the mainline code with a register load which will
only be used on the fallback path, which is unused after
initialization.

Explicitly force gcc to use immediate (rip-relative on x86-64)
addressing for the fallback path, thus removing any possible register
use from static_cpu_has().

While making changes, modernize the code to use
.pushsection...popsection instead of .section...previous.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 16a51e7288d5..1261842d006c 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -173,20 +173,25 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
* means that the boot_cpu_has() variant is already fast enough for the
* majority of cases and you should stick to using it as it is generally
* only two instructions: a RIP-relative MOV and a TEST.
+ *
+ * Do not use an "m" constraint for [cap_byte] here: gcc doesn't know
+ * that this is only used on a fallback path and will sometimes cause
+ * it to manifest the address of boot_cpu_data in a register, fouling
+ * the mainline (post-initialization) code.
*/
static __always_inline bool _static_cpu_has(u16 bit)
{
asm_volatile_goto(
ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]")
- ".section .altinstr_aux,\"ax\"\n"
+ ".pushsection .altinstr_aux,\"ax\"\n"
"6:\n"
- " testb %[bitnum],%[cap_byte]\n"
+ " testb %[bitnum]," _ASM_RIP(%P[cap_byte]) "\n"
" jnz %l[t_yes]\n"
" jmp %l[t_no]\n"
- ".previous\n"
+ ".popsection\n"
: : [feature] "i" (bit),
[bitnum] "i" (1 << (bit & 7)),
- [cap_byte] "m" (((const char *)boot_cpu_data.x86_capability)[bit >> 3])
+ [cap_byte] "i" (&((const char *)boot_cpu_data.x86_capability)[bit >> 3])
: : t_yes, t_no);
t_yes:
return true;
--
2.31.1

2021-09-10 09:18:28

by Borislav Petkov

Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()

On Thu, Sep 09, 2021 at 03:17:14PM -0700, H. Peter Anvin wrote:
> ... into the visible part of the subject line, that was supposed to say.
> This is rather important when browsing logs, e.g. using gitk.

So you resize your gitk window. It's not like there are no other commits
with a bit longer commit titles.

The important part is that our commit messages should be readable months
and years from now - I don't have to explain this to you, of all people.

"pessimize" is not even a word:

https://www.merriam-webster.com/dictionary/pessimize:

“pessimize”

The word you've entered isn't in the dictionary. Click on a spelling
suggestion below or try again using the search bar above.

So while I have an idea what you mean, I'm sure that "what" can be
formulated in a better way.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-09-10 09:18:43

by Borislav Petkov

Subject: Re: [PATCH v2 0/2] x86/asm: avoid register pressure from static_cpu_has()

On Thu, Sep 09, 2021 at 03:08:16PM -0700, H. Peter Anvin (Intel) wrote:
> gcc will sometimes manifest the address of boot_cpu_data in a register
> as part of constant propagation. When multiple static_cpu_has() are
> used this may foul the mainline code with a register load which will
> only be used on the fallback path, which is unused after
> initialization.
>
> Explicitly force gcc to use immediate (rip-relative) addressing for
> the fallback path, thus removing any possible register use from
> static_cpu_has().

Right, maybe I'm missing something but what is wrong with the immediate
addressing variant, i.e., that thing:

testb $0x8,0xffffffff89346eea

and you need to *force* %rip-relative?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-09-10 13:28:19

by Borislav Petkov

Subject: Re: [PATCH v2 0/2] x86/asm: avoid register pressure from static_cpu_has()

On Fri, Sep 10, 2021 at 11:16:47AM +0200, Borislav Petkov wrote:
> Right, maybe I'm missing something but what is wrong with the immediate
> addressing variant, i.e., that thing:
>
> testb $0x8,0xffffffff89346eea
>
> and you need to *force* %rip-relative?

To answer my own question after peterz asked: that hardcoded address
won't work with KASLR, ofc.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-09-10 19:27:40

by H. Peter Anvin

Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()



On 9/10/21 2:14 AM, Borislav Petkov wrote:
> On Thu, Sep 09, 2021 at 03:17:14PM -0700, H. Peter Anvin wrote:
>> ... into the visible part of the subject line, that was supposed to say.
>> This is rather important when browsing logs, e.g. using gitk.
>
> So you resize your gitk window. It's not like there are no other commits
> with a bit longer commit titles.
>
> The important part is that our commit messages should be readable months
> and years from now - I don't have to explain this to you, of all people.
>
> "pessimize" is not even a word:
>
> https://www.merriam-webster.com/dictionary/pessimize:
>
> “pessimize”
>
> The word you've entered isn't in the dictionary. Click on a spelling
> suggestion below or try again using the search bar above.
>

Depends on which dictionary you use.

https://www.yourdictionary.com/pessimize

> So while I have an idea what you mean, I'm sure that "what" can be
> formulated in a better way.
>

OK, no worries. I'll resubmit.

-hpa

2021-09-10 20:01:55

by H. Peter Anvin

Subject: [PATCH v3 0/2] x86/asm: avoid register pressure from the init case in static_cpu_has()

gcc will sometimes manifest the address of boot_cpu_data in a register
as part of constant propagation. When multiple static_cpu_has()
invocations are used, this may foul the mainline code with a register
load that is only used on the fallback path, which is unused after
initialization.

Explicitly force gcc to use immediate (rip-relative) addressing for
the fallback path, thus removing any possible register use from
static_cpu_has().

However, currently there is no convenient way to make gcc generate a
%rip-relative immediate reference without splitting code into i386 and
x86-64 versions, so add a new macro to <asm/asm.h> for this purpose.

Changes in v3:
--------------
* Clarify the subject line

Changes in v2:
--------------
* Add new macro to <asm/asm.h>
* *Actually* generate the %rip-relative addressing mode.

arch/x86/include/asm/asm.h | 5 +++++
arch/x86/include/asm/cpufeature.h | 13 +++++++++----
2 files changed, 14 insertions(+), 4 deletions(-)

2021-09-10 20:02:08

by H. Peter Anvin

Subject: [PATCH v3 1/2] x86/asm: add _ASM_RIP() macro for x86-64 (%rip) suffix

Add a macro _ASM_RIP() to add a (%rip) suffix on 64 bits only. This is
useful for immediate memory references where one doesn't want gcc
to possibly use a register indirection as it may in the case of an "m"
constraint.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
---
arch/x86/include/asm/asm.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 3ad3da9a7d97..c5a19ccda0fe 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -6,11 +6,13 @@
# define __ASM_FORM(x, ...) x,## __VA_ARGS__
# define __ASM_FORM_RAW(x, ...) x,## __VA_ARGS__
# define __ASM_FORM_COMMA(x, ...) x,## __VA_ARGS__,
+# define __ASM_REGPFX %
#else
#include <linux/stringify.h>
# define __ASM_FORM(x, ...) " " __stringify(x,##__VA_ARGS__) " "
# define __ASM_FORM_RAW(x, ...) __stringify(x,##__VA_ARGS__)
# define __ASM_FORM_COMMA(x, ...) " " __stringify(x,##__VA_ARGS__) ","
+# define __ASM_REGPFX %%
#endif

#define _ASM_BYTES(x, ...) __ASM_FORM(.byte x,##__VA_ARGS__ ;)
@@ -49,6 +51,9 @@
#define _ASM_SI __ASM_REG(si)
#define _ASM_DI __ASM_REG(di)

+/* Adds a (%rip) suffix on 64 bits only; for immediate memory references */
+#define _ASM_RIP(x) __ASM_SEL_RAW(x, x (__ASM_REGPFX rip))
+
#ifndef __x86_64__
/* 32 bit */

--
2.31.1

2021-09-10 20:02:24

by H. Peter Anvin

Subject: [PATCH v3 2/2] x86/asm: avoid adding register pressure for the init case in static_cpu_has()

From: "H. Peter Anvin" <[email protected]>

gcc will sometimes manifest the address of boot_cpu_data in a register
as part of constant propagation. When multiple static_cpu_has()
invocations are used, this may foul the mainline code with a register
load that is only used on the fallback path, which is unused after
initialization.

Explicitly force gcc to use immediate (rip-relative) addressing for
the fallback path, thus removing any possible register use from
static_cpu_has().

While making changes, modernize the code to use
.pushsection...popsection instead of .section...previous.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 16a51e7288d5..1261842d006c 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -173,20 +173,25 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
* means that the boot_cpu_has() variant is already fast enough for the
* majority of cases and you should stick to using it as it is generally
* only two instructions: a RIP-relative MOV and a TEST.
+ *
+ * Do not use an "m" constraint for [cap_byte] here: gcc doesn't know
+ * that this is only used on a fallback path and will sometimes cause
+ * it to manifest the address of boot_cpu_data in a register, fouling
+ * the mainline (post-initialization) code.
*/
static __always_inline bool _static_cpu_has(u16 bit)
{
asm_volatile_goto(
ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]")
- ".section .altinstr_aux,\"ax\"\n"
+ ".pushsection .altinstr_aux,\"ax\"\n"
"6:\n"
- " testb %[bitnum],%[cap_byte]\n"
+ " testb %[bitnum]," _ASM_RIP(%P[cap_byte]) "\n"
" jnz %l[t_yes]\n"
" jmp %l[t_no]\n"
- ".previous\n"
+ ".popsection\n"
: : [feature] "i" (bit),
[bitnum] "i" (1 << (bit & 7)),
- [cap_byte] "m" (((const char *)boot_cpu_data.x86_capability)[bit >> 3])
+ [cap_byte] "i" (&((const char *)boot_cpu_data.x86_capability)[bit >> 3])
: : t_yes, t_no);
t_yes:
return true;
--
2.31.1

Subject: [tip: x86/cpu] x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix

The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: f87bc8dc7a7c438c70f97b4e51c76a183313272e
Gitweb: https://git.kernel.org/tip/f87bc8dc7a7c438c70f97b4e51c76a183313272e
Author: H. Peter Anvin (Intel) <[email protected]>
AuthorDate: Fri, 10 Sep 2021 12:59:09 -07:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Mon, 13 Sep 2021 19:38:40 +02:00

x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix

Add a macro _ASM_RIP() to add a (%rip) suffix on 64 bits only. This is
useful for immediate memory references where one doesn't want gcc
to possibly use a register indirection as it may in the case of an "m"
constraint.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/include/asm/asm.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 3ad3da9..c5a19cc 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -6,11 +6,13 @@
# define __ASM_FORM(x, ...) x,## __VA_ARGS__
# define __ASM_FORM_RAW(x, ...) x,## __VA_ARGS__
# define __ASM_FORM_COMMA(x, ...) x,## __VA_ARGS__,
+# define __ASM_REGPFX %
#else
#include <linux/stringify.h>
# define __ASM_FORM(x, ...) " " __stringify(x,##__VA_ARGS__) " "
# define __ASM_FORM_RAW(x, ...) __stringify(x,##__VA_ARGS__)
# define __ASM_FORM_COMMA(x, ...) " " __stringify(x,##__VA_ARGS__) ","
+# define __ASM_REGPFX %%
#endif

#define _ASM_BYTES(x, ...) __ASM_FORM(.byte x,##__VA_ARGS__ ;)
@@ -49,6 +51,9 @@
#define _ASM_SI __ASM_REG(si)
#define _ASM_DI __ASM_REG(di)

+/* Adds a (%rip) suffix on 64 bits only; for immediate memory references */
+#define _ASM_RIP(x) __ASM_SEL_RAW(x, x (__ASM_REGPFX rip))
+
#ifndef __x86_64__
/* 32 bit */

Subject: [tip: x86/cpu] x86/asm: Avoid adding register pressure for the init case in static_cpu_has()

The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: 0507503671f9b1c867e889cbec0f43abf904f23c
Gitweb: https://git.kernel.org/tip/0507503671f9b1c867e889cbec0f43abf904f23c
Author: H. Peter Anvin <[email protected]>
AuthorDate: Fri, 10 Sep 2021 12:59:10 -07:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Mon, 13 Sep 2021 19:48:21 +02:00

x86/asm: Avoid adding register pressure for the init case in static_cpu_has()

gcc will sometimes manifest the address of boot_cpu_data in a register
as part of constant propagation. When multiple static_cpu_has()
invocations are used, this may foul the mainline code with a register
load that is only used on the fallback path, which is unused after
initialization.

Explicitly force gcc to use immediate (rip-relative) addressing for
the fallback path, thus removing any possible register use from
static_cpu_has().

While making changes, modernize the code to use
.pushsection...popsection instead of .section...previous.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/include/asm/cpufeature.h | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 16a51e7..1261842 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -173,20 +173,25 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
* means that the boot_cpu_has() variant is already fast enough for the
* majority of cases and you should stick to using it as it is generally
* only two instructions: a RIP-relative MOV and a TEST.
+ *
+ * Do not use an "m" constraint for [cap_byte] here: gcc doesn't know
+ * that this is only used on a fallback path and will sometimes cause
+ * it to manifest the address of boot_cpu_data in a register, fouling
+ * the mainline (post-initialization) code.
*/
static __always_inline bool _static_cpu_has(u16 bit)
{
asm_volatile_goto(
ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]")
- ".section .altinstr_aux,\"ax\"\n"
+ ".pushsection .altinstr_aux,\"ax\"\n"
"6:\n"
- " testb %[bitnum],%[cap_byte]\n"
+ " testb %[bitnum]," _ASM_RIP(%P[cap_byte]) "\n"
" jnz %l[t_yes]\n"
" jmp %l[t_no]\n"
- ".previous\n"
+ ".popsection\n"
: : [feature] "i" (bit),
[bitnum] "i" (1 << (bit & 7)),
- [cap_byte] "m" (((const char *)boot_cpu_data.x86_capability)[bit >> 3])
+ [cap_byte] "i" (&((const char *)boot_cpu_data.x86_capability)[bit >> 3])
: : t_yes, t_no);
t_yes:
return true;