2020-08-05 02:12:00

by Ricardo Neri

Subject: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

The SERIALIZE instruction gives software a way to force the processor to
complete all modifications to flags, registers and memory from previous
instructions and drain all buffered writes to memory before the next
instruction is fetched and executed. Thus, it serves the purpose of
sync_core(). Use it when available.

Commit 7117f16bf460 ("objtool: Fix ORC vs alternatives") enforced stack
invariance in alternatives. The iret-to-self does not comply with such
invariance. Thus, it cannot be used inside alternative code. Instead, use
an alternative that jumps to SERIALIZE when available.

Cc: Andy Lutomirski <[email protected]>
Cc: Cathy Zhang <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Kyung Min Park <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: [email protected]
Cc: [email protected]
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
This is v2 of my initial submission [1]. The first three patches of
the series have been merged in Linus' tree. Hence, I am submitting only
this patch for review.

[1]. https://lkml.org/lkml/2020/7/27/8

Changes since v1:
* Support SERIALIZE using alternative runtime patching.
(Peter Zijlstra, H. Peter Anvin)
* Added a note to specify which version of binutils supports SERIALIZE.
(Peter Zijlstra)
* Verified that (::: "memory") is used. (H. Peter Anvin)
---
arch/x86/include/asm/special_insns.h | 2 ++
arch/x86/include/asm/sync_core.h | 10 +++++++++-
2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 59a3e13204c3..25cd67801dda 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -10,6 +10,8 @@
#include <linux/irqflags.h>
#include <linux/jump_label.h>

+/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
+#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
/*
* Volatile isn't enough to prevent the compiler from reordering the
* read/write functions for the control registers and messing everything up.
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index fdb5b356e59b..201ea3d9a6bd 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -5,15 +5,19 @@
#include <linux/preempt.h>
#include <asm/processor.h>
#include <asm/cpufeature.h>
+#include <asm/special_insns.h>

#ifdef CONFIG_X86_32
static inline void iret_to_self(void)
{
asm volatile (
+ ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
"pushfl\n\t"
"pushl %%cs\n\t"
"pushl $1f\n\t"
"iret\n\t"
+ "2:\n\t"
+ __ASM_SERIALIZE "\n"
"1:"
: ASM_CALL_CONSTRAINT : : "memory");
}
@@ -23,6 +27,7 @@ static inline void iret_to_self(void)
unsigned int tmp;

asm volatile (
+ ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
"mov %%ss, %0\n\t"
"pushq %q0\n\t"
"pushq %%rsp\n\t"
@@ -32,6 +37,8 @@ static inline void iret_to_self(void)
"pushq %q0\n\t"
"pushq $1f\n\t"
"iretq\n\t"
+ "2:\n\t"
+ __ASM_SERIALIZE "\n"
"1:"
: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
}
@@ -54,7 +61,8 @@ static inline void iret_to_self(void)
static inline void sync_core(void)
{
/*
- * There are quite a few ways to do this. IRET-to-self is nice
+ * Hardware can do this for us if SERIALIZE is available. Otherwise,
+ * there are quite a few ways to do this. IRET-to-self is nice
* because it works on every CPU, at any CPL (so it's compatible
* with paravirtualization), and it never exits to a hypervisor.
* The only down sides are that it's a bit slow (it seems to be
--
2.17.1


2020-08-05 04:51:29

by Borislav Petkov

Subject: Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

On Tue, Aug 04, 2020 at 07:10:59PM -0700, Ricardo Neri wrote:
> The SERIALIZE instruction gives software a way to force the processor to
> complete all modifications to flags, registers and memory from previous
> instructions and drain all buffered writes to memory before the next
> instruction is fetched and executed. Thus, it serves the purpose of
> sync_core(). Use it when available.
>
> Commit 7117f16bf460 ("objtool: Fix ORC vs alternatives") enforced stack
> invariance in alternatives. The iret-to-self does not comply with such
> invariance. Thus, it cannot be used inside alternative code. Instead, use
> an alternative that jumps to SERIALIZE when available.
>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Cathy Zhang <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Fenghua Yu <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Kyung Min Park <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: "Ravi V. Shankar" <[email protected]>
> Cc: Sean Christopherson <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Suggested-by: Andy Lutomirski <[email protected]>
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> This is a v2 from my initial submission [1]. The first three patches of
> the series have been merged in Linus' tree. Hence, I am submitting only
> this patch for review.
>
> [1]. https://lkml.org/lkml/2020/7/27/8
>
> Changes since v1:
> * Support SERIALIZE using alternative runtime patching.
> (Peter Zijlstra, H. Peter Anvin)
> * Added a note to specify which version of binutils supports SERIALIZE.
> (Peter Zijlstra)
> * Verified that (::: "memory") is used. (H. Peter Anvin)
> ---
> arch/x86/include/asm/special_insns.h | 2 ++
> arch/x86/include/asm/sync_core.h | 10 +++++++++-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index 59a3e13204c3..25cd67801dda 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -10,6 +10,8 @@
> #include <linux/irqflags.h>
> #include <linux/jump_label.h>
>
> +/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
> +#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
> /*
> * Volatile isn't enough to prevent the compiler from reordering the
> * read/write functions for the control registers and messing everything up.
> diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
> index fdb5b356e59b..201ea3d9a6bd 100644
> --- a/arch/x86/include/asm/sync_core.h
> +++ b/arch/x86/include/asm/sync_core.h
> @@ -5,15 +5,19 @@
> #include <linux/preempt.h>
> #include <asm/processor.h>
> #include <asm/cpufeature.h>
> +#include <asm/special_insns.h>
>
> #ifdef CONFIG_X86_32
> static inline void iret_to_self(void)
> {
> asm volatile (
> + ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
> "pushfl\n\t"
> "pushl %%cs\n\t"
> "pushl $1f\n\t"
> "iret\n\t"
> + "2:\n\t"
> + __ASM_SERIALIZE "\n"
> "1:"
> : ASM_CALL_CONSTRAINT : : "memory");
> }
> @@ -23,6 +27,7 @@ static inline void iret_to_self(void)
> unsigned int tmp;
>
> asm volatile (
> + ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)

Why is this and above stuck inside the asm statement?

Why can't you simply do:

	if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
		asm volatile(__ASM_SERIALIZE ::: "memory");
		return;
	}

on function entry instead of making it more unreadable for no particular
reason?
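
For reference, a minimal sketch of that shape, reusing __ASM_SERIALIZE from the
patch above (illustration only, not a final version of the patch):

	static inline void sync_core(void)
	{
		/* Fast path: SERIALIZE does everything sync_core() needs. */
		if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
			asm volatile(__ASM_SERIALIZE ::: "memory");
			return;
		}

		/* Fall back to the existing IRET-to-self sequence. */
		iret_to_self();
	}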

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nürnberg)
--

2020-08-05 04:59:47

by H. Peter Anvin

Subject: Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

On August 4, 2020 9:48:40 PM PDT, Borislav Petkov <[email protected]> wrote:
>On Tue, Aug 04, 2020 at 07:10:59PM -0700, Ricardo Neri wrote:
>> The SERIALIZE instruction gives software a way to force the processor
>to
>> complete all modifications to flags, registers and memory from
>previous
>> instructions and drain all buffered writes to memory before the next
>> instruction is fetched and executed. Thus, it serves the purpose of
>> sync_core(). Use it when available.
>>
>> Commit 7117f16bf460 ("objtool: Fix ORC vs alternatives") enforced
>stack
>> invariance in alternatives. The iret-to-self does not comply with
>such
>> invariance. Thus, it cannot be used inside alternative code. Instead,
>use
>> an alternative that jumps to SERIALIZE when available.
>>
>> Cc: Andy Lutomirski <[email protected]>
>> Cc: Cathy Zhang <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: Fenghua Yu <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: Kyung Min Park <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: "Ravi V. Shankar" <[email protected]>
>> Cc: Sean Christopherson <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Suggested-by: Andy Lutomirski <[email protected]>
>> Signed-off-by: Ricardo Neri <[email protected]>
>> ---
>> This is a v2 from my initial submission [1]. The first three patches
>of
>> the series have been merged in Linus' tree. Hence, I am submitting
>only
>> this patch for review.
>>
>> [1]. https://lkml.org/lkml/2020/7/27/8
>>
>> Changes since v1:
>> * Support SERIALIZE using alternative runtime patching.
>> (Peter Zijlstra, H. Peter Anvin)
>> * Added a note to specify which version of binutils supports
>SERIALIZE.
>> (Peter Zijlstra)
>> * Verified that (::: "memory") is used. (H. Peter Anvin)
>> ---
>> arch/x86/include/asm/special_insns.h | 2 ++
>> arch/x86/include/asm/sync_core.h | 10 +++++++++-
>> 2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/special_insns.h
>b/arch/x86/include/asm/special_insns.h
>> index 59a3e13204c3..25cd67801dda 100644
>> --- a/arch/x86/include/asm/special_insns.h
>> +++ b/arch/x86/include/asm/special_insns.h
>> @@ -10,6 +10,8 @@
>> #include <linux/irqflags.h>
>> #include <linux/jump_label.h>
>>
>> +/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35.
>*/
>> +#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
>> /*
>> * Volatile isn't enough to prevent the compiler from reordering the
>> * read/write functions for the control registers and messing
>everything up.
>> diff --git a/arch/x86/include/asm/sync_core.h
>b/arch/x86/include/asm/sync_core.h
>> index fdb5b356e59b..201ea3d9a6bd 100644
>> --- a/arch/x86/include/asm/sync_core.h
>> +++ b/arch/x86/include/asm/sync_core.h
>> @@ -5,15 +5,19 @@
>> #include <linux/preempt.h>
>> #include <asm/processor.h>
>> #include <asm/cpufeature.h>
>> +#include <asm/special_insns.h>
>>
>> #ifdef CONFIG_X86_32
>> static inline void iret_to_self(void)
>> {
>> asm volatile (
>> + ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
>> "pushfl\n\t"
>> "pushl %%cs\n\t"
>> "pushl $1f\n\t"
>> "iret\n\t"
>> + "2:\n\t"
>> + __ASM_SERIALIZE "\n"
>> "1:"
>> : ASM_CALL_CONSTRAINT : : "memory");
>> }
>> @@ -23,6 +27,7 @@ static inline void iret_to_self(void)
>> unsigned int tmp;
>>
>> asm volatile (
>> + ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
>
>Why is this and above stuck inside the asm statement?
>
>Why can't you simply do:
>
> if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
> asm volatile(__ASM_SERIALIZE ::: "memory");
> return;
> }
>
>on function entry instead of making it more unreadable for no
>particular
>reason?

Because why use an alternative to jump over one instruction?

I personally would prefer to have the IRET put out of line and have the call/jmp replaced by SERIALIZE inline.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2020-08-05 05:09:03

by Borislav Petkov

Subject: Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

On Tue, Aug 04, 2020 at 09:58:25PM -0700, [email protected] wrote:
> Because why use an alternative to jump over one instruction?
>
> I personally would prefer to have the IRET put out of line

Can't yet - SERIALIZE CPUs are a minority at the moment.

> and have the call/jmp replaced by SERIALIZE inline.

Well, we could do:

alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);

and avoid all kinds of jumping. Alternatives get padded so there
would be a couple of NOPs following when SERIALIZE gets patched in
but it shouldn't be a problem. I guess one needs to look at what gcc
generates...
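
As a rough sketch of that direction, written with ALTERNATIVE() directly inside
the asm statement so the output and clobber lists stay where they are
(illustration only; see the objtool concern raised further down in the thread),
the 64-bit side could look like:

	unsigned int tmp;

	asm volatile (ALTERNATIVE(
		/* Old: the full IRET-to-self sequence. */
		"mov %%ss, %0\n\t"
		"pushq %q0\n\t"
		"pushq %%rsp\n\t"
		"addq $8, (%%rsp)\n\t"
		"pushfq\n\t"
		"mov %%cs, %0\n\t"
		"pushq %q0\n\t"
		"pushq $1f\n\t"
		"iretq\n\t"
		"1:",
		/* New: one SERIALIZE; the remainder is padded with NOPs. */
		__ASM_SERIALIZE,
		X86_FEATURE_SERIALIZE)
		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");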

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-08-05 05:13:24

by H. Peter Anvin

Subject: Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

On August 4, 2020 10:08:08 PM PDT, Borislav Petkov <[email protected]> wrote:
>On Tue, Aug 04, 2020 at 09:58:25PM -0700, [email protected] wrote:
>> Because why use an alternative to jump over one instruction?
>>
>> I personally would prefer to have the IRET put out of line
>
>Can't yet - SERIALIZE CPUs are a minority at the moment.
>
>> and have the call/jmp replaced by SERIALIZE inline.
>
>Well, we could do:
>
> alternative_io("... IRET bunch", __ASM_SERIALIZE,
>X86_FEATURE_SERIALIZE, ...);
>
>and avoid all kinds of jumping. Alternatives get padded so there
>would be a couple of NOPs following when SERIALIZE gets patched in
>but it shouldn't be a problem. I guess one needs to look at what gcc
>generates...

I didn't say to put it behind a trap. IRET is a control transfer instruction, and slow, so putting it out of line really isn't unreasonable. Can even do a call to a common handler.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2020-08-05 19:12:17

by Andy Lutomirski

Subject: Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

On Wed, Aug 5, 2020 at 10:07 AM Ricardo Neri
<[email protected]> wrote:
>
> On Wed, Aug 05, 2020 at 07:08:08AM +0200, Borislav Petkov wrote:
> > On Tue, Aug 04, 2020 at 09:58:25PM -0700, [email protected] wrote:
> > > Because why use an alternative to jump over one instruction?
> > >
> > > I personally would prefer to have the IRET put out of line
> >
> > Can't yet - SERIALIZE CPUs are a minority at the moment.
> >
> > > and have the call/jmp replaced by SERIALIZE inline.
> >
> > Well, we could do:
> >
> > alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);
> >
> > and avoid all kinds of jumping. Alternatives get padded so there
> > would be a couple of NOPs following when SERIALIZE gets patched in
> > but it shouldn't be a problem. I guess one needs to look at what gcc
> > generates...
>
> But the IRET-TO-SELF code has instructions which modify the stack. This
> would violate stack invariance in alternatives as enforced in commit
> 7117f16bf460 ("objtool: Fix ORC vs alternatives"). As a result, objtool
> gives warnings as follows:
>
> arch/x86/kernel/alternative.o: warning: objtool: do_sync_core()+0xe:
> alternative modifies stack
>
> Perhaps in this specific case it does not matter as the changes in the
> stack will be undone by IRET. However, using alternative_io would require
> adding the macro STACK_FRAME_NON_STANDARD to functions using sync_core().
> IMHO, it wouldn't look good.
>
> So maybe the best approach is to implement as you suggested using
> static_cpu_has()?

I agree. Let's keep it simple.

Honestly, I think the right solution is to have iret_to_self() in
actual asm and invoke it from C as needed. IRET is *slow* -- trying
to optimize it at all is silly. The big optimization was switching
from CPUID to IRET, since CPUID is slooooooooooooooooooow in virtual
environments, whereas IRET is merely sloooooooow and SERIALIZE is
probably just sloooow.

(I once benchmarked it. IIRC the winning version on my laptop is MOV
to CR2 on bare metal and IRET in a Xen PV guest. This optimization
was not obviously worthwhile.)

>
> Thanks and BR,
> Ricardo

2020-08-05 19:15:30

by Ricardo Neri

Subject: Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

On Wed, Aug 05, 2020 at 11:28:31AM -0700, Andy Lutomirski wrote:
> On Wed, Aug 5, 2020 at 10:07 AM Ricardo Neri
> <[email protected]> wrote:
> >
> > On Wed, Aug 05, 2020 at 07:08:08AM +0200, Borislav Petkov wrote:
> > > On Tue, Aug 04, 2020 at 09:58:25PM -0700, [email protected] wrote:
> > > > Because why use an alternative to jump over one instruction?
> > > >
> > > > I personally would prefer to have the IRET put out of line
> > >
> > > Can't yet - SERIALIZE CPUs are a minority at the moment.
> > >
> > > > and have the call/jmp replaced by SERIALIZE inline.
> > >
> > > Well, we could do:
> > >
> > > alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);
> > >
> > > and avoid all kinds of jumping. Alternatives get padded so there
> > > would be a couple of NOPs following when SERIALIZE gets patched in
> > > but it shouldn't be a problem. I guess one needs to look at what gcc
> > > generates...
> >
> > But the IRET-TO-SELF code has instructions which modify the stack. This
> > would violate stack invariance in alternatives as enforced in commit
> > 7117f16bf460 ("objtool: Fix ORC vs alternatives"). As a result, objtool
> > gives warnings as follows:
> >
> > arch/x86/kernel/alternative.o: warning: objtool: do_sync_core()+0xe:
> > alternative modifies stack
> >
> > Perhaps in this specific case it does not matter as the changes in the
> > stack will be undone by IRET. However, using alternative_io would require
> > adding the macro STACK_FRAME_NON_STANDARD to functions using sync_core().
> > IMHO, it wouldn't look good.
> >
> > So maybe the best approach is to implement as you suggested using
> > static_cpu_has()?
>
> I agree. Let's keep it simple.
>
> Honestly, I think the right solution is to have iret_to_self() in
> actual asm and invoke it from C as needed.

Do you mean anything different from what we have already [1]? If I
understand your comment correctly, we have exactly that: an
iret_to_self() asm implementation invoked from C.

Thanks and BR,
Ricardo

[1]. https://lore.kernel.org/lkml/[email protected]/

Thanks and BR,
Ricardo

2020-08-05 19:47:47

by Ricardo Neri

Subject: Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

On Wed, Aug 05, 2020 at 07:08:08AM +0200, Borislav Petkov wrote:
> On Tue, Aug 04, 2020 at 09:58:25PM -0700, [email protected] wrote:
> > Because why use an alternative to jump over one instruction?
> >
> > I personally would prefer to have the IRET put out of line
>
> Can't yet - SERIALIZE CPUs are a minority at the moment.
>
> > and have the call/jmp replaced by SERIALIZE inline.
>
> Well, we could do:
>
> alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);
>
> and avoid all kinds of jumping. Alternatives get padded so there
> would be a couple of NOPs following when SERIALIZE gets patched in
> but it shouldn't be a problem. I guess one needs to look at what gcc
> generates...

But the IRET-TO-SELF code has instructions which modify the stack. This
would violate stack invariance in alternatives as enforced in commit
7117f16bf460 ("objtool: Fix ORC vs alternatives"). As a result, objtool
gives warnings as follows:

arch/x86/kernel/alternative.o: warning: objtool: do_sync_core()+0xe:
alternative modifies stack

Perhaps in this specific case it does not matter as the changes in the
stack will be undone by IRET. However, using alternative_io would require
adding the macro STACK_FRAME_NON_STANDARD to functions using sync_core().
IMHO, it wouldn't look good.
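
For reference, silencing that warning would mean annotating each such caller
roughly as below (sketch only; do_sync_core() is the caller named in the
warning above, and STACK_FRAME_NON_STANDARD lives in linux/frame.h on this
kernel version):

	#include <linux/frame.h>	/* STACK_FRAME_NON_STANDARD */

	/* The IPI callback in arch/x86/kernel/alternative.c that runs
	 * sync_core() on every CPU. */
	static void do_sync_core(void *info)
	{
		sync_core();
	}
	STACK_FRAME_NON_STANDARD(do_sync_core);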

So maybe the best approach is to implement as you suggested using
static_cpu_has()?

Thanks and BR,
Ricardo

2020-08-05 20:18:35

by Ricardo Neri

Subject: Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

On Wed, Aug 05, 2020 at 06:48:40AM +0200, Borislav Petkov wrote:
> On Tue, Aug 04, 2020 at 07:10:59PM -0700, Ricardo Neri wrote:
> > The SERIALIZE instruction gives software a way to force the processor to
> > complete all modifications to flags, registers and memory from previous
> > instructions and drain all buffered writes to memory before the next
> > instruction is fetched and executed. Thus, it serves the purpose of
> > sync_core(). Use it when available.
> >
> > Commit 7117f16bf460 ("objtool: Fix ORC vs alternatives") enforced stack
> > invariance in alternatives. The iret-to-self does not comply with such
> > invariance. Thus, it cannot be used inside alternative code. Instead, use
> > an alternative that jumps to SERIALIZE when available.
> >
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Cathy Zhang <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: Fenghua Yu <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Kyung Min Park <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: "Ravi V. Shankar" <[email protected]>
> > Cc: Sean Christopherson <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Suggested-by: Andy Lutomirski <[email protected]>
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > This is a v2 from my initial submission [1]. The first three patches of
> > the series have been merged in Linus' tree. Hence, I am submitting only
> > this patch for review.
> >
> > [1]. https://lkml.org/lkml/2020/7/27/8
> >
> > Changes since v1:
> > * Support SERIALIZE using alternative runtime patching.
> > (Peter Zijlstra, H. Peter Anvin)
> > * Added a note to specify which version of binutils supports SERIALIZE.
> > (Peter Zijlstra)
> > * Verified that (::: "memory") is used. (H. Peter Anvin)
> > ---
> > arch/x86/include/asm/special_insns.h | 2 ++
> > arch/x86/include/asm/sync_core.h | 10 +++++++++-
> > 2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> > index 59a3e13204c3..25cd67801dda 100644
> > --- a/arch/x86/include/asm/special_insns.h
> > +++ b/arch/x86/include/asm/special_insns.h
> > @@ -10,6 +10,8 @@
> > #include <linux/irqflags.h>
> > #include <linux/jump_label.h>
> >
> > +/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35. */
> > +#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
> > /*
> > * Volatile isn't enough to prevent the compiler from reordering the
> > * read/write functions for the control registers and messing everything up.
> > diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
> > index fdb5b356e59b..201ea3d9a6bd 100644
> > --- a/arch/x86/include/asm/sync_core.h
> > +++ b/arch/x86/include/asm/sync_core.h
> > @@ -5,15 +5,19 @@
> > #include <linux/preempt.h>
> > #include <asm/processor.h>
> > #include <asm/cpufeature.h>
> > +#include <asm/special_insns.h>
> >
> > #ifdef CONFIG_X86_32
> > static inline void iret_to_self(void)
> > {
> > asm volatile (
> > + ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
> > "pushfl\n\t"
> > "pushl %%cs\n\t"
> > "pushl $1f\n\t"
> > "iret\n\t"
> > + "2:\n\t"
> > + __ASM_SERIALIZE "\n"
> > "1:"
> > : ASM_CALL_CONSTRAINT : : "memory");
> > }
> > @@ -23,6 +27,7 @@ static inline void iret_to_self(void)
> > unsigned int tmp;
> >
> > asm volatile (
> > + ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
>
> Why is this and above stuck inside the asm statement?
>
> Why can't you simply do:
>
> if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
> asm volatile(__ASM_SERIALIZE ::: "memory");
> return;
> }
>
> on function entry instead of making it more unreadable for no particular
> reason?

In my first submission I had implemented it as you describe. The only
difference was that I used boot_cpu_has() instead of static_cpu_has()
as the latter has a comment stating that:
"Use static_cpu_has() only in fast paths (...) boot_cpu_has() is
already fast enough for the majority of cases..."

sync_core_before_usermode() already handles what I think are the
critical paths.

Thanks and BR,
Ricardo

2020-08-05 20:31:12

by Andy Lutomirski

Subject: Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available



> On Aug 5, 2020, at 12:11 PM, Ricardo Neri <[email protected]> wrote:
>
> On Wed, Aug 05, 2020 at 11:28:31AM -0700, Andy Lutomirski wrote:
>>> On Wed, Aug 5, 2020 at 10:07 AM Ricardo Neri
>>> <[email protected]> wrote:
>>>
>>> On Wed, Aug 05, 2020 at 07:08:08AM +0200, Borislav Petkov wrote:
>>>> On Tue, Aug 04, 2020 at 09:58:25PM -0700, [email protected] wrote:
>>>>> Because why use an alternative to jump over one instruction?
>>>>>
>>>>> I personally would prefer to have the IRET put out of line
>>>>
>>>> Can't yet - SERIALIZE CPUs are a minority at the moment.
>>>>
>>>>> and have the call/jmp replaced by SERIALIZE inline.
>>>>
>>>> Well, we could do:
>>>>
>>>> alternative_io("... IRET bunch", __ASM_SERIALIZE, X86_FEATURE_SERIALIZE, ...);
>>>>
>>>> and avoid all kinds of jumping. Alternatives get padded so there
>>>> would be a couple of NOPs following when SERIALIZE gets patched in
>>>> but it shouldn't be a problem. I guess one needs to look at what gcc
>>>> generates...
>>>
>>> But the IRET-TO-SELF code has instructions which modify the stack. This
>>> would violate stack invariance in alternatives as enforced in commit
>>> 7117f16bf460 ("objtool: Fix ORC vs alternatives"). As a result, objtool
>>> gives warnings as follows:
>>>
>>> arch/x86/kernel/alternative.o: warning: objtool: do_sync_core()+0xe:
>>> alternative modifies stack
>>>
>>> Perhaps in this specific case it does not matter as the changes in the
>>> stack will be undone by IRET. However, using alternative_io would require
>>> adding the macro STACK_FRAME_NON_STANDARD to functions using sync_core().
>>> IMHO, it wouldn't look good.
>>>
>>> So maybe the best approach is to implement as you suggested using
>>> static_cpu_has()?
>>
>> I agree. Let's keep it simple.
>>
>> Honestly, I think the right solution is to have iret_to_self() in
>> actual asm and invoke it from C as needed.
>
> Do you mean anything different from what we have already [1]? If I
> understand your comment correctly, we have exactly that: an
> iret_to_self() asm implementation invoked from C.

I meant asm as in a .S file. But the code we have is fine for this purpose, at least for now.

>
> Thanks and BR,
> Ricardo
>
> [1]. https://lore.kernel.org/lkml/[email protected]/
>
> Thanks and BR,
> Ricardo

2020-08-05 22:21:18

by Tony Luck

Subject: RE: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

> I meant asm as in a .S file. But the code we have is fine for this purpose, at least for now.

There seem to be some drivers that call sync_core():

drivers/misc/sgi-gru/grufault.c: sync_core();
drivers/misc/sgi-gru/grufault.c: sync_core(); /* make sure we are have current data */
drivers/misc/sgi-gru/gruhandles.c: sync_core();
drivers/misc/sgi-gru/gruhandles.c: sync_core();
drivers/misc/sgi-gru/grukservices.c: sync_core();

So if you go down this path some day, be sure to EXPORT the iret_to_self() function.

-Tony
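
For reference, a sketch of the out-of-line variant this would imply, mirroring
the existing 64-bit IRET-to-self sequence from the patch context above. The
placement, the 64-bit-only scope and the export type are assumptions; nothing
here was settled in the thread:

	#include <linux/export.h>

	/* Hypothetical out-of-line iret_to_self(), exported so that modular
	 * users of sync_core() such as sgi-gru keep working.  The 32-bit
	 * flavor would mirror the pushfl/iret sequence instead. */
	void iret_to_self(void)
	{
		unsigned int tmp;

		asm volatile (
			"mov %%ss, %0\n\t"
			"pushq %q0\n\t"
			"pushq %%rsp\n\t"
			"addq $8, (%%rsp)\n\t"
			"pushfq\n\t"
			"mov %%cs, %0\n\t"
			"pushq %q0\n\t"
			"pushq $1f\n\t"
			"iretq\n\t"
			"1:"
			: "=&r" (tmp) : : "cc", "memory");
	}
	EXPORT_SYMBOL_GPL(iret_to_self);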