2024-01-24 10:40:31

by Uros Bizjak

[permalink] [raw]
Subject: [PATCH] x86/asm: Use 32bit xor to clear registers

x86_64 zero extends 32bit operations, so for 64bit operands,
XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
a REX prefix byte when legacy registers are used.

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
---
arch/x86/kernel/head_64.S | 6 +++---
arch/x86/kernel/sev_verify_cbit.S | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index cc3a81852e4a..ed287170c126 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -171,7 +171,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
ANNOTATE_NOENDBR

/* Clear %R15 which holds the boot_params pointer on the boot CPU */
- xorq %r15, %r15
+ xorl %r15d, %r15d

/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
@@ -180,7 +180,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
#ifdef CONFIG_AMD_MEM_ENCRYPT
movq sme_me_mask, %rax
#else
- xorq %rax, %rax
+ xorl %eax, %eax
#endif

/* Form the CR3 value being sure to include the CR3 modifier */
@@ -297,7 +297,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)

.Llookup_AP:
/* EAX contains the APIC ID of the current CPU */
- xorq %rcx, %rcx
+ xorl %ecx, %ecx
leaq cpuid_to_apicid(%rip), %rbx

.Lfind_cpunr:
diff --git a/arch/x86/kernel/sev_verify_cbit.S b/arch/x86/kernel/sev_verify_cbit.S
index 3355e27c69eb..1ab65f6c6ae7 100644
--- a/arch/x86/kernel/sev_verify_cbit.S
+++ b/arch/x86/kernel/sev_verify_cbit.S
@@ -77,7 +77,7 @@ SYM_FUNC_START(sev_verify_cbit)
* The check failed, prevent any forward progress to prevent ROP
* attacks, invalidate the stack and go into a hlt loop.
*/
- xorq %rsp, %rsp
+ xorl %esp, %esp
subq $0x1000, %rsp
2: hlt
jmp 2b
--
2.31.1



Subject: [tip: x86/boot] x86/boot: Use 32-bit XOR to clear registers

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: 721f791ce1cddfa5f2bf524ac14741bfa0f72697
Gitweb: https://git.kernel.org/tip/721f791ce1cddfa5f2bf524ac14741bfa0f72697
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 24 Jan 2024 11:38:59 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 01 Mar 2024 12:47:37 +01:00

x86/boot: Use 32-bit XOR to clear registers

x86_64 zero extends 32-bit operations, so for 64-bit operands,
XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
a REX prefix byte when legacy registers are used.

Slightly smaller code generated, no change in functionality.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/head_64.S | 6 +++---
arch/x86/kernel/sev_verify_cbit.S | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d295bf6..86136a7 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -169,7 +169,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
ANNOTATE_NOENDBR

/* Clear %R15 which holds the boot_params pointer on the boot CPU */
- xorq %r15, %r15
+ xorl %r15d, %r15d

/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
@@ -178,7 +178,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
#ifdef CONFIG_AMD_MEM_ENCRYPT
movq sme_me_mask, %rax
#else
- xorq %rax, %rax
+ xorl %eax, %eax
#endif

/* Form the CR3 value being sure to include the CR3 modifier */
@@ -295,7 +295,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)

.Llookup_AP:
/* EAX contains the APIC ID of the current CPU */
- xorq %rcx, %rcx
+ xorl %ecx, %ecx
leaq cpuid_to_apicid(%rip), %rbx

.Lfind_cpunr:
diff --git a/arch/x86/kernel/sev_verify_cbit.S b/arch/x86/kernel/sev_verify_cbit.S
index 3355e27..1ab65f6 100644
--- a/arch/x86/kernel/sev_verify_cbit.S
+++ b/arch/x86/kernel/sev_verify_cbit.S
@@ -77,7 +77,7 @@ SYM_FUNC_START(sev_verify_cbit)
* The check failed, prevent any forward progress to prevent ROP
* attacks, invalidate the stack and go into a hlt loop.
*/
- xorq %rsp, %rsp
+ xorl %esp, %esp
subq $0x1000, %rsp
2: hlt
jmp 2b

2024-03-01 12:45:18

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [tip: x86/boot] x86/boot: Use 32-bit XOR to clear registers

On Fri, 1 Mar 2024 at 13:39, tip-bot2 for Uros Bizjak
<[email protected]> wrote:
>
> The following commit has been merged into the x86/boot branch of tip:
>
> Commit-ID: 721f791ce1cddfa5f2bf524ac14741bfa0f72697
> Gitweb: https://git.kernel.org/tip/721f791ce1cddfa5f2bf524ac14741bfa0f72697
> Author: Uros Bizjak <[email protected]>
> AuthorDate: Wed, 24 Jan 2024 11:38:59 +01:00
> Committer: Ingo Molnar <[email protected]>
> CommitterDate: Fri, 01 Mar 2024 12:47:37 +01:00
>
> x86/boot: Use 32-bit XOR to clear registers
>
> x86_64 zero extends 32-bit operations, so for 64-bit operands,
> XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
> a REX prefix byte when legacy registers are used.
>

.. and so this change is pointless churn when not using legacy
registers, right?

> Slightly smaller code generated, no change in functionality.
>
> Signed-off-by: Uros Bizjak <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Brian Gerst <[email protected]>
> Cc: Denys Vlasenko <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Josh Poimboeuf <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> arch/x86/kernel/head_64.S | 6 +++---
> arch/x86/kernel/sev_verify_cbit.S | 2 +-
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index d295bf6..86136a7 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -169,7 +169,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> ANNOTATE_NOENDBR
>
> /* Clear %R15 which holds the boot_params pointer on the boot CPU */
> - xorq %r15, %r15
> + xorl %r15d, %r15d
>

0: 4d 31 ff xor %r15,%r15
3: 45 31 ff xor %r15d,%r15d


> /*
> * Retrieve the modifier (SME encryption mask if SME is active) to be
> @@ -178,7 +178,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> movq sme_me_mask, %rax
> #else
> - xorq %rax, %rax
> + xorl %eax, %eax
> #endif
>

This conflicts with my RIP-relative boot cleanup series.

> /* Form the CR3 value being sure to include the CR3 modifier */
> @@ -295,7 +295,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>
> .Llookup_AP:
> /* EAX contains the APIC ID of the current CPU */
> - xorq %rcx, %rcx
> + xorl %ecx, %ecx
> leaq cpuid_to_apicid(%rip), %rbx
>
> .Lfind_cpunr:
> diff --git a/arch/x86/kernel/sev_verify_cbit.S b/arch/x86/kernel/sev_verify_cbit.S
> index 3355e27..1ab65f6 100644
> --- a/arch/x86/kernel/sev_verify_cbit.S
> +++ b/arch/x86/kernel/sev_verify_cbit.S
> @@ -77,7 +77,7 @@ SYM_FUNC_START(sev_verify_cbit)
> * The check failed, prevent any forward progress to prevent ROP
> * attacks, invalidate the stack and go into a hlt loop.
> */
> - xorq %rsp, %rsp
> + xorl %esp, %esp
> subq $0x1000, %rsp
> 2: hlt
> jmp 2b

2024-03-01 12:51:38

by Uros Bizjak

[permalink] [raw]
Subject: Re: [tip: x86/boot] x86/boot: Use 32-bit XOR to clear registers

On Fri, Mar 1, 2024 at 1:45 PM Ard Biesheuvel <[email protected]> wrote:
>
> On Fri, 1 Mar 2024 at 13:39, tip-bot2 for Uros Bizjak
> <[email protected]> wrote:
> >
> > The following commit has been merged into the x86/boot branch of tip:
> >
> > Commit-ID: 721f791ce1cddfa5f2bf524ac14741bfa0f72697
> > Gitweb: https://git.kernel.org/tip/721f791ce1cddfa5f2bf524ac14741bfa0f72697
> > Author: Uros Bizjak <[email protected]>
> > AuthorDate: Wed, 24 Jan 2024 11:38:59 +01:00
> > Committer: Ingo Molnar <[email protected]>
> > CommitterDate: Fri, 01 Mar 2024 12:47:37 +01:00
> >
> > x86/boot: Use 32-bit XOR to clear registers
> >
> > x86_64 zero extends 32-bit operations, so for 64-bit operands,
> > XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
> > a REX prefix byte when legacy registers are used.
> >
>
> ... and so this change is pointless churn when not using legacy
> registers, right?

Although there is no code size change with REX registers, it would
look weird to use XORQ with REX registers and XORL with legacy regs.
Please see arch/x86/kvm/{vmx,svm}/vmenter.S where this approach is
also used.

Uros.

> > Slightly smaller code generated, no change in functionality.
> >
> > Signed-off-by: Uros Bizjak <[email protected]>
> > Signed-off-by: Ingo Molnar <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Brian Gerst <[email protected]>
> > Cc: Denys Vlasenko <[email protected]>
> > Cc: H. Peter Anvin <[email protected]>
> > Cc: Linus Torvalds <[email protected]>
> > Cc: Josh Poimboeuf <[email protected]>
> > Cc: Ard Biesheuvel <[email protected]>
> > Link: https://lore.kernel.org/r/[email protected]
> > ---
> > arch/x86/kernel/head_64.S | 6 +++---
> > arch/x86/kernel/sev_verify_cbit.S | 2 +-
> > 2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index d295bf6..86136a7 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -169,7 +169,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> > ANNOTATE_NOENDBR
> >
> > /* Clear %R15 which holds the boot_params pointer on the boot CPU */
> > - xorq %r15, %r15
> > + xorl %r15d, %r15d
> >
>
> 0: 4d 31 ff xor %r15,%r15
> 3: 45 31 ff xor %r15d,%r15d
>
>
> > /*
> > * Retrieve the modifier (SME encryption mask if SME is active) to be
> > @@ -178,7 +178,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> > #ifdef CONFIG_AMD_MEM_ENCRYPT
> > movq sme_me_mask, %rax
> > #else
> > - xorq %rax, %rax
> > + xorl %eax, %eax
> > #endif
> >
>
> This conflicts with my RIP-relative boot cleanup series.
>
> > /* Form the CR3 value being sure to include the CR3 modifier */
> > @@ -295,7 +295,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> >
> > .Llookup_AP:
> > /* EAX contains the APIC ID of the current CPU */
> > - xorq %rcx, %rcx
> > + xorl %ecx, %ecx
> > leaq cpuid_to_apicid(%rip), %rbx
> >
> > .Lfind_cpunr:
> > diff --git a/arch/x86/kernel/sev_verify_cbit.S b/arch/x86/kernel/sev_verify_cbit.S
> > index 3355e27..1ab65f6 100644
> > --- a/arch/x86/kernel/sev_verify_cbit.S
> > +++ b/arch/x86/kernel/sev_verify_cbit.S
> > @@ -77,7 +77,7 @@ SYM_FUNC_START(sev_verify_cbit)
> > * The check failed, prevent any forward progress to prevent ROP
> > * attacks, invalidate the stack and go into a hlt loop.
> > */
> > - xorq %rsp, %rsp
> > + xorl %esp, %esp
> > subq $0x1000, %rsp
> > 2: hlt
> > jmp 2b

2024-03-01 13:10:49

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [tip: x86/boot] x86/boot: Use 32-bit XOR to clear registers

On Fri, 1 Mar 2024 at 13:51, Uros Bizjak <[email protected]> wrote:
>
> On Fri, Mar 1, 2024 at 1:45 PM Ard Biesheuvel <[email protected]> wrote:
> >
> > On Fri, 1 Mar 2024 at 13:39, tip-bot2 for Uros Bizjak
> > <[email protected]> wrote:
> > >
> > > The following commit has been merged into the x86/boot branch of tip:
> > >
> > > Commit-ID: 721f791ce1cddfa5f2bf524ac14741bfa0f72697
> > > Gitweb: https://git.kernel.org/tip/721f791ce1cddfa5f2bf524ac14741bfa0f72697
> > > Author: Uros Bizjak <[email protected]>
> > > AuthorDate: Wed, 24 Jan 2024 11:38:59 +01:00
> > > Committer: Ingo Molnar <[email protected]>
> > > CommitterDate: Fri, 01 Mar 2024 12:47:37 +01:00
> > >
> > > x86/boot: Use 32-bit XOR to clear registers
> > >
> > > x86_64 zero extends 32-bit operations, so for 64-bit operands,
> > > XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
> > > a REX prefix byte when legacy registers are used.
> > >
> >
> > ... and so this change is pointless churn when not using legacy
> > registers, right?
>
> Although there is no code size change with REX registers, it would
> look weird to use XORQ with REX registers and XORL with legacy regs.

You are changing an isolated occurrence of XORQ into XORL on the basis
that XORQ 'looks weird', and would produce a longer opcode if the
occurrence in question would be using a different register than it
actually uses.

Apologies for the bluntness, but in my book, this really falls firmly
into the 'pointless churn' territory. The startup code is not
performance critical, neither in terms of size nor in speed, and so
I'd prefer to avoid these kinds of changes. Just my 2c, though - Ingo
has already merged the patch.

2024-03-01 16:02:52

by Andrew Cooper

[permalink] [raw]
Subject: Re: [tip: x86/boot] x86/boot: Use 32-bit XOR to clear registers

On 01/03/2024 1:10 pm, Ard Biesheuvel wrote:
> On Fri, 1 Mar 2024 at 13:51, Uros Bizjak <[email protected]> wrote:
>> On Fri, Mar 1, 2024 at 1:45 PM Ard Biesheuvel <[email protected]> wrote:
>>> On Fri, 1 Mar 2024 at 13:39, tip-bot2 for Uros Bizjak
>>> <[email protected]> wrote:
>>>> The following commit has been merged into the x86/boot branch of tip:
>>>>
>>>> Commit-ID: 721f791ce1cddfa5f2bf524ac14741bfa0f72697
>>>> Gitweb: https://git.kernel.org/tip/721f791ce1cddfa5f2bf524ac14741bfa0f72697
>>>> Author: Uros Bizjak <[email protected]>
>>>> AuthorDate: Wed, 24 Jan 2024 11:38:59 +01:00
>>>> Committer: Ingo Molnar <[email protected]>
>>>> CommitterDate: Fri, 01 Mar 2024 12:47:37 +01:00
>>>>
>>>> x86/boot: Use 32-bit XOR to clear registers
>>>>
>>>> x86_64 zero extends 32-bit operations, so for 64-bit operands,
>>>> XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
>>>> a REX prefix byte when legacy registers are used.
>>>>
>>> ... and so this change is pointless churn when not using legacy
>>> registers, right?
>> Although there is no code size change with REX registers, it would
>> look weird to use XORQ with REX registers and XORL with legacy regs.
> You are changing an isolated occurrence of XORQ into XORL on the basis
> that XORQ 'looks weird', and would produce a longer opcode if the
> occurrence in question would be using a different register than it
> actually uses.
>
> Apologies for the bluntness, but in my book, this really falls firmly
> into the 'pointless churn' territory. The startup code is not
> performance critical, neither in terms of size nor in speed, and so
> I'd prefer to avoid these kinds of changes. Just my 2c, though - Ingo
> has already merged the patch.

Without trying to get into an argument here...

The better reason is that Silvermont Atoms don't recognise the 64bit
form as a zeroing idiom.  They only recognise the 32bit form of the idiom.

Therefore in fastpaths it is (marginally) important to xorl %r15d, %r15d
rather than xorq %r15, %r15.

But this instance is not a fastpath, and it also doesn't save any
encoding space, so I'm not sure it was really worth changing.

~Andrew

2024-03-01 16:36:38

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [tip: x86/boot] x86/boot: Use 32-bit XOR to clear registers

On Fri, 1 Mar 2024 at 17:02, Andrew Cooper <[email protected]> wrote:
>
> On 01/03/2024 1:10 pm, Ard Biesheuvel wrote:
> > On Fri, 1 Mar 2024 at 13:51, Uros Bizjak <[email protected]> wrote:
> >> On Fri, Mar 1, 2024 at 1:45 PM Ard Biesheuvel <[email protected]> wrote:
> >>> On Fri, 1 Mar 2024 at 13:39, tip-bot2 for Uros Bizjak
> >>> <[email protected]> wrote:
> >>>> The following commit has been merged into the x86/boot branch of tip:
> >>>>
> >>>> Commit-ID: 721f791ce1cddfa5f2bf524ac14741bfa0f72697
> >>>> Gitweb: https://git.kernel.org/tip/721f791ce1cddfa5f2bf524ac14741bfa0f72697
> >>>> Author: Uros Bizjak <[email protected]>
> >>>> AuthorDate: Wed, 24 Jan 2024 11:38:59 +01:00
> >>>> Committer: Ingo Molnar <[email protected]>
> >>>> CommitterDate: Fri, 01 Mar 2024 12:47:37 +01:00
> >>>>
> >>>> x86/boot: Use 32-bit XOR to clear registers
> >>>>
> >>>> x86_64 zero extends 32-bit operations, so for 64-bit operands,
> >>>> XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
> >>>> a REX prefix byte when legacy registers are used.
> >>>>
> >>> ... and so this change is pointless churn when not using legacy
> >>> registers, right?
> >> Although there is no code size change with REX registers, it would
> >> look weird to use XORQ with REX registers and XORL with legacy regs.
> > You are changing an isolated occurrence of XORQ into XORL on the basis
> > that XORQ 'looks weird', and would produce a longer opcode if the
> > occurrence in question would be using a different register than it
> > actually uses.
> >
> > Apologies for the bluntness, but in my book, this really falls firmly
> > into the 'pointless churn' territory. The startup code is not
> > performance critical, neither in terms of size nor in speed, and so
> > I'd prefer to avoid these kinds of changes. Just my 2c, though - Ingo
> > has already merged the patch.
>
> Without trying to get into an argument here...
>

No worries, we're all friends here :-)

> The better reason is that Silvermont Atoms don't recognise the 64bit
> form as a zeroing idiom. They only recognise the 32bit form of the idiom.
>
> Therefore in fastpaths it is (marginally) important to xorl %r15d, %r15d
> rather than xorq %r15, %r15.
>
> But this instance is not a fastpath, and it also doesn't save any
> encoding space, so I'm not sure it was really worth changing.
>

Yeah, that is my point, really. But let's move on. And apologies to
Uros for the tone - it didn't sound as grumpy in my head when I typed
it as it does now reading back the thread.