2022-01-24 19:26:38

by Kirill A. Shutemov

Subject: [PATCHv2 16/29] x86/boot: Add a trampoline for booting APs via firmware handoff

From: Sean Christopherson <[email protected]>

Historically, x86 platforms have booted secondary processors (APs)
using INIT followed by the start up IPI (SIPI) messages. In regular
VMs, this boot sequence is supported by the VMM emulation. But such a
wakeup model is fatal for secure VMs like TDX in which VMM is an
untrusted entity. To address this issue, a new wakeup model was added
in ACPI v6.4, in which firmware (like TDX virtual BIOS) will help boot
the APs. More details about this wakeup model can be found in ACPI
specification v6.4, the section titled "Multiprocessor Wakeup Structure".

Since the existing trampoline code requires processors to boot in real
mode with 16-bit addressing, it will not work for this wakeup model
(because it boots the AP in 64-bit mode). To handle it, extend the
trampoline code to support 64-bit mode firmware handoff. Also, extend
IDT and GDT pointers to support 64-bit mode hand off.

There is no TDX-specific detection for this new boot method. The kernel
will rely on it as the sole boot method whenever the new ACPI structure
is present.

The ACPI table parser for the MADT multiprocessor wake up structure and
the wakeup method that uses this structure will be added by the following
patch in this series.

Reported-by: Kai Huang <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/apic.h | 2 ++
arch/x86/include/asm/realmode.h | 1 +
arch/x86/kernel/smpboot.c | 12 ++++++--
arch/x86/realmode/rm/header.S | 1 +
arch/x86/realmode/rm/trampoline_64.S | 38 ++++++++++++++++++++++++
arch/x86/realmode/rm/trampoline_common.S | 12 +++++++-
6 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 48067af94678..35006e151774 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -328,6 +328,8 @@ struct apic {

/* wakeup_secondary_cpu */
int (*wakeup_secondary_cpu)(int apicid, unsigned long start_eip);
+ /* wakeup secondary CPU using 64-bit wakeup point */
+ int (*wakeup_secondary_cpu_64)(int apicid, unsigned long start_eip);

void (*inquire_remote_apic)(int apicid);

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 331474b150f1..fd6f6e5b755a 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -25,6 +25,7 @@ struct real_mode_header {
u32 sev_es_trampoline_start;
#endif
#ifdef CONFIG_X86_64
+ u32 trampoline_start64;
u32 trampoline_pgd;
#endif
/* ACPI S3 wakeup */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 617012f4619f..6269dd126dba 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1088,6 +1088,11 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
unsigned long boot_error = 0;
unsigned long timeout;

+#ifdef CONFIG_X86_64
+ /* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */
+ if (apic->wakeup_secondary_cpu_64)
+ start_ip = real_mode_header->trampoline_start64;
+#endif
idle->thread.sp = (unsigned long)task_pt_regs(idle);
early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
initial_code = (unsigned long)start_secondary;
@@ -1129,11 +1134,14 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,

/*
* Wake up a CPU in difference cases:
- * - Use the method in the APIC driver if it's defined
+ * - Use a method from the APIC driver if one defined, with wakeup
+ * straight to 64-bit mode preferred over wakeup to RM.
* Otherwise,
* - Use an INIT boot APIC message for APs or NMI for BSP.
*/
- if (apic->wakeup_secondary_cpu)
+ if (apic->wakeup_secondary_cpu_64)
+ boot_error = apic->wakeup_secondary_cpu_64(apicid, start_ip);
+ else if (apic->wakeup_secondary_cpu)
boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
else
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
diff --git a/arch/x86/realmode/rm/header.S b/arch/x86/realmode/rm/header.S
index 8c1db5bf5d78..2eb62be6d256 100644
--- a/arch/x86/realmode/rm/header.S
+++ b/arch/x86/realmode/rm/header.S
@@ -24,6 +24,7 @@ SYM_DATA_START(real_mode_header)
.long pa_sev_es_trampoline_start
#endif
#ifdef CONFIG_X86_64
+ .long pa_trampoline_start64
.long pa_trampoline_pgd;
#endif
/* ACPI S3 wakeup */
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index cc8391f86cdb..ae112a91592f 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -161,6 +161,19 @@ SYM_CODE_START(startup_32)
ljmpl $__KERNEL_CS, $pa_startup_64
SYM_CODE_END(startup_32)

+SYM_CODE_START(pa_trampoline_compat)
+ /*
+ * In compatibility mode. Prep ESP and DX for startup_32, then disable
+ * paging and complete the switch to legacy 32-bit mode.
+ */
+ movl $rm_stack_end, %esp
+ movw $__KERNEL_DS, %dx
+
+ movl $X86_CR0_PE, %eax
+ movl %eax, %cr0
+ ljmpl $__KERNEL32_CS, $pa_startup_32
+SYM_CODE_END(pa_trampoline_compat)
+
.section ".text64","ax"
.code64
.balign 4
@@ -169,6 +182,20 @@ SYM_CODE_START(startup_64)
jmpq *tr_start(%rip)
SYM_CODE_END(startup_64)

+SYM_CODE_START(trampoline_start64)
+ /*
+ * APs start here on a direct transfer from 64-bit BIOS with identity
+ * mapped page tables. Load the kernel's GDT in order to gear down to
+ * 32-bit mode (to handle 4-level vs. 5-level paging), and to (re)load
+ * segment registers. Load the zero IDT so any fault triggers a
+ * shutdown instead of jumping back into BIOS.
+ */
+ lidt tr_idt(%rip)
+ lgdt tr_gdt64(%rip)
+
+ ljmpl *tr_compat(%rip)
+SYM_CODE_END(trampoline_start64)
+
.section ".rodata","a"
# Duplicate the global descriptor table
# so the kernel can live anywhere
@@ -182,6 +209,17 @@ SYM_DATA_START(tr_gdt)
.quad 0x00cf93000000ffff # __KERNEL_DS
SYM_DATA_END_LABEL(tr_gdt, SYM_L_LOCAL, tr_gdt_end)

+SYM_DATA_START(tr_gdt64)
+ .short tr_gdt_end - tr_gdt - 1 # gdt limit
+ .long pa_tr_gdt
+ .long 0
+SYM_DATA_END(tr_gdt64)
+
+SYM_DATA_START(tr_compat)
+ .long pa_trampoline_compat
+ .short __KERNEL32_CS
+SYM_DATA_END(tr_compat)
+
.bss
.balign PAGE_SIZE
SYM_DATA(trampoline_pgd, .space PAGE_SIZE)
diff --git a/arch/x86/realmode/rm/trampoline_common.S b/arch/x86/realmode/rm/trampoline_common.S
index 5033e640f957..4331c32c47f8 100644
--- a/arch/x86/realmode/rm/trampoline_common.S
+++ b/arch/x86/realmode/rm/trampoline_common.S
@@ -1,4 +1,14 @@
/* SPDX-License-Identifier: GPL-2.0 */
.section ".rodata","a"
.balign 16
-SYM_DATA_LOCAL(tr_idt, .fill 1, 6, 0)
+
+/*
+ * When a bootloader hands off to the kernel in 32-bit mode an
+ * IDT with a 2-byte limit and 4-byte base is needed. When a boot
+ * loader hands off to a kernel 64-bit mode the base address
+ * extends to 8-bytes. Reserve enough space for either scenario.
+ */
+SYM_DATA_START_LOCAL(tr_idt)
+ .short 0
+ .quad 0
+SYM_DATA_END(tr_idt)
--
2.34.1
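
The smpboot.c hunks above set up a preference order: a 64-bit wakeup hook wins over the real-mode hook, and INIT/SIPI is the fallback. A minimal sketch of that selection follows. It assumes that start_ip otherwise defaults to the header's trampoline_start, and it leaves out the CONFIG_X86_64 guard. The names apic_model, rm_header_model, pick_start_ip, pick_wake_path and dummy_wakeup are hypothetical stand-ins for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wakeup hook signature, mirroring the two callbacks in the patch. */
typedef int (*wakeup_fn)(int apicid, unsigned long start_eip);

/* Simplified stand-in for struct apic: only the two wakeup hooks. */
struct apic_model {
	wakeup_fn wakeup_secondary_cpu;
	wakeup_fn wakeup_secondary_cpu_64;
};

/* Simplified stand-in for the two entry-point fields of real_mode_header. */
struct rm_header_model {
	uint32_t trampoline_start;
	uint32_t trampoline_start64;
};

enum wake_path { WAKE_VIA_HOOK_64, WAKE_VIA_HOOK_RM, WAKE_VIA_INIT_SIPI };

/* Hypothetical helper: the entry point handed to the wakeup method. */
static unsigned long pick_start_ip(const struct apic_model *apic,
				   const struct rm_header_model *rmh)
{
	assert(apic && rmh);
	if (apic->wakeup_secondary_cpu_64)
		return rmh->trampoline_start64;
	return rmh->trampoline_start;
}

/* Hypothetical helper: which of the three wakeup paths would be taken. */
static enum wake_path pick_wake_path(const struct apic_model *apic)
{
	assert(apic);
	if (apic->wakeup_secondary_cpu_64)
		return WAKE_VIA_HOOK_64;
	if (apic->wakeup_secondary_cpu)
		return WAKE_VIA_HOOK_RM;
	return WAKE_VIA_INIT_SIPI;
}

/* Dummy hook, used only to make a function pointer non-NULL. */
static int dummy_wakeup(int apicid, unsigned long start_eip)
{
	(void)apicid;
	(void)start_eip;
	return 0;
}
```

In this model both header fields exist side by side, and the start IP is chosen at run time based on which hook the APIC driver provides.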

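The tr_idt comment in trampoline_common.S comes down to operand sizes: a descriptor-table pointer is a 2-byte limit followed by a 4-byte base in 32-bit mode, or an 8-byte base in 64-bit mode. A small sketch of the two layouts follows. The struct names desc_ptr32 and desc_ptr64 and the zero_idt object are hypothetical, and the packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit form: what the old 6-byte ".fill 1, 6, 0" reservation covered. */
struct desc_ptr32 {
	uint16_t limit;
	uint32_t base;
} __attribute__((packed));

/* 64-bit form: matches the new ".short 0" + ".quad 0" reservation. */
struct desc_ptr64 {
	uint16_t limit;
	uint64_t base;
} __attribute__((packed));

_Static_assert(sizeof(struct desc_ptr32) == 6, "2-byte limit + 4-byte base");
_Static_assert(sizeof(struct desc_ptr64) == 10, "2-byte limit + 8-byte base");

/* An all-zero pointer describes an empty IDT, as in the patch. */
static const struct desc_ptr64 zero_idt = { 0, 0 };
```

Reserving the 10-byte form covers both cases, since a 32-bit load only reads the first 6 bytes.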

2022-02-02 13:05:16

by Thomas Gleixner

Subject: Re: [PATCHv2 16/29] x86/boot: Add a trampoline for booting APs via firmware handoff

On Mon, Jan 24 2022 at 18:02, Kirill A. Shutemov wrote:

> From: Sean Christopherson <[email protected]>
>
> Historically, x86 platforms have booted secondary processors (APs)
> using INIT followed by the start up IPI (SIPI) messages. In regular
> VMs, this boot sequence is supported by the VMM emulation. But such a
> wakeup model is fatal for secure VMs like TDX in which VMM is an
> untrusted entity. To address this issue, a new wakeup model was added
> in ACPI v6.4, in which firmware (like TDX virtual BIOS) will help boot
> the APs. More details about this wakeup model can be found in ACPI
> specification v6.4, the section titled "Multiprocessor Wakeup Structure".
>
> Since the existing trampoline code requires processors to boot in real
> mode with 16-bit addressing, it will not work for this wakeup model
> (because it boots the AP in 64-bit mode). To handle it, extend the
> trampoline code to support 64-bit mode firmware handoff. Also, extend
> IDT and GDT pointers to support 64-bit mode hand off.
>
> There is no TDX-specific detection for this new boot method. The kernel
> will rely on it as the sole boot method whenever the new ACPI structure
> is present.
>
> The ACPI table parser for the MADT multiprocessor wake up structure and
> the wakeup method that uses this structure will be added by the following
> patch in this series.
>
> Reported-by: Kai Huang <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> Reviewed-by: Dan Williams <[email protected]>
> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> Signed-off-by: Kirill A. Shutemov <[email protected]>

Reviewed-by: Thomas Gleixner <[email protected]>

2022-02-03 14:11:04

by Borislav Petkov

Subject: Re: [PATCHv2 16/29] x86/boot: Add a trampoline for booting APs via firmware handoff

On Mon, Jan 24, 2022 at 06:02:02PM +0300, Kirill A. Shutemov wrote:
> From: Sean Christopherson <[email protected]>
>
> Historically, x86 platforms have booted secondary processors (APs)
> using INIT followed by the start up IPI (SIPI) messages. In regular
> VMs, this boot sequence is supported by the VMM emulation. But such a
> wakeup model is fatal for secure VMs like TDX in which VMM is an
> untrusted entity. To address this issue, a new wakeup model was added
> in ACPI v6.4, in which firmware (like TDX virtual BIOS) will help boot
> the APs. More details about this wakeup model can be found in ACPI
> specification v6.4, the section titled "Multiprocessor Wakeup Structure".
>
> Since the existing trampoline code requires processors to boot in real
> mode with 16-bit addressing, it will not work for this wakeup model
> (because it boots the AP in 64-bit mode). To handle it, extend the
> trampoline code to support 64-bit mode firmware handoff. Also, extend
> IDT and GDT pointers to support 64-bit mode hand off.
>
> There is no TDX-specific detection for this new boot method. The kernel
> will rely on it as the sole boot method whenever the new ACPI structure
> is present.
>
> The ACPI table parser for the MADT multiprocessor wake up structure and
> the wakeup method that uses this structure will be added by the following
> patch in this series.
>
> Reported-by: Kai Huang <[email protected]>

I wonder what that Reported-by tag means here for this is a feature
patch, not a bug fix or so...

> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
> index 331474b150f1..fd6f6e5b755a 100644
> --- a/arch/x86/include/asm/realmode.h
> +++ b/arch/x86/include/asm/realmode.h
> @@ -25,6 +25,7 @@ struct real_mode_header {
> u32 sev_es_trampoline_start;
> #endif
> #ifdef CONFIG_X86_64
> + u32 trampoline_start64;
> u32 trampoline_pgd;
> #endif

Hmm, so there's trampoline_start, sev_es_trampoline_start and
trampoline_start64. If those are mutually exclusive, can we merge them
all into a single trampoline_start?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCHv2 16/29] x86/boot: Add a trampoline for booting APs via firmware handoff


On 2/2/2022 3:27 AM, Borislav Petkov wrote:
> On Mon, Jan 24, 2022 at 06:02:02PM +0300, Kirill A. Shutemov wrote:
>> From: Sean Christopherson <[email protected]>
>>
>> Historically, x86 platforms have booted secondary processors (APs)
>> using INIT followed by the start up IPI (SIPI) messages. In regular
>> VMs, this boot sequence is supported by the VMM emulation. But such a
>> wakeup model is fatal for secure VMs like TDX in which VMM is an
>> untrusted entity. To address this issue, a new wakeup model was added
>> in ACPI v6.4, in which firmware (like TDX virtual BIOS) will help boot
>> the APs. More details about this wakeup model can be found in ACPI
>> specification v6.4, the section titled "Multiprocessor Wakeup Structure".
>>
>> Since the existing trampoline code requires processors to boot in real
>> mode with 16-bit addressing, it will not work for this wakeup model
>> (because it boots the AP in 64-bit mode). To handle it, extend the
>> trampoline code to support 64-bit mode firmware handoff. Also, extend
>> IDT and GDT pointers to support 64-bit mode hand off.
>>
>> There is no TDX-specific detection for this new boot method. The kernel
>> will rely on it as the sole boot method whenever the new ACPI structure
>> is present.
>>
>> The ACPI table parser for the MADT multiprocessor wake up structure and
>> the wakeup method that uses this structure will be added by the following
>> patch in this series.
>>
>> Reported-by: Kai Huang <[email protected]>
> I wonder what that Reported-by tag means here for this is a feature
> patch, not a bug fix or so...

I think it was added when Sean created the original patch. I don't have the
full history.

Sean, since this is not a bug fix, shall we remove the Reported-by tag?

>
>> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
>> index 331474b150f1..fd6f6e5b755a 100644
>> --- a/arch/x86/include/asm/realmode.h
>> +++ b/arch/x86/include/asm/realmode.h
>> @@ -25,6 +25,7 @@ struct real_mode_header {
>> u32 sev_es_trampoline_start;
>> #endif
>> #ifdef CONFIG_X86_64
>> + u32 trampoline_start64;
>> u32 trampoline_pgd;
>> #endif
> Hmm, so there's trampoline_start, sev_es_trampoline_start and
> trampoline_start64. If those are mutually exclusive, can we merge them
> all into a single trampoline_start?

trampoline_start and sev_es_trampoline_start are not mutually exclusive.
Both are used in arch/x86/kernel/sev.c:

arch/x86/kernel/sev.c:560:      startup_ip = (u16)(rmh->sev_es_trampoline_start -
arch/x86/kernel/sev.c:561:                         rmh->trampoline_start);

But trampoline_start64 can be removed and replaced with trampoline_start.
But using a *64 suffix makes it clear that it is used for 64-bit
(CONFIG_X86_64).

Adding it for clarity seems to be fine to me. But if you would prefer a
single variable, we can remove it. Please let me know.

>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
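
The sev.c lines quoted above subtract one header field from the other, so both must hold valid values at the same time. A minimal sketch of that computation follows. The struct sev_rm_header_model and the helper sev_es_startup_offset are hypothetical names, and the addresses in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the two real_mode_header fields involved. */
struct sev_rm_header_model {
	uint32_t trampoline_start;
	uint32_t sev_es_trampoline_start;
};

/*
 * Hypothetical helper: offset of the SEV-ES entry point relative to the
 * regular trampoline entry point, truncated to 16 bits as in the quoted
 * expression.
 */
static uint16_t sev_es_startup_offset(const struct sev_rm_header_model *rmh)
{
	assert(rmh);
	assert(rmh->sev_es_trampoline_start >= rmh->trampoline_start);
	return (uint16_t)(rmh->sev_es_trampoline_start - rmh->trampoline_start);
}
```

With trampoline_start at 0x98000 and sev_es_trampoline_start at 0x98040, the helper returns 0x40. Folding the two fields into one would lose the information this expression needs.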

2022-02-07 17:18:43

by Borislav Petkov

Subject: Re: [PATCHv2 16/29] x86/boot: Add a trampoline for booting APs via firmware handoff

On Fri, Feb 04, 2022 at 03:27:19AM -0800, Kuppuswamy, Sathyanarayanan wrote:
> trampoline_start and sev_es_trampoline_start are not mutually exclusive.
> Both are used in arch/x86/kernel/sev.c.

I know - I've asked Jörg to have a look here.

> But trampoline_start64 can be removed and replaced with trampoline_start.
> But using a *64 suffix makes it clear that it is used for 64-bit
> (CONFIG_X86_64).
>
> Adding it for clarity seems to be fine to me.

Does it matter if the start IP is the same for all APs? Or will there
be a case where you have some APs starting from the 32-bit trampoline
and some from the 64-bit one, on the same system? (that would be weird
but what do I know...)

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-02-10 01:30:00

by Kai Huang

Subject: Re: [PATCHv2 16/29] x86/boot: Add a trampoline for booting APs via firmware handoff


> >> Reported-by: Kai Huang <[email protected]>
> > I wonder what that Reported-by tag means here for this is a feature
> > patch, not a bug fix or so...
>
> I think it was added when Sean created the original patch. I don't have the
> full history.
>
> Sean, since this is not a bug fix, shall we remove the Reported-by tag?

Sorry just saw. Please remove :)

2022-02-16 10:10:06

by Borislav Petkov

Subject: Re: [PATCHv2 16/29] x86/boot: Add a trampoline for booting APs via firmware handoff

On Wed, Feb 16, 2022 at 12:36:24AM +0300, Kirill A. Shutemov wrote:
> How can single trampoline_start cover all cases?

All I'm saying is that the real mode header should have a single

u32 trampoline_start;

instead of:

u32 trampoline_start;
u32 sev_es_trampoline_start;
u32 trampoline_start64;

which all are the same thing on a single system.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-02-16 19:43:11

by Kirill A. Shutemov

Subject: Re: [PATCHv2 16/29] x86/boot: Add a trampoline for booting APs via firmware handoff

On Wed, Feb 16, 2022 at 11:07:15AM +0100, Borislav Petkov wrote:
> On Wed, Feb 16, 2022 at 12:36:24AM +0300, Kirill A. Shutemov wrote:
> > How can single trampoline_start cover all cases?
>
> All I'm saying is that the real mode header should have a single
>
> u32 trampoline_start;
>
> instead of:
>
> u32 trampoline_start;
> u32 sev_es_trampoline_start;
> u32 trampoline_start64;
>
> which all are the same thing on a single system.

But these are generated at build time, no?

As far as I can see, they are initialized in arch/x86/realmode/rm/header.S
by the linker.

I'm confused.

--
Kirill A. Shutemov