2022-01-24 19:26:39

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv2 17/29] x86/acpi, x86/boot: Add multiprocessor wake-up support

From: Kuppuswamy Sathyanarayanan <[email protected]>

TDX cannot use INIT/SIPI protocol to bring up secondary CPUs because it
requires assistance from untrusted VMM.

For platforms that do not support SIPI/INIT, ACPI defines a wakeup
model (using mailbox) via MADT multiprocessor wakeup structure. More
details about it can be found in ACPI specification v6.4, the section
titled "Multiprocessor Wakeup Structure". If a platform firmware
produces the multiprocessor wakeup structure, then OS may use this
new mailbox-based mechanism to wake up the APs.

Add ACPI MADT wake structure parsing support for x86 platform and if
MADT wake table is present, update apic->wakeup_secondary_cpu_64 with
new API which uses MADT wake mailbox to wake-up CPU.

Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/apic.h | 5 ++
arch/x86/kernel/acpi/boot.c | 114 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/apic/apic.c | 10 ++++
3 files changed, 129 insertions(+)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 35006e151774..bd8ae0a7010a 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -490,6 +490,11 @@ static inline unsigned int read_apic_id(void)
return apic->get_apic_id(reg);
}

+#ifdef CONFIG_X86_64
+typedef int (*wakeup_cpu_handler)(int apicid, unsigned long start_eip);
+extern void acpi_wake_cpu_handler_update(wakeup_cpu_handler handler);
+#endif
+
extern int default_apic_id_valid(u32 apicid);
extern int default_acpi_madt_oem_check(char *, char *);
extern void default_setup_apic_routing(void);
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 5b6d1a95776f..af204a217575 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -65,6 +65,15 @@ static u64 acpi_lapic_addr __initdata = APIC_DEFAULT_PHYS_BASE;
static bool acpi_support_online_capable;
#endif

+#ifdef CONFIG_X86_64
+/* Physical address of the Multiprocessor Wakeup Structure mailbox */
+static u64 acpi_mp_wake_mailbox_paddr;
+/* Virtual address of the Multiprocessor Wakeup Structure mailbox */
+static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox;
+/* Lock to protect mailbox (acpi_mp_wake_mailbox) from parallel access */
+static DEFINE_SPINLOCK(mailbox_lock);
+#endif
+
#ifdef CONFIG_X86_IO_APIC
/*
* Locks related to IOAPIC hotplug
@@ -336,6 +345,80 @@ acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long e
return 0;
}

+#ifdef CONFIG_X86_64
+/* Virtual address of the Multiprocessor Wakeup Structure mailbox */
+static int acpi_wakeup_cpu(int apicid, unsigned long start_ip)
+{
+ static physid_mask_t apic_id_wakemap = PHYSID_MASK_NONE;
+ unsigned long flags;
+ u8 timeout;
+
+ /* Remap mailbox memory only for the first call to acpi_wakeup_cpu() */
+ if (physids_empty(apic_id_wakemap)) {
+ acpi_mp_wake_mailbox = memremap(acpi_mp_wake_mailbox_paddr,
+ sizeof(*acpi_mp_wake_mailbox),
+ MEMREMAP_WB);
+ }
+
+ /*
+ * According to the ACPI specification r6.4, section titled
+ * "Multiprocessor Wakeup Structure" the mailbox-based wakeup
+ * mechanism cannot be used more than once for the same CPU.
+ * Skip wakeups if they are attempted more than once.
+ */
+ if (physid_isset(apicid, apic_id_wakemap)) {
+ pr_err("CPU already awake (APIC ID %x), skipping wakeup\n",
+ apicid);
+ return -EINVAL;
+ }
+
+ spin_lock_irqsave(&mailbox_lock, flags);
+
+ /*
+ * Mailbox memory is shared between firmware and OS. Firmware will
+ * listen on mailbox command address, and once it receives the wakeup
+ * command, CPU associated with the given apicid will be booted.
+ *
+ * The value of apic_id and wakeup_vector has to be set before updating
+ * the wakeup command. To let compiler preserve order of writes, use
+ * smp_store_release.
+ */
+ smp_store_release(&acpi_mp_wake_mailbox->apic_id, apicid);
+ smp_store_release(&acpi_mp_wake_mailbox->wakeup_vector, start_ip);
+ smp_store_release(&acpi_mp_wake_mailbox->command,
+ ACPI_MP_WAKE_COMMAND_WAKEUP);
+
+ /*
+ * After writing the wakeup command, wait for maximum timeout of 0xFF
+ * for firmware to reset the command address back zero to indicate
+ * the successful reception of command.
+ * NOTE: 0xFF as timeout value is decided based on our experiments.
+ *
+ * XXX: Change the timeout once ACPI specification comes up with
+ * standard maximum timeout value.
+ */
+ timeout = 0xFF;
+ while (READ_ONCE(acpi_mp_wake_mailbox->command) && --timeout)
+ cpu_relax();
+
+ /* If timed out (timeout == 0), return error */
+ if (!timeout) {
+ spin_unlock_irqrestore(&mailbox_lock, flags);
+ return -EIO;
+ }
+
+ /*
+ * If the CPU wakeup process is successful, store the
+ * status in apic_id_wakemap to prevent re-wakeup
+ * requests.
+ */
+ physid_set(apicid, apic_id_wakemap);
+
+ spin_unlock_irqrestore(&mailbox_lock, flags);
+
+ return 0;
+}
+#endif
#endif /*CONFIG_X86_LOCAL_APIC */

#ifdef CONFIG_X86_IO_APIC
@@ -1083,6 +1166,29 @@ static int __init acpi_parse_madt_lapic_entries(void)
}
return 0;
}
+
+#ifdef CONFIG_X86_64
+static int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ struct acpi_madt_multiproc_wakeup *mp_wake;
+
+ if (!IS_ENABLED(CONFIG_SMP))
+ return -ENODEV;
+
+ mp_wake = (struct acpi_madt_multiproc_wakeup *)header;
+ if (BAD_MADT_ENTRY(mp_wake, end))
+ return -EINVAL;
+
+ acpi_table_print_madt_entry(&header->common);
+
+ acpi_mp_wake_mailbox_paddr = mp_wake->base_address;
+
+ acpi_wake_cpu_handler_update(acpi_wakeup_cpu);
+
+ return 0;
+}
+#endif /* CONFIG_X86_64 */
#endif /* CONFIG_X86_LOCAL_APIC */

#ifdef CONFIG_X86_IO_APIC
@@ -1278,6 +1384,14 @@ static void __init acpi_process_madt(void)

smp_found_config = 1;
}
+
+#ifdef CONFIG_X86_64
+ /*
+ * Parse MADT MP Wake entry.
+ */
+ acpi_table_parse_madt(ACPI_MADT_TYPE_MULTIPROC_WAKEUP,
+ acpi_parse_mp_wake, 1);
+#endif
}
if (error == -EINVAL) {
/*
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b70344bf6600..3c8f2c797a98 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2551,6 +2551,16 @@ u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid)
}
EXPORT_SYMBOL_GPL(x86_msi_msg_get_destid);

+#ifdef CONFIG_X86_64
+void __init acpi_wake_cpu_handler_update(wakeup_cpu_handler handler)
+{
+ struct apic **drv;
+
+ for (drv = __apicdrivers; drv < __apicdrivers_end; drv++)
+ (*drv)->wakeup_secondary_cpu_64 = handler;
+}
+#endif
+
/*
* Override the generic EOI implementation with an optimized version.
* Only called during early boot when only one CPU is active and with
--
2.34.1


2022-02-03 01:01:55

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCHv2 17/29] x86/acpi, x86/boot: Add multiprocessor wake-up support

On Mon, Jan 24 2022 at 18:02, Kirill A. Shutemov wrote:
> +#ifdef CONFIG_X86_64
> +/* Physical address of the Multiprocessor Wakeup Structure mailbox */
> +static u64 acpi_mp_wake_mailbox_paddr;
> +/* Virtual address of the Multiprocessor Wakeup Structure mailbox */
> +static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox;
> +/* Lock to protect mailbox (acpi_mp_wake_mailbox) from parallel access */
> +static DEFINE_SPINLOCK(mailbox_lock);
> +#endif
> +
> #ifdef CONFIG_X86_IO_APIC
> /*
> * Locks related to IOAPIC hotplug
> @@ -336,6 +345,80 @@ acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long e
> return 0;
> }
>
> +#ifdef CONFIG_X86_64
> +/* Virtual address of the Multiprocessor Wakeup Structure mailbox */
> +static int acpi_wakeup_cpu(int apicid, unsigned long start_ip)
> +{
> + static physid_mask_t apic_id_wakemap = PHYSID_MASK_NONE;
> + unsigned long flags;
> + u8 timeout;
> +
> + /* Remap mailbox memory only for the first call to acpi_wakeup_cpu() */
> + if (physids_empty(apic_id_wakemap)) {
> + acpi_mp_wake_mailbox = memremap(acpi_mp_wake_mailbox_paddr,
> + sizeof(*acpi_mp_wake_mailbox),
> + MEMREMAP_WB);
> + }
> +
> + /*
> + * According to the ACPI specification r6.4, section titled
> + * "Multiprocessor Wakeup Structure" the mailbox-based wakeup
> + * mechanism cannot be used more than once for the same CPU.
> + * Skip wakeups if they are attempted more than once.
> + */
> + if (physid_isset(apicid, apic_id_wakemap)) {
> + pr_err("CPU already awake (APIC ID %x), skipping wakeup\n",
> + apicid);
> + return -EINVAL;
> + }
> +
> + spin_lock_irqsave(&mailbox_lock, flags);

What's the reason that interrupts need to be disabled here? The comment
above this invocation is not really informative ...

> + /*
> + * Mailbox memory is shared between firmware and OS. Firmware will
> + * listen on mailbox command address, and once it receives the wakeup
> + * command, CPU associated with the given apicid will be booted.
> + *
> + * The value of apic_id and wakeup_vector has to be set before updating
> + * the wakeup command. To let compiler preserve order of writes, use
> + * smp_store_release.

What? If the only purpose is to tell the compiler to preserve code
ordering then why are you using smp_store_release() here?
smp_store_release() is way more than that...

> + */
> + smp_store_release(&acpi_mp_wake_mailbox->apic_id, apicid);
> + smp_store_release(&acpi_mp_wake_mailbox->wakeup_vector, start_ip);
> + smp_store_release(&acpi_mp_wake_mailbox->command,
> + ACPI_MP_WAKE_COMMAND_WAKEUP);
> +
> + /*
> + * After writing the wakeup command, wait for maximum timeout of 0xFF
> + * for firmware to reset the command address back zero to indicate
> + * the successful reception of command.
> + * NOTE: 0xFF as timeout value is decided based on our experiments.
> + *
> + * XXX: Change the timeout once ACPI specification comes up with
> + * standard maximum timeout value.
> + */
> + timeout = 0xFF;
> + while (READ_ONCE(acpi_mp_wake_mailbox->command) && --timeout)
> + cpu_relax();
> +
> + /* If timed out (timeout == 0), return error */
> + if (!timeout) {

So this leaves a stale acpi_mp_wake_mailbox->command. What checks that
acpi_mp_wake_mailbox->command is 0 on the next invocation?

Aside of that assume timeout happens and the firmware acts after this
returned. Then you have inconsistent state as well. Error handling is
not trivial, but making it hope based is the worst kind.

> + spin_unlock_irqrestore(&mailbox_lock, flags);
> + return -EIO;
> + }
> +
> + /*
> + * If the CPU wakeup process is successful, store the
> + * status in apic_id_wakemap to prevent re-wakeup
> + * requests.
> + */
> + physid_set(apicid, apic_id_wakemap);
> +
> + spin_unlock_irqrestore(&mailbox_lock, flags);
> +
> + return 0;
> +}
> +#endif
> #endif /*CONFIG_X86_LOCAL_APIC */

Thanks,

tglx

Subject: Re: [PATCHv2 17/29] x86/acpi, x86/boot: Add multiprocessor wake-up support


On 2/1/2022 3:27 PM, Thomas Gleixner wrote:
> On Mon, Jan 24 2022 at 18:02, Kirill A. Shutemov wrote:
>> +#ifdef CONFIG_X86_64
>> +/* Physical address of the Multiprocessor Wakeup Structure mailbox */
>> +static u64 acpi_mp_wake_mailbox_paddr;
>> +/* Virtual address of the Multiprocessor Wakeup Structure mailbox */
>> +static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox;
>> +/* Lock to protect mailbox (acpi_mp_wake_mailbox) from parallel access */
>> +static DEFINE_SPINLOCK(mailbox_lock);
>> +#endif
>> +
>> #ifdef CONFIG_X86_IO_APIC
>> /*
>> * Locks related to IOAPIC hotplug
>> @@ -336,6 +345,80 @@ acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long e
>> return 0;
>> }
>>
>> +#ifdef CONFIG_X86_64
>> +/* Virtual address of the Multiprocessor Wakeup Structure mailbox */
>> +static int acpi_wakeup_cpu(int apicid, unsigned long start_ip)
>> +{
>> + static physid_mask_t apic_id_wakemap = PHYSID_MASK_NONE;
>> + unsigned long flags;
>> + u8 timeout;
>> +
>> + /* Remap mailbox memory only for the first call to acpi_wakeup_cpu() */
>> + if (physids_empty(apic_id_wakemap)) {
>> + acpi_mp_wake_mailbox = memremap(acpi_mp_wake_mailbox_paddr,
>> + sizeof(*acpi_mp_wake_mailbox),
>> + MEMREMAP_WB);
>> + }
>> +
>> + /*
>> + * According to the ACPI specification r6.4, section titled
>> + * "Multiprocessor Wakeup Structure" the mailbox-based wakeup
>> + * mechanism cannot be used more than once for the same CPU.
>> + * Skip wakeups if they are attempted more than once.
>> + */
>> + if (physid_isset(apicid, apic_id_wakemap)) {
>> + pr_err("CPU already awake (APIC ID %x), skipping wakeup\n",
>> + apicid);
>> + return -EINVAL;
>> + }
>> +
>> + spin_lock_irqsave(&mailbox_lock, flags);
> What's the reason that interrupts need to be disabled here? The comment
> above this invocation is not really informative ...

I initially thought this routine is getting called from interrupt
context. But after
re-investigation, this seems not true. So we don't need to disable IRQ here.
Regular spin_lock/unlock variant is suffice. Sorry for the mistake.

>
>> + /*
>> + * Mailbox memory is shared between firmware and OS. Firmware will
>> + * listen on mailbox command address, and once it receives the wakeup
>> + * command, CPU associated with the given apicid will be booted.
>> + *
>> + * The value of apic_id and wakeup_vector has to be set before updating
>> + * the wakeup command. To let compiler preserve order of writes, use
>> + * smp_store_release.
> What? If the only purpose is to tell the compiler to preserve code
> ordering then why are you using smp_store_release() here?
> smp_store_release() is way more than that...

Order of execution is not the only reason. Since this memory is shared with
firmware, I thought it is better to keep it volatile.  So used
smp_store_release()
variant.  I have initially used only WRITE_ONCE(), but suggestion to use
smp_store_release came out of community review.

>
>> + */
>> + smp_store_release(&acpi_mp_wake_mailbox->apic_id, apicid);
>> + smp_store_release(&acpi_mp_wake_mailbox->wakeup_vector, start_ip);
>> + smp_store_release(&acpi_mp_wake_mailbox->command,
>> + ACPI_MP_WAKE_COMMAND_WAKEUP);
>> +
>> + /*
>> + * After writing the wakeup command, wait for maximum timeout of 0xFF
>> + * for firmware to reset the command address back zero to indicate
>> + * the successful reception of command.
>> + * NOTE: 0xFF as timeout value is decided based on our experiments.
>> + *
>> + * XXX: Change the timeout once ACPI specification comes up with
>> + * standard maximum timeout value.
>> + */
>> + timeout = 0xFF;
>> + while (READ_ONCE(acpi_mp_wake_mailbox->command) && --timeout)
>> + cpu_relax();
>> +
>> + /* If timed out (timeout == 0), return error */
>> + if (!timeout) {
> So this leaves a stale acpi_mp_wake_mailbox->command. What checks that
> acpi_mp_wake_mailbox->command is 0 on the next invocation?


 For each invocation, acpi_mp_wake_mailbox->comand value is set as 1. So I
think we don't have to worry about the previous state. Please correct me
if I don't understand your query.

>
> Aside of that assume timeout happens and the firmware acts after this
> returned. Then you have inconsistent state as well. Error handling is
> not trivial, but making it hope based is the worst kind.

Current assumption is, once the timeout happens, current wakeup request is
considered failed and firmware will not update the command address. But
current ACPI spec does not document the above assumption. I will check with
the spec owner about this issue and get back to you.

>
>> + spin_unlock_irqrestore(&mailbox_lock, flags);
>> + return -EIO;
>> + }
>> +
>> + /*
>> + * If the CPU wakeup process is successful, store the
>> + * status in apic_id_wakemap to prevent re-wakeup
>> + * requests.
>> + */
>> + physid_set(apicid, apic_id_wakemap);
>> +
>> + spin_unlock_irqrestore(&mailbox_lock, flags);
>> +
>> + return 0;
>> +}
>> +#endif
>> #endif /*CONFIG_X86_LOCAL_APIC */
> Thanks,
>
> tglx

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer