2024-03-25 23:35:29

by Kirill A. Shutemov

Subject: [PATCHv9 00/17] x86/tdx: Add kexec support

The patchset adds bits and pieces to get kexec (and crashkernel) working on
a TDX guest.

The last patch implements CPU offlining according to the approved ACPI
spec change proposal[1]. It unlocks kexec with all CPUs visible in the target
kernel. It requires BIOS-side enabling. If it is missing, we fall back to
booting the second kernel with a single CPU.

Please review. I would be glad for any feedback.

[1] https://lore.kernel.org/all/13356251.uLZWGnKmhe@kreacher

v9:
- Rebased;
- Keep page tables that map E820_TYPE_ACPI (Ashish);
- Ack/Reviewed/Tested-bys from Sathya, Kai, Tao;
- Minor printk() message adjustments;
v8:
- Rework serialization around conversion of memory back to private;
- Print ACPI_MADT_TYPE_MULTIPROC_WAKEUP in acpi_table_print_madt_entry();
- Drop debugfs interface to dump info on shared memory;
- Adjust comments and commit messages;
- Reviewed-bys by Baoquan, Dave and Thomas;
v7:
- Call enc_kexec_stop_conversion() and enc_kexec_unshare_mem() after shutting
down IO-APIC, lapic and hpet. It meets AMD requirements.
- Minor style changes;
- Add Acked/Reviewed-bys;
v6:
- Rebased to v6.8-rc1;
- Provide default noop callbacks from .enc_kexec_stop_conversion and
.enc_kexec_unshare_mem;
- Split off patch that introduces .enc_kexec_* callbacks;
- asm_acpi_mp_play_dead(): program CR3 directly from RSI, no MOV to RAX
required;
- Restructure how smp_ops.stop_this_cpu() hooked up in crash_nmi_callback();
- kvmclock patch got merged via KVM tree;
v5:
- Rename smp_ops.crash_play_dead to smp_ops.stop_this_cpu and use it in
stop_this_cpu();
- Split off enc_kexec_stop_conversion() from enc_kexec_unshare_mem();
- Introduce kernel_ident_mapping_free();
- Add explicit include for alternatives and stringify.
- Add barrier() after setting conversion_allowed to false;
- Mark cpu_hotplug_offline_disabled __ro_after_init;
- Print error if failed to hand over CPU to BIOS;
- Update comments and commit messages;
v4:
- Fix build for !KEXEC_CORE;
- Cleaner ALTERNATIVE use;
- Update commit messages and comments;
- Add Reviewed-bys;
v3:
- Rework acpi_mp_crash_stop_other_cpus() to avoid invoking hotplug state
machine;
- Free page tables if reset vector setup failed;
- Change asm_acpi_mp_play_dead() to pass reset vector and PGD as arguments;
- Mark acpi_mp_* variables as static and __ro_after_init;
- Use u32 for apicid;
- Disable CPU offlining if reset vector setup failed;
- Rename madt.S -> madt_playdead.S;
- Mark tdx_kexec_unshare_mem() as static;
- Rebase onto up-to-date tip/master;
- Whitespace fixes;
- Reorder patches;
- Add Reviewed-bys;
- Update comments and commit messages;
v2:
- Rework how unsharing hooks up into the kexec codepath;
- Rework kvmclock_disable() fix based on Sean's;
- s/cpu_hotplug_not_supported()/cpu_hotplug_disable_offlining()/;
- use play_dead_common() to implement acpi_mp_play_dead();
- cond_resched() in tdx_shared_memory_show();
- s/target kernel/second kernel/;
- Update commit messages and comments;

Kirill A. Shutemov (17):
x86/acpi: Extract ACPI MADT wakeup code into a separate file
x86/apic: Mark acpi_mp_wake_* variables as __ro_after_init
cpu/hotplug: Add support for declaring CPU offlining not supported
cpu/hotplug, x86/acpi: Disable CPU offlining for ACPI MADT wakeup
x86/kexec: Keep CR4.MCE set during kexec for TDX guest
x86/mm: Make x86_platform.guest.enc_status_change_*() return errno
x86/mm: Return correct level from lookup_address() if pte is none
x86/tdx: Account shared memory
x86/mm: Adding callbacks to prepare encrypted memory for kexec
x86/tdx: Convert shared memory back to private on kexec
x86/mm: Make e820_end_ram_pfn() cover E820_TYPE_ACPI ranges
x86/acpi: Rename fields in acpi_madt_multiproc_wakeup structure
x86/acpi: Do not attempt to bring up secondary CPUs in kexec case
x86/smp: Add smp_ops.stop_this_cpu() callback
x86/mm: Introduce kernel_ident_mapping_free()
x86/acpi: Add support for CPU offlining for ACPI MADT wakeup method
ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed

arch/x86/Kconfig | 7 +
arch/x86/coco/core.c | 1 -
arch/x86/coco/tdx/tdx.c | 99 ++++++++-
arch/x86/hyperv/ivm.c | 9 +-
arch/x86/include/asm/acpi.h | 7 +
arch/x86/include/asm/init.h | 3 +
arch/x86/include/asm/pgtable.h | 5 +
arch/x86/include/asm/pgtable_types.h | 1 +
arch/x86/include/asm/set_memory.h | 3 +
arch/x86/include/asm/smp.h | 1 +
arch/x86/include/asm/x86_init.h | 6 +-
arch/x86/kernel/acpi/Makefile | 11 +-
arch/x86/kernel/acpi/boot.c | 86 +-------
arch/x86/kernel/acpi/madt_playdead.S | 28 +++
arch/x86/kernel/acpi/madt_wakeup.c | 292 +++++++++++++++++++++++++++
arch/x86/kernel/crash.c | 6 +
arch/x86/kernel/e820.c | 9 +-
arch/x86/kernel/process.c | 7 +
arch/x86/kernel/reboot.c | 18 ++
arch/x86/kernel/relocate_kernel_64.S | 5 +
arch/x86/kernel/x86_init.c | 8 +-
arch/x86/mm/ident_map.c | 73 +++++++
arch/x86/mm/mem_encrypt_amd.c | 8 +-
arch/x86/mm/pat/set_memory.c | 59 ++++--
drivers/acpi/tables.c | 14 ++
include/acpi/actbl2.h | 19 +-
include/linux/cc_platform.h | 10 -
include/linux/cpu.h | 2 +
kernel/cpu.c | 12 +-
29 files changed, 663 insertions(+), 146 deletions(-)
create mode 100644 arch/x86/kernel/acpi/madt_playdead.S
create mode 100644 arch/x86/kernel/acpi/madt_wakeup.c

--
2.43.0



2024-03-26 02:03:22

by Kirill A. Shutemov

Subject: [PATCHv9 01/17] x86/acpi: Extract ACPI MADT wakeup code into a separate file

In order to prepare for the expansion of support for the ACPI MADT
wakeup method, move the relevant code into a separate file.

Introduce a new configuration option to clearly indicate dependencies
without the use of ifdefs.

There have been no functional changes.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Acked-by: Kai Huang <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
arch/x86/Kconfig | 7 +++
arch/x86/include/asm/acpi.h | 5 ++
arch/x86/kernel/acpi/Makefile | 11 ++--
arch/x86/kernel/acpi/boot.c | 86 +-----------------------------
arch/x86/kernel/acpi/madt_wakeup.c | 82 ++++++++++++++++++++++++++++
5 files changed, 101 insertions(+), 90 deletions(-)
create mode 100644 arch/x86/kernel/acpi/madt_wakeup.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 03483b23a009..0f5fd815bca3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1119,6 +1119,13 @@ config X86_LOCAL_APIC
depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI
select IRQ_DOMAIN_HIERARCHY

+config X86_ACPI_MADT_WAKEUP
+ def_bool y
+ depends on X86_64
+ depends on ACPI
+ depends on SMP
+ depends on X86_LOCAL_APIC
+
config X86_IO_APIC
def_bool y
depends on X86_LOCAL_APIC || X86_UP_IOAPIC
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index f896eed4516c..2625b915ae7f 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -76,6 +76,11 @@ static inline bool acpi_skip_set_wakeup_address(void)

#define acpi_skip_set_wakeup_address acpi_skip_set_wakeup_address

+union acpi_subtable_headers;
+
+int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
+ const unsigned long end);
+
/*
* Check if the CPU can handle C2 and deeper
*/
diff --git a/arch/x86/kernel/acpi/Makefile b/arch/x86/kernel/acpi/Makefile
index fc17b3f136fe..8c7329c88a75 100644
--- a/arch/x86/kernel/acpi/Makefile
+++ b/arch/x86/kernel/acpi/Makefile
@@ -1,11 +1,12 @@
# SPDX-License-Identifier: GPL-2.0

-obj-$(CONFIG_ACPI) += boot.o
-obj-$(CONFIG_ACPI_SLEEP) += sleep.o wakeup_$(BITS).o
-obj-$(CONFIG_ACPI_APEI) += apei.o
-obj-$(CONFIG_ACPI_CPPC_LIB) += cppc.o
+obj-$(CONFIG_ACPI) += boot.o
+obj-$(CONFIG_ACPI_SLEEP) += sleep.o wakeup_$(BITS).o
+obj-$(CONFIG_ACPI_APEI) += apei.o
+obj-$(CONFIG_ACPI_CPPC_LIB) += cppc.o
+obj-$(CONFIG_X86_ACPI_MADT_WAKEUP) += madt_wakeup.o

ifneq ($(CONFIG_ACPI_PROCESSOR),)
-obj-y += cstate.o
+obj-y += cstate.o
endif

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 4bf82dbd2a6b..53b8802e01e7 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -67,13 +67,6 @@ static bool has_lapic_cpus __initdata;
static bool acpi_support_online_capable;
#endif

-#ifdef CONFIG_X86_64
-/* Physical address of the Multiprocessor Wakeup Structure mailbox */
-static u64 acpi_mp_wake_mailbox_paddr;
-/* Virtual address of the Multiprocessor Wakeup Structure mailbox */
-static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox;
-#endif
-
#ifdef CONFIG_X86_IO_APIC
/*
* Locks related to IOAPIC hotplug
@@ -341,60 +334,6 @@ acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long e

return 0;
}
-
-#ifdef CONFIG_X86_64
-static int acpi_wakeup_cpu(u32 apicid, unsigned long start_ip)
-{
- /*
- * Remap mailbox memory only for the first call to acpi_wakeup_cpu().
- *
- * Wakeup of secondary CPUs is fully serialized in the core code.
- * No need to protect acpi_mp_wake_mailbox from concurrent accesses.
- */
- if (!acpi_mp_wake_mailbox) {
- acpi_mp_wake_mailbox = memremap(acpi_mp_wake_mailbox_paddr,
- sizeof(*acpi_mp_wake_mailbox),
- MEMREMAP_WB);
- }
-
- /*
- * Mailbox memory is shared between the firmware and OS. Firmware will
- * listen on mailbox command address, and once it receives the wakeup
- * command, the CPU associated with the given apicid will be booted.
- *
- * The value of 'apic_id' and 'wakeup_vector' must be visible to the
- * firmware before the wakeup command is visible. smp_store_release()
- * ensures ordering and visibility.
- */
- acpi_mp_wake_mailbox->apic_id = apicid;
- acpi_mp_wake_mailbox->wakeup_vector = start_ip;
- smp_store_release(&acpi_mp_wake_mailbox->command,
- ACPI_MP_WAKE_COMMAND_WAKEUP);
-
- /*
- * Wait for the CPU to wake up.
- *
- * The CPU being woken up is essentially in a spin loop waiting to be
- * woken up. It should not take long for it wake up and acknowledge by
- * zeroing out ->command.
- *
- * ACPI specification doesn't provide any guidance on how long kernel
- * has to wait for a wake up acknowledgement. It also doesn't provide
- * a way to cancel a wake up request if it takes too long.
- *
- * In TDX environment, the VMM has control over how long it takes to
- * wake up secondary. It can postpone scheduling secondary vCPU
- * indefinitely. Giving up on wake up request and reporting error opens
- * possible attack vector for VMM: it can wake up a secondary CPU when
- * kernel doesn't expect it. Wait until positive result of the wake up
- * request.
- */
- while (READ_ONCE(acpi_mp_wake_mailbox->command))
- cpu_relax();
-
- return 0;
-}
-#endif /* CONFIG_X86_64 */
#endif /* CONFIG_X86_LOCAL_APIC */

#ifdef CONFIG_X86_IO_APIC
@@ -1124,29 +1063,6 @@ static int __init acpi_parse_madt_lapic_entries(void)
}
return 0;
}
-
-#ifdef CONFIG_X86_64
-static int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
- const unsigned long end)
-{
- struct acpi_madt_multiproc_wakeup *mp_wake;
-
- if (!IS_ENABLED(CONFIG_SMP))
- return -ENODEV;
-
- mp_wake = (struct acpi_madt_multiproc_wakeup *)header;
- if (BAD_MADT_ENTRY(mp_wake, end))
- return -EINVAL;
-
- acpi_table_print_madt_entry(&header->common);
-
- acpi_mp_wake_mailbox_paddr = mp_wake->base_address;
-
- apic_update_callback(wakeup_secondary_cpu_64, acpi_wakeup_cpu);
-
- return 0;
-}
-#endif /* CONFIG_X86_64 */
#endif /* CONFIG_X86_LOCAL_APIC */

#ifdef CONFIG_X86_IO_APIC
@@ -1343,7 +1259,7 @@ static void __init acpi_process_madt(void)
smp_found_config = 1;
}

-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_ACPI_MADT_WAKEUP
/*
* Parse MADT MP Wake entry.
*/
diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
new file mode 100644
index 000000000000..7f164d38bd0b
--- /dev/null
+++ b/arch/x86/kernel/acpi/madt_wakeup.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/acpi.h>
+#include <linux/io.h>
+#include <asm/apic.h>
+#include <asm/barrier.h>
+#include <asm/processor.h>
+
+/* Physical address of the Multiprocessor Wakeup Structure mailbox */
+static u64 acpi_mp_wake_mailbox_paddr;
+
+/* Virtual address of the Multiprocessor Wakeup Structure mailbox */
+static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox;
+
+static int acpi_wakeup_cpu(u32 apicid, unsigned long start_ip)
+{
+ /*
+ * Remap mailbox memory only for the first call to acpi_wakeup_cpu().
+ *
+ * Wakeup of secondary CPUs is fully serialized in the core code.
+ * No need to protect acpi_mp_wake_mailbox from concurrent accesses.
+ */
+ if (!acpi_mp_wake_mailbox) {
+ acpi_mp_wake_mailbox = memremap(acpi_mp_wake_mailbox_paddr,
+ sizeof(*acpi_mp_wake_mailbox),
+ MEMREMAP_WB);
+ }
+
+ /*
+ * Mailbox memory is shared between the firmware and OS. Firmware will
+ * listen on mailbox command address, and once it receives the wakeup
+ * command, the CPU associated with the given apicid will be booted.
+ *
+ * The value of 'apic_id' and 'wakeup_vector' must be visible to the
+ * firmware before the wakeup command is visible. smp_store_release()
+ * ensures ordering and visibility.
+ */
+ acpi_mp_wake_mailbox->apic_id = apicid;
+ acpi_mp_wake_mailbox->wakeup_vector = start_ip;
+ smp_store_release(&acpi_mp_wake_mailbox->command,
+ ACPI_MP_WAKE_COMMAND_WAKEUP);
+
+ /*
+ * Wait for the CPU to wake up.
+ *
+ * The CPU being woken up is essentially in a spin loop waiting to be
+ * woken up. It should not take long for it wake up and acknowledge by
+ * zeroing out ->command.
+ *
+ * ACPI specification doesn't provide any guidance on how long kernel
+ * has to wait for a wake up acknowledgment. It also doesn't provide
+ * a way to cancel a wake up request if it takes too long.
+ *
+ * In TDX environment, the VMM has control over how long it takes to
+ * wake up secondary. It can postpone scheduling secondary vCPU
+ * indefinitely. Giving up on wake up request and reporting error opens
+ * possible attack vector for VMM: it can wake up a secondary CPU when
+ * kernel doesn't expect it. Wait until positive result of the wake up
+ * request.
+ */
+ while (READ_ONCE(acpi_mp_wake_mailbox->command))
+ cpu_relax();
+
+ return 0;
+}
+
+int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ struct acpi_madt_multiproc_wakeup *mp_wake;
+
+ mp_wake = (struct acpi_madt_multiproc_wakeup *)header;
+ if (BAD_MADT_ENTRY(mp_wake, end))
+ return -EINVAL;
+
+ acpi_table_print_madt_entry(&header->common);
+
+ acpi_mp_wake_mailbox_paddr = mp_wake->base_address;
+
+ apic_update_callback(wakeup_secondary_cpu_64, acpi_wakeup_cpu);
+
+ return 0;
+}
--
2.43.0
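
For readers following the handshake described in the acpi_wakeup_cpu()
comments, here is a minimal userspace sketch using C11 atomics. It is not
kernel code: struct model_mailbox, the MODEL_CMD_* values and the model_*
helpers are made up for illustration and do not reflect the ACPI mailbox
layout. The firmware side is an assumption about how a consumer would pair
an acquire load with the release store done by the OS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command values for this model only. */
#define MODEL_CMD_NOOP   0
#define MODEL_CMD_WAKEUP 1

/* Hypothetical mailbox; field order and sizes are not the ACPI layout. */
struct model_mailbox {
	_Atomic uint16_t command;
	uint32_t apic_id;
	uint64_t wakeup_vector;
};

/* OS side: write the payload first, then publish the command (release). */
static void model_post_wakeup(struct model_mailbox *mb, uint32_t apicid,
			      uint64_t start_ip)
{
	mb->apic_id = apicid;
	mb->wakeup_vector = start_ip;
	atomic_store_explicit(&mb->command, MODEL_CMD_WAKEUP,
			      memory_order_release);
}

/* OS side: true once the other side has zeroed ->command. */
static bool model_wakeup_acked(struct model_mailbox *mb)
{
	return atomic_load_explicit(&mb->command, memory_order_relaxed) ==
	       MODEL_CMD_NOOP;
}

/*
 * Assumed firmware side: an acquire load of ->command guarantees that the
 * payload written before the release store is visible. Returns true if a
 * wakeup addressed to my_apic_id was consumed and acknowledged.
 */
static bool model_firmware_poll(struct model_mailbox *mb, uint32_t my_apic_id,
				uint64_t *start_ip)
{
	if (atomic_load_explicit(&mb->command, memory_order_acquire) !=
	    MODEL_CMD_WAKEUP)
		return false;

	if (mb->apic_id != my_apic_id)
		return false;

	*start_ip = mb->wakeup_vector;
	atomic_store_explicit(&mb->command, MODEL_CMD_NOOP,
			      memory_order_release);
	return true;
}
```

In this model the OS would call model_post_wakeup() and then spin on
model_wakeup_acked(), which mirrors the unbounded wait in the patch. There
is deliberately no timeout path, for the reason given in the comment about
the VMM controlling vCPU scheduling.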


2024-04-04 18:28:08

by Kalra, Ashish

Subject: Re: [PATCHv9 00/17] x86/tdx: Add kexec support

Hi Kirill,

On 3/25/2024 5:38 AM, Kirill A. Shutemov wrote:
> The patchset adds bits and pieces to get kexec (and crashkernel) working on
> a TDX guest.
>
> The last patch implements CPU offlining according to the approved ACPI
> spec change proposal[1]. It unlocks kexec with all CPUs visible in the target
> kernel. It requires BIOS-side enabling. If it is missing, we fall back to
> booting the second kernel with a single CPU.
>
> Please review. I would be glad for any feedback.
>
> [1] https://lore.kernel.org/all/13356251.uLZWGnKmhe@kreacher
>
> v9:
> - Rebased;
> - Keep page tables that map E820_TYPE_ACPI (Ashish);
> - Ack/Reviewed/Tested-bys from Sathya, Kai, Tao;
> - Minor printk() message adjustments;

The cover letter mentions the inclusion of the following patch: Keep
page tables that map E820_TYPE_ACPI (Ashish).

But I don't see this patch included in your patch set.

Thanks, Ashish


2024-04-04 23:11:11

by Kalra, Ashish

Subject: [PATCH v3 0/4] x86/snp: Add kexec support

From: Ashish Kalra <[email protected]>

The patchset adds bits and pieces to get kexec (and crashkernel) working on
an SNP guest.

v3:
- Rebased;
- moved Keep page tables that maps E820_TYPE_ACPI patch to Kirill's tdx
guest kexec patch series.
- checking the md attribute instead of checking the efi_setup for
detecting if running under kexec kernel.
- added new sev_es_enabled() function.
- skip video memory access in decompressor for SEV-ES/SNP systems to
prevent guest termination as boot stage2 #VC handler does not handle
MMIO.

v2:
- address zeroing of unaccepted memory table mappings at all page table levels
adding phys_pte_init(), phys_pud_init() and phys_p4d_init().
- include skip efi_arch_mem_reserve() in case of kexec as part of this
patch set.
- rename last_address_shd_kexec to a more appropriate
kexec_last_address_to_make_private.
- remove duplicate code shared with TDX and use common interfaces
defined for SNP and TDX for kexec/kdump.
- remove set_pte_enc() dependency on pg_level_to_pfn() and make the
function simpler.
- rename unshare_pte() to make_pte_private().
- clarify and make the comment for using kexec_last_address_to_make_private
more understandable.
- general cleanup.

Ashish Kalra (4):
efi/x86: skip efi_arch_mem_reserve() in case of kexec.
x86/sev: add sev_es_enabled() function.
x86/boot/compressed: Skip Video Memory access in Decompressor for
SEV-ES/SNP.
x86/snp: Convert shared memory back to private on kexec

arch/x86/boot/compressed/misc.c | 6 +-
arch/x86/boot/compressed/misc.h | 1 +
arch/x86/boot/compressed/sev.c | 5 +
arch/x86/boot/compressed/sev.h | 2 +
arch/x86/include/asm/probe_roms.h | 1 +
arch/x86/include/asm/sev.h | 4 +
arch/x86/kernel/probe_roms.c | 16 +++
arch/x86/kernel/sev.c | 169 ++++++++++++++++++++++++++++++
arch/x86/mm/mem_encrypt_amd.c | 3 +
arch/x86/platform/efi/quirks.c | 23 +++-
10 files changed, 225 insertions(+), 5 deletions(-)

--
2.34.1


2024-04-04 23:11:25

by Kalra, Ashish

Subject: [PATCH v3 1/4] efi/x86: skip efi_arch_mem_reserve() in case of kexec.

From: Ashish Kalra <[email protected]>

For the kexec use case, we need to use and stick to the EFI memmap passed
from the first kernel via boot-params/setup data; hence,
skip efi_arch_mem_reserve() during kexec.

Additionally, during SNP guest kexec testing we discovered that the EFI
memmap is corrupted during chained kexec. kexec_enter_virtual_mode() during
late init will remap the efi_memmap physical pages allocated in
efi_arch_mem_reserve() via memblock and then subsequently cause random
EFI memmap corruption once memblock is freed/torn down.

Suggested-by: Dave Young <[email protected]>
[Dave Young: checking the md attribute instead of checking the efi_setup]
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/platform/efi/quirks.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f0cc00032751..2b65b3863912 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -255,15 +255,32 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
struct efi_memory_map_data data = { 0 };
struct efi_mem_range mr;
efi_memory_desc_t md;
- int num_entries;
+ int num_entries, ret;
void *new;

- if (efi_mem_desc_lookup(addr, &md) ||
- md.type != EFI_BOOT_SERVICES_DATA) {
+ /*
+ * For kexec use case, we need to use the EFI memmap passed from the first
+ * kernel via setup data, so we need to skip this.
+ * Additionally kexec_enter_virtual_mode() during late init will remap
+ * the efi_memmap physical pages allocated here via memblock & then
+ * subsequently cause random EFI memmap corruption once memblock is freed.
+ */
+
+ ret = efi_mem_desc_lookup(addr, &md);
+ if (ret) {
pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
return;
}

+ if (md.type != EFI_BOOT_SERVICES_DATA) {
+ pr_err("Skip reserving non EFI Boot Service Data memory for %pa\n", &addr);
+ return;
+ }
+
+ /* Kexec copied the efi memmap from the first kernel, thus skip the case */
+ if (md.attribute & EFI_MEMORY_RUNTIME)
+ return;
+
if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
return;
--
2.34.1


2024-04-04 23:11:38

by Kalra, Ashish

Subject: [PATCH v3 2/4] x86/sev: add sev_es_enabled() function.

From: Ashish Kalra <[email protected]>

Add sev_es_enabled() function to detect if SEV-ES
support is enabled.

Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/boot/compressed/sev.c | 5 +++++
arch/x86/boot/compressed/sev.h | 2 ++
2 files changed, 7 insertions(+)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index ec71846d28c9..4ae4cc51e6b8 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -134,6 +134,11 @@ bool sev_snp_enabled(void)
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
}

+bool sev_es_enabled(void)
+{
+ return sev_status & MSR_AMD64_SEV_ES_ENABLED;
+}
+
static void __page_state_change(unsigned long paddr, enum psc_op op)
{
u64 val;
diff --git a/arch/x86/boot/compressed/sev.h b/arch/x86/boot/compressed/sev.h
index fc725a981b09..5008c80e66e6 100644
--- a/arch/x86/boot/compressed/sev.h
+++ b/arch/x86/boot/compressed/sev.h
@@ -11,11 +11,13 @@
#ifdef CONFIG_AMD_MEM_ENCRYPT

bool sev_snp_enabled(void);
+bool sev_es_enabled(void);
void snp_accept_memory(phys_addr_t start, phys_addr_t end);

#else

static inline bool sev_snp_enabled(void) { return false; }
+static inline bool sev_es_enabled(void) { return false; }
static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }

#endif
--
2.34.1


2024-04-04 23:11:57

by Kalra, Ashish

Subject: [PATCH v3 3/4] x86/boot/compressed: Skip Video Memory access in Decompressor for SEV-ES/SNP.

From: Ashish Kalra <[email protected]>

Accessing guest video memory/RAM in the kernel decompressor
causes guest termination, as the boot stage2 #VC handler for
SEV-ES/SNP systems does not support MMIO handling.

This issue is observed with SEV-ES/SNP guest kexec, as
kexec -c adds screen_info to the boot parameters
passed to the kexec kernel, which causes console output to
be dumped to both video and serial.

As the decompressor output gets cleared really fast, it is
preferable to get the console output only on serial; hence,
skip accessing video RAM during the decompressor stage to
prevent guest termination.

Serial console output during the decompressor stage works, as
the boot stage2 #VC handler already supports handling port I/O.

Suggested-by: Thomas Lendacky <[email protected]>
Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/boot/compressed/misc.c | 6 ++++--
arch/x86/boot/compressed/misc.h | 1 +
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index b70e4a21c15f..47b4db200e1f 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -427,8 +427,10 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
vidport = 0x3d4;
}

- lines = boot_params_ptr->screen_info.orig_video_lines;
- cols = boot_params_ptr->screen_info.orig_video_cols;
+ if (!sev_es_enabled()) {
+ lines = boot_params_ptr->screen_info.orig_video_lines;
+ cols = boot_params_ptr->screen_info.orig_video_cols;
+ }

init_default_io_ops();

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index b353a7be380c..3c12ca987554 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -37,6 +37,7 @@
#include <asm/desc_defs.h>

#include "tdx.h"
+#include "sev.h"

#define BOOT_CTYPE_H
#include <linux/acpi.h>
--
2.34.1


2024-04-04 23:12:44

by Kalra, Ashish

Subject: [PATCH v3 4/4] x86/snp: Convert shared memory back to private on kexec

From: Ashish Kalra <[email protected]>

SNP guests allocate shared buffers to perform I/O. It is done by
allocating pages normally from the buddy allocator and converting them
to shared with set_memory_decrypted().

The second kernel has no idea what memory is converted this way. It only
sees E820_TYPE_RAM.

Accessing shared memory via private mapping will cause unrecoverable RMP
page-faults.

On kexec, walk the direct mapping and convert all shared memory back to
private. This makes all RAM private again so the second kernel may use it
normally. Additionally, for SNP guests, convert all bss decrypted section
pages back to private and switch ROM regions back to shared so that
their revalidation does not fail during kexec kernel boot.

The conversion occurs in two steps: stopping new conversions and
unsharing all memory. In the case of normal kexec, the stopping of
conversions takes place while scheduling is still functioning. This
allows for waiting until any ongoing conversions are finished. The
second step is carried out when all CPUs except one are inactive and
interrupts are disabled. This prevents any conflicts with code that may
access shared memory.
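
The two-step scheme described above can be modeled in userspace roughly
as follows. This is a sketch under assumptions, not the code in this
patch: struct model_conv_gate and the model_* helpers are hypothetical,
the flag plus in-flight counter is one possible way to implement "stop
new conversions and wait for ongoing ones", and a plain bool array stands
in for the shared/private state of pages.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate guarding private<->shared conversions. */
struct model_conv_gate {
	atomic_bool allowed;
	atomic_int in_progress;
};

static void model_conv_gate_init(struct model_conv_gate *g)
{
	atomic_init(&g->allowed, true);
	atomic_init(&g->in_progress, 0);
}

/* Called before a conversion; fails once conversions have been stopped. */
static bool model_conv_begin(struct model_conv_gate *g)
{
	atomic_fetch_add(&g->in_progress, 1);
	if (!atomic_load(&g->allowed)) {
		atomic_fetch_sub(&g->in_progress, 1);
		return false;
	}
	return true;
}

static void model_conv_end(struct model_conv_gate *g)
{
	atomic_fetch_sub(&g->in_progress, 1);
}

/*
 * Step 1: forbid new conversions. Returns true once nothing is in flight;
 * a real caller would retry (sleeping in between) while this is false.
 */
static bool model_conv_stop(struct model_conv_gate *g)
{
	atomic_store(&g->allowed, false);
	return atomic_load(&g->in_progress) == 0;
}

/*
 * Step 2: with one CPU left and interrupts off, walk everything and turn
 * shared pages back to private. Returns the number of pages converted.
 */
static int model_unshare_all(bool *shared, int npages)
{
	int converted = 0;

	for (int i = 0; i < npages; i++) {
		if (shared[i]) {
			shared[i] = false;
			converted++;
		}
	}
	return converted;
}
```

Incrementing the counter before checking the flag means a conversion that
races with model_conv_stop() is either counted and waited for, or it sees
the flag cleared and backs out, so step 2 never runs concurrently with a
conversion in this model.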

Signed-off-by: Ashish Kalra <[email protected]>
---
arch/x86/include/asm/probe_roms.h | 1 +
arch/x86/include/asm/sev.h | 4 +
arch/x86/kernel/probe_roms.c | 16 +++
arch/x86/kernel/sev.c | 169 ++++++++++++++++++++++++++++++
arch/x86/mm/mem_encrypt_amd.c | 3 +
5 files changed, 193 insertions(+)

diff --git a/arch/x86/include/asm/probe_roms.h b/arch/x86/include/asm/probe_roms.h
index 1c7f3815bbd6..d50b67dbff33 100644
--- a/arch/x86/include/asm/probe_roms.h
+++ b/arch/x86/include/asm/probe_roms.h
@@ -6,4 +6,5 @@ struct pci_dev;
extern void __iomem *pci_map_biosrom(struct pci_dev *pdev);
extern void pci_unmap_biosrom(void __iomem *rom);
extern size_t pci_biosrom_size(struct pci_dev *pdev);
+extern void snp_kexec_unprep_rom_memory(void);
#endif
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 9477b4053bce..51197a544693 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -230,6 +230,8 @@ u64 snp_get_unsupported_features(u64 status);
u64 sev_get_status(void);
void kdump_sev_callback(void);
void sev_show_status(void);
+void snp_kexec_unshare_mem(void);
+void snp_kexec_stop_conversion(bool crash);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -260,6 +262,8 @@ static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
static inline u64 sev_get_status(void) { return 0; }
static inline void kdump_sev_callback(void) { }
static inline void sev_show_status(void) { }
+static inline void snp_kexec_unshare_mem(void) { }
+static inline void snp_kexec_stop_conversion(bool crash) { }
#endif

#ifdef CONFIG_KVM_AMD_SEV
diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index 319fef37d9dc..457f1e5c8d00 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -177,6 +177,22 @@ size_t pci_biosrom_size(struct pci_dev *pdev)
}
EXPORT_SYMBOL(pci_biosrom_size);

+void snp_kexec_unprep_rom_memory(void)
+{
+ unsigned long vaddr, npages, sz;
+
+ /*
+ * Switch back ROM regions to shared so that their validation
+ * does not fail during kexec kernel boot.
+ */
+ vaddr = (unsigned long)__va(video_rom_resource.start);
+ sz = (system_rom_resource.end + 1) - video_rom_resource.start;
+ npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ snp_set_memory_shared(vaddr, npages);
+}
+EXPORT_SYMBOL(snp_kexec_unprep_rom_memory);
+
#define ROMSIGNATURE 0xaa55

static int __init romsignature(const unsigned char *rom)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b59b09c2f284..1395c9f0fae4 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -41,6 +41,7 @@
#include <asm/apic.h>
#include <asm/cpuid.h>
#include <asm/cmdline.h>
+#include <asm/probe_roms.h>

#define DR7_RESET_VALUE 0x400

@@ -91,6 +92,9 @@ static struct ghcb *boot_ghcb __section(".data");
/* Bitmap of SEV features supported by the hypervisor */
static u64 sev_hv_features __ro_after_init;

+/* Last address to be switched to private during kexec */
+static unsigned long kexec_last_addr_to_make_private;
+
/* #VC handler runtime per-CPU data */
struct sev_es_runtime_data {
struct ghcb ghcb_page;
@@ -927,6 +931,171 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
}

+static bool set_pte_enc(pte_t *kpte, int level, void *va)
+{
+ pte_t new_pte;
+
+ if (pte_none(*kpte))
+ return false;
+
+ /*
+ * Change the physical page attribute from C=0 to C=1. Flush the
+ * caches to ensure that data gets accessed with the correct C-bit.
+ */
+ if (pte_present(*kpte))
+ clflush_cache_range(va, page_level_size(level));
+
+ new_pte = __pte(cc_mkenc(pte_val(*kpte)));
+ set_pte_atomic(kpte, new_pte);
+
+ return true;
+}
+
+static bool make_pte_private(pte_t *pte, unsigned long addr, int pages, int level)
+{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ /* Check whether the GHCB is part of a PMD range. */
+ if ((unsigned long)ghcb >= addr &&
+ (unsigned long)ghcb <= (addr + (pages * PAGE_SIZE))) {
+ /*
+ * Ensure that the current CPU's GHCB is made private
+ * at the end of the unshare loop so that we continue to use the
+ * optimized GHCB protocol and not force the switch to
+ * MSR protocol till the very end.
+ */
+ pr_debug("setting boot_ghcb to NULL for this cpu ghcb\n");
+ kexec_last_addr_to_make_private = addr;
+ return true;
+ }
+
+ if (!set_pte_enc(pte, level, (void *)addr))
+ return false;
+
+ snp_set_memory_private(addr, pages);
+
+ return true;
+}
+
+static void unshare_all_memory(void)
+{
+ unsigned long addr, end;
+
+ /*
+ * Walk the direct mapping and convert all shared memory back to private.
+ */
+
+ addr = PAGE_OFFSET;
+ end = PAGE_OFFSET + get_max_mapped();
+
+ while (addr < end) {
+ unsigned long size;
+ unsigned int level;
+ pte_t *pte;
+
+ pte = lookup_address(addr, &level);
+ size = page_level_size(level);
+
+ /*
+ * pte_none() check is required to skip physical memory holes in the direct mapping.
+ */
+ if (pte && pte_decrypted(*pte) && !pte_none(*pte)) {
+ int pages = size / PAGE_SIZE;
+
+ if (!make_pte_private(pte, addr, pages, level)) {
+ pr_err("Failed to unshare range %#lx-%#lx\n",
+ addr, addr + size);
+ }
+
+ }
+
+ addr += size;
+ }
+ __flush_tlb_all();
+
+}
+
+static void unshare_all_bss_decrypted_memory(void)
+{
+ unsigned long vaddr, vaddr_end;
+ unsigned long size;
+ unsigned int level;
+ unsigned int npages;
+ pte_t *pte;
+
+ vaddr = (unsigned long)__start_bss_decrypted;
+ vaddr_end = (unsigned long)__start_bss_decrypted_unused;
+ npages = (vaddr_end - vaddr) >> PAGE_SHIFT;
+ for (; vaddr < vaddr_end; vaddr += PAGE_SIZE) {
+ pte = lookup_address(vaddr, &level);
+ if (!pte || !pte_decrypted(*pte) || pte_none(*pte))
+ continue;
+
+ size = page_level_size(level);
+ set_pte_enc(pte, level, (void *)vaddr);
+ }
+ vaddr = (unsigned long)__start_bss_decrypted;
+ snp_set_memory_private(vaddr, npages);
+}
+
+/* Stop new private<->shared conversions */
+void snp_kexec_stop_conversion(bool crash)
+{
+ /*
+ * Crash kernel reaches here with interrupts disabled: can't wait for
+ * conversions to finish.
+ *
+ * If race happened, just report and proceed.
+ */
+ bool wait_for_lock = !crash;
+
+ if (!stop_memory_enc_conversion(wait_for_lock))
+ pr_warn("Failed to finish shared<->private conversions\n");
+}
+
+void snp_kexec_unshare_mem(void)
+{
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return;
+
+ /*
+ * Switch specific memory regions, such as option
+ * ROM regions, back to shared so that (re)validation does
+ * not fail when the kexec kernel boots.
+ */
+ snp_kexec_unprep_rom_memory();
+
+ unshare_all_memory();
+
+ unshare_all_bss_decrypted_memory();
+
+ if (kexec_last_addr_to_make_private) {
+ unsigned long size;
+ unsigned int level;
+ pte_t *pte;
+
+ /*
+ * Switch to using the MSR protocol to change this cpu's
+ * GHCB to private.
+ * All the per-cpu GHCBs have been switched back to private,
+ * so can't do any more GHCB calls to the hypervisor beyond
+ * this point till the kexec kernel starts running.
+ */
+ boot_ghcb = NULL;
+ sev_cfg.ghcbs_initialized = false;
+
+ pr_debug("boot ghcb 0x%lx\n", kexec_last_addr_to_make_private);
+ pte = lookup_address(kexec_last_addr_to_make_private, &level);
+ size = page_level_size(level);
+ set_pte_enc(pte, level, (void *)kexec_last_addr_to_make_private);
+ snp_set_memory_private(kexec_last_addr_to_make_private, (size / PAGE_SIZE));
+ }
+}
+
static int snp_set_vmsa(void *va, bool vmsa)
{
u64 attrs;
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index d314e577836d..dab2dc2207fb 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -468,6 +468,9 @@ void __init sme_early_init(void)
x86_platform.guest.enc_tlb_flush_required = amd_enc_tlb_flush_required;
x86_platform.guest.enc_cache_flush_required = amd_enc_cache_flush_required;

+ x86_platform.guest.enc_kexec_stop_conversion = snp_kexec_stop_conversion;
+ x86_platform.guest.enc_kexec_unshare_mem = snp_kexec_unshare_mem;
+
/*
* AMD-SEV-ES intercepts the RDMSR to read the X2APIC ID in the
* parallel bringup low level code. That raises #VC which cannot be
--
2.34.1
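
A minimal sketch of the two pieces of address arithmetic the patch above relies on: rounding a byte count up to whole pages (the PAGE_ALIGN(sz) >> PAGE_SHIFT step in snp_kexec_unprep_rom_memory()) and testing whether an address falls inside a page range (the GHCB check in make_pte_private()). The sketch_* names and the 4 KiB page size are assumptions made for illustration; they are not kernel definitions. The sketch uses a half-open interval, whereas the patch compares with `<=`, which also matches an address that sits exactly at the end of the range. Whether that is intended is not stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PAGE_SHIFT / PAGE_SIZE (4 KiB pages assumed). */
#define SKETCH_PAGE_SHIFT 12UL
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Round a byte count up to whole pages, like PAGE_ALIGN(sz) >> PAGE_SHIFT. */
static unsigned long sketch_bytes_to_pages(unsigned long sz)
{
	return (sz + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Half-open test: true when p lies in [start, start + npages * page size). */
static bool sketch_range_contains(unsigned long start, unsigned long npages,
				  unsigned long p)
{
	return p >= start && p < start + npages * SKETCH_PAGE_SIZE;
}

/* Small usage example covering the boundary cases. */
static void sketch_self_check(void)
{
	assert(sketch_bytes_to_pages(1) == 1);
	assert(sketch_bytes_to_pages(SKETCH_PAGE_SIZE) == 1);
	assert(sketch_bytes_to_pages(SKETCH_PAGE_SIZE + 1) == 2);
	assert(sketch_range_contains(0x1000, 2, 0x2fff));
	assert(!sketch_range_contains(0x1000, 2, 0x3000));
}
```

With the half-open form, the first byte of the following range is reported as outside, so a neighbouring mapping would not be mistaken for the one holding the GHCB.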


2024-04-05 11:32:00

by kernel test robot

Subject: Re: [PATCH v3 4/4] x86/snp: Convert shared memory back to private on kexec

Hi Ashish,

kernel test robot noticed the following build errors:

[auto build test ERROR on efi/next]
[also build test ERROR on linus/master v6.9-rc2 next-20240405]
[cannot apply to tip/x86/core tip/master tip/x86/mm tip/auto-latest]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Ashish-Kalra/efi-x86-skip-efi_arch_mem_reserve-in-case-of-kexec/20240405-071346
base: https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git next
patch link: https://lore.kernel.org/r/41db1ebbe58fb082dbe848f1c666ed23e83f1752.1712270976.git.ashish.kalra%40amd.com
patch subject: [PATCH v3 4/4] x86/snp: Convert shared memory back to private on kexec
config: x86_64-allnoconfig (https://download.01.org/0day-ci/archive/20240405/[email protected]/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240405/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

>> ld.lld: error: duplicate symbol: snp_kexec_unshare_mem
>>> defined at init.c
>>> arch/x86/realmode/init.o:(snp_kexec_unshare_mem) in archive vmlinux.a
>>> defined at extable.c
>>> arch/x86/mm/extable.o:(.text+0x0) in archive vmlinux.a
--
>> ld.lld: error: duplicate symbol: snp_kexec_unshare_mem
>>> defined at init.c
>>> arch/x86/realmode/init.o:(snp_kexec_unshare_mem) in archive vmlinux.a
>>> defined at fault.c
>>> arch/x86/mm/fault.o:(.text+0x0) in archive vmlinux.a
--
>> ld.lld: error: duplicate symbol: snp_kexec_unshare_mem
>>> defined at init.c
>>> arch/x86/realmode/init.o:(snp_kexec_unshare_mem) in archive vmlinux.a
>>> defined at amd.c
>>> arch/x86/kernel/cpu/amd.o:(.text+0x0) in archive vmlinux.a
--
>> ld.lld: error: duplicate symbol: snp_kexec_unshare_mem
>>> defined at init.c
>>> arch/x86/realmode/init.o:(snp_kexec_unshare_mem) in archive vmlinux.a
>>> defined at common.c
>>> arch/x86/kernel/cpu/common.o:(.text+0x0) in archive vmlinux.a
--
>> ld.lld: error: duplicate symbol: snp_kexec_unshare_mem
>>> defined at init.c
>>> arch/x86/realmode/init.o:(snp_kexec_unshare_mem) in archive vmlinux.a
>>> defined at probe_roms.c
>>> arch/x86/kernel/probe_roms.o:(.text+0x0) in archive vmlinux.a
--
>> ld.lld: error: duplicate symbol: snp_kexec_unshare_mem
>>> defined at init.c
>>> arch/x86/realmode/init.o:(snp_kexec_unshare_mem) in archive vmlinux.a
>>> defined at nmi.c
>>> arch/x86/kernel/nmi.o:(.text+0x0) in archive vmlinux.a
--
>> ld.lld: error: duplicate symbol: snp_kexec_unshare_mem
>>> defined at init.c
>>> arch/x86/realmode/init.o:(snp_kexec_unshare_mem) in archive vmlinux.a
>>> defined at head64.c
>>> arch/x86/kernel/head64.o:(.text+0x0) in archive vmlinux.a

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2024-04-05 11:35:59

by kernel test robot

Subject: Re: [PATCH v3 4/4] x86/snp: Convert shared memory back to private on kexec

Hi Ashish,

kernel test robot noticed the following build errors:

[auto build test ERROR on efi/next]
[also build test ERROR on linus/master v6.9-rc2 next-20240405]
[cannot apply to tip/x86/core tip/master tip/x86/mm tip/auto-latest]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Ashish-Kalra/efi-x86-skip-efi_arch_mem_reserve-in-case-of-kexec/20240405-071346
base: https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git next
patch link: https://lore.kernel.org/r/41db1ebbe58fb082dbe848f1c666ed23e83f1752.1712270976.git.ashish.kalra%40amd.com
patch subject: [PATCH v3 4/4] x86/snp: Convert shared memory back to private on kexec
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20240405/[email protected]/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240405/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All error/warnings (new ones prefixed by >>):

ld: arch/x86/kernel/head64.o: in function `snp_kexec_unshare_mem':
>> head64.c:(.text+0x110): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
ld: arch/x86/kernel/nmi.o: in function `snp_kexec_unshare_mem':
nmi.c:(.text+0x820): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
ld: arch/x86/kernel/probe_roms.o: in function `snp_kexec_unshare_mem':
probe_roms.c:(.text+0x370): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
ld: arch/x86/kernel/cpu/common.o: in function `snp_kexec_unshare_mem':
common.c:(.text+0x530): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
ld: arch/x86/kernel/cpu/amd.o: in function `snp_kexec_unshare_mem':
amd.c:(.text+0x12b0): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
ld: arch/x86/kernel/smpboot.o: in function `snp_kexec_unshare_mem':
smpboot.c:(.text+0xfa0): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
ld: arch/x86/kernel/crash.o: in function `snp_kexec_unshare_mem':
crash.c:(.text+0xa0): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
ld: arch/x86/mm/fault.o: in function `snp_kexec_unshare_mem':
fault.c:(.text+0x1ce0): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
ld: arch/x86/mm/extable.o: in function `snp_kexec_unshare_mem':
extable.c:(.text+0x330): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
ld: arch/x86/platform/efi/efi_64.o: in function `snp_kexec_unshare_mem':
efi_64.c:(.text+0x160): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
ld: drivers/iommu/amd/init.o: in function `snp_kexec_unshare_mem':
init.c:(.text+0x15e0): multiple definition of `snp_kexec_unshare_mem'; arch/x86/realmode/init.o:init.c:(.text+0x10): first defined here
--
In file included from arch/x86/realmode/init.c:12:
>> arch/x86/include/asm/sev.h:265:6: warning: no previous prototype for 'snp_kexec_unshare_mem' [-Wmissing-prototypes]
265 | void snp_kexec_unshare_mem(void) {}
| ^~~~~~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/sev.h:266:13: warning: 'snp_kexec_stop_conversion' defined but not used [-Wunused-function]
266 | static void snp_kexec_stop_conversion(bool crash) {}
| ^~~~~~~~~~~~~~~~~~~~~~~~~

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
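
Both reports above appear to trace back to the two stub definitions in the #else branch of asm/sev.h. The first stub has no storage-class specifier, so every translation unit that includes the header emits its own external definition of snp_kexec_unshare_mem() and the linker sees duplicates. The second is `static` but not `inline`, which is what -Wunused-function complains about. A minimal sketch of the usual header-stub pattern follows; SKETCH_FEATURE is a hypothetical macro standing in for the real config guard, which is not visible in the hunk, and sketch_shutdown_path() is an invented caller.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef SKETCH_FEATURE
/* Feature built in: only declarations here, definitions live in a .c file. */
void snp_kexec_unshare_mem(void);
void snp_kexec_stop_conversion(bool crash);
#else
/*
 * Feature compiled out: static inline gives each includer a private copy
 * that produces no external symbol and no unused-function warning.
 */
static inline void snp_kexec_unshare_mem(void) { }
static inline void snp_kexec_stop_conversion(bool crash) { (void)crash; }
#endif

/* Hypothetical caller: compiles and links the same way in either branch. */
static int sketch_shutdown_path(bool crash)
{
	snp_kexec_stop_conversion(crash);
	snp_kexec_unshare_mem();
	return 0;
}
```

Callers need no #ifdef of their own with this pattern, since the stubs compile away when the feature is disabled.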

2024-04-05 11:40:44

by kernel test robot

Subject: Re: [PATCH v3 4/4] x86/snp: Convert shared memory back to private on kexec

Hi Ashish,

kernel test robot noticed the following build warnings:

[auto build test WARNING on efi/next]
[also build test WARNING on linus/master v6.9-rc2 next-20240405]
[cannot apply to tip/x86/core tip/master tip/x86/mm tip/auto-latest]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Ashish-Kalra/efi-x86-skip-efi_arch_mem_reserve-in-case-of-kexec/20240405-071346
base: https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git next
patch link: https://lore.kernel.org/r/41db1ebbe58fb082dbe848f1c666ed23e83f1752.1712270976.git.ashish.kalra%40amd.com
patch subject: [PATCH v3 4/4] x86/snp: Convert shared memory back to private on kexec
config: x86_64-rhel-8.3-rust (https://download.01.org/0day-ci/archive/20240405/[email protected]/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240405/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

arch/x86/kernel/sev.c:1006:14: error: call to undeclared function 'pte_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1006 | if (pte && pte_decrypted(*pte) && !pte_none(*pte)) {
| ^
arch/x86/kernel/sev.c:1035:16: error: call to undeclared function 'pte_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1035 | if (!pte || !pte_decrypted(*pte) || pte_none(*pte))
| ^
>> arch/x86/kernel/sev.c:1025:16: warning: variable 'size' set but not used [-Wunused-but-set-variable]
1025 | unsigned long size;
| ^
arch/x86/kernel/sev.c:1056:7: error: call to undeclared function 'stop_memory_enc_conversion'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1056 | if (!stop_memory_enc_conversion(wait_for_lock))
| ^
1 warning and 3 errors generated.


vim +/size +1025 arch/x86/kernel/sev.c

1021
1022 static void unshare_all_bss_decrypted_memory(void)
1023 {
1024 unsigned long vaddr, vaddr_end;
> 1025 unsigned long size;
1026 unsigned int level;
1027 unsigned int npages;
1028 pte_t *pte;
1029
1030 vaddr = (unsigned long)__start_bss_decrypted;
1031 vaddr_end = (unsigned long)__start_bss_decrypted_unused;
1032 npages = (vaddr_end - vaddr) >> PAGE_SHIFT;
1033 for (; vaddr < vaddr_end; vaddr += PAGE_SIZE) {
1034 pte = lookup_address(vaddr, &level);
1035 if (!pte || !pte_decrypted(*pte) || pte_none(*pte))
1036 continue;
1037
1038 size = page_level_size(level);
1039 set_pte_enc(pte, level, (void *)vaddr);
1040 }
1041 vaddr = (unsigned long)__start_bss_decrypted;
1042 snp_set_memory_private(vaddr, npages);
1043 }
1044

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Subject: Re: [PATCH v3 1/4] efi/x86: skip efi_arch_mem_reserve() in case of kexec.


On 4/4/24 4:11 PM, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> For the kexec use case, we need to use and stick to the EFI memmap
> passed from the first kernel via boot-params/setup data; hence,
> skip efi_arch_mem_reserve() during kexec.
>
> Additionally, SNP guest kexec testing showed that the EFI memmap
> is corrupted during chained kexec. kexec_enter_virtual_mode() during
> late init will remap the efi_memmap physical pages allocated in
> efi_arch_mem_reserve() via memblock & then subsequently cause random
> EFI memmap corruption once memblock is freed/torn down.
>
> Suggested-by: Dave Young <[email protected]>
> [Dave Young: checking the md attribute instead of checking the efi_setup]
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/platform/efi/quirks.c | 23 ++++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index f0cc00032751..2b65b3863912 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -255,15 +255,32 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
> struct efi_memory_map_data data = { 0 };
> struct efi_mem_range mr;
> efi_memory_desc_t md;
> - int num_entries;
> + int num_entries, ret;
> void *new;
>
> - if (efi_mem_desc_lookup(addr, &md) ||
> - md.type != EFI_BOOT_SERVICES_DATA) {
> + /*
> + * For kexec use case, we need to use the EFI memmap passed from the first
> + * kernel via setup data, so we need to skip this.
> + * Additionally kexec_enter_virtual_mode() during late init will remap
> + * the efi_memmap physical pages allocated here via memblock & then
> + * subsequently cause random EFI memmap corruption once memblock is freed.
> + */
> +
> + ret = efi_mem_desc_lookup(addr, &md);

Since you are not using ret, why not directly use if (efi_mem_desc_lookup(..))?

> + if (ret) {
> pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
> return;
> }
>
> + if (md.type != EFI_BOOT_SERVICES_DATA) {
> + pr_err("Skip reserving non EFI Boot Service Data memory for %pa\n", &addr);
> + return;
> + }
> +
> + /* Kexec copied the efi memmap from the first kernel, thus skip the case */
> + if (md.attribute & EFI_MEMORY_RUNTIME)
> + return;
> +
> if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
> pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
> return;

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Subject: Re: [PATCH v3 3/4] x86/boot/compressed: Skip Video Memory access in Decompressor for SEV-ES/SNP.


On 4/4/24 4:11 PM, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> Accessing guest video memory/RAM during kernel decompressor
> causes guest termination as boot stage2 #VC handler for
> SEV-ES/SNP systems does not support MMIO handling.
>
> This issue is observed with SEV-ES/SNP guest kexec as
> kexec -c adds screen_info to the boot parameters
> passed to the kexec kernel, which causes console output to
> be dumped to both video and serial.
>
> As the decompressor output gets cleared really fast, it is
> preferable to get the console output only on serial, hence,
> skip accessing video RAM during decompressor stage to
> prevent guest termination.
>
> Serial console output during decompressor stage works as
> boot stage2 #VC handler already supports handling port I/O.
>
> Suggested-by: Thomas Lendacky <[email protected]>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

> arch/x86/boot/compressed/misc.c | 6 ++++--
> arch/x86/boot/compressed/misc.h | 1 +
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index b70e4a21c15f..47b4db200e1f 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -427,8 +427,10 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
> vidport = 0x3d4;
> }
>
> - lines = boot_params_ptr->screen_info.orig_video_lines;
> - cols = boot_params_ptr->screen_info.orig_video_cols;
> + if (!sev_es_enabled()) {
> + lines = boot_params_ptr->screen_info.orig_video_lines;
> + cols = boot_params_ptr->screen_info.orig_video_cols;
> + }
>
> init_default_io_ops();
>
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index b353a7be380c..3c12ca987554 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -37,6 +37,7 @@
> #include <asm/desc_defs.h>
>
> #include "tdx.h"
> +#include "sev.h"
>
> #define BOOT_CTYPE_H
> #include <linux/acpi.h>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Subject: Re: [PATCH v3 2/4] x86/sev: add sev_es_enabled() function.


On 4/4/24 4:11 PM, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> Add sev_es_enabled() function to detect if SEV-ES
> support is enabled.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

> arch/x86/boot/compressed/sev.c | 5 +++++
> arch/x86/boot/compressed/sev.h | 2 ++
> 2 files changed, 7 insertions(+)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index ec71846d28c9..4ae4cc51e6b8 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -134,6 +134,11 @@ bool sev_snp_enabled(void)
> return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
> }
>
> +bool sev_es_enabled(void)
> +{
> + return sev_status & MSR_AMD64_SEV_ES_ENABLED;
> +}
> +
> static void __page_state_change(unsigned long paddr, enum psc_op op)
> {
> u64 val;
> diff --git a/arch/x86/boot/compressed/sev.h b/arch/x86/boot/compressed/sev.h
> index fc725a981b09..5008c80e66e6 100644
> --- a/arch/x86/boot/compressed/sev.h
> +++ b/arch/x86/boot/compressed/sev.h
> @@ -11,11 +11,13 @@
> #ifdef CONFIG_AMD_MEM_ENCRYPT
>
> bool sev_snp_enabled(void);
> +bool sev_es_enabled(void);
> void snp_accept_memory(phys_addr_t start, phys_addr_t end);
>
> #else
>
> static inline bool sev_snp_enabled(void) { return false; }
> +static inline bool sev_es_enabled(void) { return false; }
> static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
>
> #endif

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
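
A minimal sketch of the mask test that sev_es_enabled() performs on the cached status value. The bit positions below (bit 0 for SEV, bit 1 for SEV-ES, bit 2 for SEV-SNP) and all sketch_* names are assumptions made for illustration; the real masks are the MSR_AMD64_SEV_*_ENABLED definitions used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define SKETCH_SEV_ENABLED     (UINT64_C(1) << 0)
#define SKETCH_SEV_ES_ENABLED  (UINT64_C(1) << 1)
#define SKETCH_SEV_SNP_ENABLED (UINT64_C(1) << 2)

/*
 * Returning bool normalises any non-zero masked value to true, so the
 * caller never sees the raw bit value.
 */
static bool sketch_sev_es_enabled(uint64_t sev_status)
{
	return sev_status & SKETCH_SEV_ES_ENABLED;
}

/* Small usage example. */
static void sketch_self_check(void)
{
	assert(!sketch_sev_es_enabled(SKETCH_SEV_ENABLED));
	assert(sketch_sev_es_enabled(SKETCH_SEV_ENABLED | SKETCH_SEV_ES_ENABLED));
	assert(sketch_sev_es_enabled(SKETCH_SEV_ES_ENABLED | SKETCH_SEV_SNP_ENABLED));
}
```

In patch 3/4 the same predicate gates the read of the screen_info dimensions in extract_kernel().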


2024-04-07 20:46:38

by Kirill A. Shutemov

Subject: Re: [PATCHv9 00/17] x86/tdx: Add kexec support

On Thu, Apr 04, 2024 at 01:27:47PM -0500, Kalra, Ashish wrote:
> The cover letter mentions the inclusion of the following patch - Keep page
> tables that maps E820_TYPE_ACPI (Ashish)
>
> But I don't see this patch included in your patch-set.

Ouch. My bad. Will fix in v10.

--
Kiryl Shutsemau / Kirill A. Shutemov