Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp977946pxb; Wed, 6 Apr 2022 05:53:07 -0700 (PDT) X-Google-Smtp-Source: ABdhPJykRUxArT9B+FecpMNh6NLIXNFZ/ZDP7gF/CqzJc598bHn+RGeTuYyrHDGLVqMpvLv9prv2 X-Received: by 2002:a63:6f45:0:b0:398:49ba:1037 with SMTP id k66-20020a636f45000000b0039849ba1037mr7072891pgc.601.1649249587080; Wed, 06 Apr 2022 05:53:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649249587; cv=none; d=google.com; s=arc-20160816; b=vQHJQwAvNqIKBK/4PlZuPAtG1sEm/8RBeOjKfRxRYGaqNRpKs3+xlXf186/gR398m3 bMlsVXdsvRPO7EjxglvGJdHPCQ0GOMequCQqqBoZI60EzQXUOiWmFK7r/OspK4PlBivK 7K8XiBPzZ6LqaLB2yt09mjavhgAZ+SnzT2V+sxv/IcMQKOyNjSGKz1ASxFaT/H3ZBTJL Ae8uQM7oFkS1hl6wEB3bRFxRKs7yOFHZ4A97eRs3gXMzRm3PmpkrMhsJCOB7YpZd+ONp tdApIcd56PFz0CAcBUo5m6SxR+ot+7VSuj4Lf8txqqsGgNemVw6WJxJFo5rH8AK5x9ap 0Gmw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ti/MFZG20LQ7iVYjbZTROTO+Vs43n3r8GF4vU8IY4aE=; b=ns3zh43aCJ+NekFaKuMZCWZu5/gxlpDek8bxFMK3iDfQ/q1RySibEnIiSL2fooGfQS 9qIn1Yo0mMcbtDe11uePcnEbUNkj0AErQF1ye5Of06R/tm27AwJa90uhmKYXnja/liAN XhPa1QdrHpOSI+amygmC/cALxt18awQ8tgtt5BBAkgk3JuSVffn5Z8HVpFF0DbU52Uiu 0FcCWtXCION4WZokEa6LeS3xaKF8C68uqE08SIIiFxpElS7UFZEcXUQkCuzLAY34OQ9p sIhMIgwdv1eopvewKzEXuWcijr0Kcn/XuvfnvY3yX8CBcMfoW6+S1JlfzdiCADY2fZ+t o8Qw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=bi82APc5; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id n2-20020a6563c2000000b003816043f114si15595865pgv.777.2022.04.06.05.53.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Apr 2022 05:53:07 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=bi82APc5; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id DEED4472F32; Wed, 6 Apr 2022 02:52:40 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1454127AbiDFDKg (ORCPT + 99 others); Tue, 5 Apr 2022 23:10:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40066 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1847127AbiDFCNH (ORCPT ); Tue, 5 Apr 2022 22:13:07 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 61F611AF7C9 for ; Tue, 5 Apr 2022 16:35:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649201713; x=1680737713; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LiEB38jGM5YS7hvDp4/Df3hvveP4t2e15On5tJYc3LM=; b=bi82APc5WDrDNt9SVWIIRZEvyEk7QqUxaHnAZmpH3ZZY22O4XASAmlpS 0FleYfT/FByJu1xkx6oLC7scl8xo3Tv2c8KYD8Y8r9WSTgIQ9qWo3OOAz JDeHLs54xhtoZS5qHc4SrkKQP1zzEuKdZZkoq5tTNAv/L1bBppncybYtC PL6LqxPKcayMwGMmc1FxPZZSO0gHBStuRq/VSIFeU3Vl6jjIZOJmO6rf3 aifdzlErZRBrhyCz64fgwn3y4P1E3XoQV2lqPLYPkmEztYnAI3j84zaum Ss/nTOLp7Izzzal8H1bryOy2v8CcJdZameN0SEDngab6qV2HTwTHDH23J Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10308"; a="285854438" X-IronPort-AV: E=Sophos;i="5.90,238,1643702400"; d="scan'208";a="285854438" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Apr 2022 16:34:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,238,1643702400"; d="scan'208";a="697135351" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga001.fm.intel.com with ESMTP; 05 Apr 2022 16:34:48 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 879ABC4C; Wed, 6 Apr 2022 02:29:46 +0300 (EEST) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, Sean Christopherson , "Kirill A . Shutemov" Subject: [PATCHv8 20/30] x86/boot: Add a trampoline for booting APs via firmware handoff Date: Wed, 6 Apr 2022 02:29:29 +0300 Message-Id: <20220405232939.73860-21-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220405232939.73860-1-kirill.shutemov@linux.intel.com> References: <20220405232939.73860-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Sean Christopherson Historically, x86 platforms have booted secondary processors (APs) using INIT followed by the start up IPI (SIPI) messages. In regular VMs, this boot sequence is supported by the VMM emulation. But such a wakeup model is fatal for secure VMs like TDX in which VMM is an untrusted entity. To address this issue, a new wakeup model was added in ACPI v6.4, in which firmware (like TDX virtual BIOS) will help boot the APs. More details about this wakeup model can be found in ACPI specification v6.4, the section titled "Multiprocessor Wakeup Structure". Since the existing trampoline code requires processors to boot in real mode with 16-bit addressing, it will not work for this wakeup model (because it boots the AP in 64-bit mode). To handle it, extend the trampoline code to support 64-bit mode firmware handoff. Also, extend IDT and GDT pointers to support 64-bit mode hand off. There is no TDX-specific detection for this new boot method. The kernel will rely on it as the sole boot method whenever the new ACPI structure is present. The ACPI table parser for the MADT multiprocessor wake up structure and the wakeup method that uses this structure will be added by the following patch in this series. Signed-off-by: Sean Christopherson Reviewed-by: Andi Kleen Reviewed-by: Dan Williams Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Reviewed-by: Thomas Gleixner --- arch/x86/include/asm/apic.h | 2 ++ arch/x86/include/asm/realmode.h | 1 + arch/x86/kernel/smpboot.c | 12 ++++++-- arch/x86/realmode/rm/header.S | 1 + arch/x86/realmode/rm/trampoline_64.S | 38 ++++++++++++++++++++++++ arch/x86/realmode/rm/trampoline_common.S | 12 +++++++- 6 files changed, 63 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 48067af94678..35006e151774 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -328,6 +328,8 @@ struct apic { /* wakeup_secondary_cpu */ int (*wakeup_secondary_cpu)(int apicid, unsigned long start_eip); + /* wakeup secondary CPU using 64-bit wakeup point */ + int (*wakeup_secondary_cpu_64)(int apicid, unsigned long start_eip); void (*inquire_remote_apic)(int apicid); diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h index 331474b150f1..fd6f6e5b755a 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -25,6 +25,7 @@ struct real_mode_header { u32 sev_es_trampoline_start; #endif #ifdef CONFIG_X86_64 + u32 trampoline_start64; u32 trampoline_pgd; #endif /* ACPI S3 wakeup */ diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 2ef14772dc04..870cc5d203b1 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1082,6 +1082,11 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle, unsigned long boot_error = 0; unsigned long timeout; +#ifdef CONFIG_X86_64 + /* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */ + if (apic->wakeup_secondary_cpu_64) + start_ip = real_mode_header->trampoline_start64; +#endif idle->thread.sp = (unsigned long)task_pt_regs(idle); early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu); initial_code = (unsigned long)start_secondary; @@ -1123,11 +1128,14 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle, /* * Wake up a CPU in difference cases: - * - Use the method in the APIC driver if it's defined + * - Use a method from the APIC driver if one defined, with wakeup + * straight to 64-bit mode preferred over wakeup to RM. * Otherwise, * - Use an INIT boot APIC message for APs or NMI for BSP. */ - if (apic->wakeup_secondary_cpu) + if (apic->wakeup_secondary_cpu_64) + boot_error = apic->wakeup_secondary_cpu_64(apicid, start_ip); + else if (apic->wakeup_secondary_cpu) boot_error = apic->wakeup_secondary_cpu(apicid, start_ip); else boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid, diff --git a/arch/x86/realmode/rm/header.S b/arch/x86/realmode/rm/header.S index 8c1db5bf5d78..2eb62be6d256 100644 --- a/arch/x86/realmode/rm/header.S +++ b/arch/x86/realmode/rm/header.S @@ -24,6 +24,7 @@ SYM_DATA_START(real_mode_header) .long pa_sev_es_trampoline_start #endif #ifdef CONFIG_X86_64 + .long pa_trampoline_start64 .long pa_trampoline_pgd; #endif /* ACPI S3 wakeup */ diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S index cc8391f86cdb..ae112a91592f 100644 --- a/arch/x86/realmode/rm/trampoline_64.S +++ b/arch/x86/realmode/rm/trampoline_64.S @@ -161,6 +161,19 @@ SYM_CODE_START(startup_32) ljmpl $__KERNEL_CS, $pa_startup_64 SYM_CODE_END(startup_32) +SYM_CODE_START(pa_trampoline_compat) + /* + * In compatibility mode. Prep ESP and DX for startup_32, then disable + * paging and complete the switch to legacy 32-bit mode. + */ + movl $rm_stack_end, %esp + movw $__KERNEL_DS, %dx + + movl $X86_CR0_PE, %eax + movl %eax, %cr0 + ljmpl $__KERNEL32_CS, $pa_startup_32 +SYM_CODE_END(pa_trampoline_compat) + .section ".text64","ax" .code64 .balign 4 @@ -169,6 +182,20 @@ SYM_CODE_START(startup_64) jmpq *tr_start(%rip) SYM_CODE_END(startup_64) +SYM_CODE_START(trampoline_start64) + /* + * APs start here on a direct transfer from 64-bit BIOS with identity + * mapped page tables. Load the kernel's GDT in order to gear down to + * 32-bit mode (to handle 4-level vs. 5-level paging), and to (re)load + * segment registers. Load the zero IDT so any fault triggers a + * shutdown instead of jumping back into BIOS. + */ + lidt tr_idt(%rip) + lgdt tr_gdt64(%rip) + + ljmpl *tr_compat(%rip) +SYM_CODE_END(trampoline_start64) + .section ".rodata","a" # Duplicate the global descriptor table # so the kernel can live anywhere @@ -182,6 +209,17 @@ SYM_DATA_START(tr_gdt) .quad 0x00cf93000000ffff # __KERNEL_DS SYM_DATA_END_LABEL(tr_gdt, SYM_L_LOCAL, tr_gdt_end) +SYM_DATA_START(tr_gdt64) + .short tr_gdt_end - tr_gdt - 1 # gdt limit + .long pa_tr_gdt + .long 0 +SYM_DATA_END(tr_gdt64) + +SYM_DATA_START(tr_compat) + .long pa_trampoline_compat + .short __KERNEL32_CS +SYM_DATA_END(tr_compat) + .bss .balign PAGE_SIZE SYM_DATA(trampoline_pgd, .space PAGE_SIZE) diff --git a/arch/x86/realmode/rm/trampoline_common.S b/arch/x86/realmode/rm/trampoline_common.S index 5033e640f957..4331c32c47f8 100644 --- a/arch/x86/realmode/rm/trampoline_common.S +++ b/arch/x86/realmode/rm/trampoline_common.S @@ -1,4 +1,14 @@ /* SPDX-License-Identifier: GPL-2.0 */ .section ".rodata","a" .balign 16 -SYM_DATA_LOCAL(tr_idt, .fill 1, 6, 0) + +/* + * When a bootloader hands off to the kernel in 32-bit mode an + * IDT with a 2-byte limit and 4-byte base is needed. When a boot + * loader hands off to a kernel 64-bit mode the base address + * extends to 8-bytes. Reserve enough space for either scenario. + */ +SYM_DATA_START_LOCAL(tr_idt) + .short 0 + .quad 0 +SYM_DATA_END(tr_idt) -- 2.35.1