References: <20230414225551.858160935@linutronix.de> <20230414232311.379210081@linutronix.de>
In-Reply-To: <20230414232311.379210081@linutronix.de>
From: Brian Gerst
Date: Sat, 15 Apr 2023 09:22:34 -0400
Subject: Re: [patch 35/37] x86/smpboot: Support parallel startup of secondary CPUs
To: Thomas Gleixner
Cc: LKML, x86@kernel.org, David Woodhouse, Andrew Cooper,
    Arjan van de Veen, Paolo Bonzini, Paul McKenney, Tom Lendacky,
    Sean Christopherson, Oleksandr Natalenko, Paul Menzel,
    "Guilherme G. Piccoli", Piotr Gorski, David Woodhouse, Usama Arif,
    Juergen Gross, Boris Ostrovsky, xen-devel@lists.xenproject.org,
    Russell King, Arnd Bergmann, linux-arm-kernel@lists.infradead.org,
    Catalin Marinas, Will Deacon, Guo Ren, linux-csky@vger.kernel.org,
    Thomas Bogendoerfer, linux-mips@vger.kernel.org,
    "James E.J. Bottomley", Helge Deller, linux-parisc@vger.kernel.org,
    Paul Walmsley, Palmer Dabbelt, linux-riscv@lists.infradead.org,
    Mark Rutland, Sabin Rapan
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 14, 2023 at 7:45 PM Thomas Gleixner wrote:
>
> From: David Woodhouse
>
> Rework the real-mode startup code to allow for APs to be brought up in
> parallel. This is in two parts:
>
>  1. Introduce a bit-spinlock to prevent them from all using the real
>     mode stack at the same time.
>
>  2. Avoid needing to use the global smpboot_control variable to pass
>     each AP its CPU number.
>
> To achieve the latter, export the cpuid_to_apicid[] array so that each
> AP can find its own CPU number by searching therein based on its APIC ID.
>
> Introduce flags in the top bits of smpboot_control which indicate methods
> by which an AP should find its CPU number. For a serialized bringup, the
> CPU number is explicitly passed in the low bits of smpboot_control as
> before. For parallel mode there are flags directing the AP to find its APIC
> ID in CPUID leaf 0x0b or 0x1f (for X2APIC mode) or CPUID leaf 0x01 where 8
> bits are sufficient, then perform the cpuid_to_apicid[] lookup with that.
>
> Aside from the fact that APs will now look up their CPU number via the
> newly-exported cpuid_to_apicid[] table, there is no behavioural change
> intended, since the parallel bootup has not yet been enabled.
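A minimal user-space C sketch of the lookup described above, assuming the flag layout from the quoted asm/smp.h hunk. struct cpuid_regs and find_cpu_number() are hypothetical names; the CPUID results are passed in as plain values instead of being read with the instruction, and the flag test order and linear scan follow the quoted head_64.S code rather than any real kernel C helper.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in the quoted asm/smp.h hunk. */
#define STARTUP_APICID_CPUID_1F	0x80000000u
#define STARTUP_APICID_CPUID_0B	0x40000000u
#define STARTUP_APICID_CPUID_01	0x20000000u
#define STARTUP_PARALLEL_MASK	0xFF000000u

/* All method flags must live in the reserved top byte. */
static_assert(((STARTUP_APICID_CPUID_1F | STARTUP_APICID_CPUID_0B |
		STARTUP_APICID_CPUID_01) & ~STARTUP_PARALLEL_MASK) == 0,
	      "method flags must stay inside STARTUP_PARALLEL_MASK");

/*
 * Hypothetical container for the CPUID results the AP would read.
 * The sketch takes them as inputs instead of executing CPUID.
 */
struct cpuid_regs {
	uint32_t leaf01_ebx;	/* CPUID.01H:EBX, bits 31:24 = initial APIC ID */
	uint32_t leaf0b_edx;	/* CPUID.0BH:EDX = x2APIC ID */
	uint32_t leaf1f_edx;	/* CPUID.1FH:EDX = x2APIC ID */
};

/*
 * Hypothetical helper mirroring the quoted head_64.S logic. Returns the
 * CPU number, or -1 when the APIC ID is not in the table (the point where
 * the assembly drops the trampoline lock and halts).
 */
static int find_cpu_number(uint32_t control, const struct cpuid_regs *regs,
			   const int *cpuid_to_apicid, int nr_cpu_ids)
{
	uint32_t apicid;

	/* Same test order as the assembly: 0x1f, then 0x0b, then 0x01. */
	if (control & STARTUP_APICID_CPUID_1F)
		apicid = regs->leaf1f_edx;
	else if (control & STARTUP_APICID_CPUID_0B)
		apicid = regs->leaf0b_edx;
	else if (control & STARTUP_APICID_CPUID_01)
		apicid = regs->leaf01_ebx >> 24;	/* only 8 bits available */
	else
		return (int)(control & ~STARTUP_PARALLEL_MASK); /* serialized bringup */

	/* Linear scan of cpuid_to_apicid[]; unused slots hold -1. */
	for (int cpu = 0; cpu < nr_cpu_ids; cpu++) {
		if ((uint32_t)cpuid_to_apicid[cpu] == apicid)
			return cpu;
	}
	return -1;
}
```

For example, with a table of { 0, 2, 4, 6 } and STARTUP_APICID_CPUID_01 set, an AP whose CPUID.01H:EBX top byte is 4 resolves to CPU 2; with no flag set, find_cpu_number(3, ...) simply returns 3.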
>
> [ tglx: Initial proof of concept patch with bitlock and APIC ID lookup ]
> [ dwmw2: Rework and testing, commit message, CPUID 0x1 and CPU0 support ]
> [ seanc: Fix stray override of initial_gs in common_cpu_up() ]
> [ Oleksandr Natalenko: reported suspend/resume issue fixed in
>   x86_acpi_suspend_lowlevel ]
>
> Co-developed-by: Thomas Gleixner
> Signed-off-by: Thomas Gleixner
> Co-developed-by: Brian Gerst
> Signed-off-by: Brian Gerst
> Signed-off-by: David Woodhouse
> Signed-off-by: Usama Arif
> Signed-off-by: Thomas Gleixner
> ---
>  arch/x86/include/asm/apic.h          |  2
>  arch/x86/include/asm/realmode.h      |  3 +
>  arch/x86/include/asm/smp.h           |  8 +++
>  arch/x86/kernel/acpi/sleep.c         |  9 +++
>  arch/x86/kernel/apic/apic.c          |  2
>  arch/x86/kernel/head_64.S            | 79 ++++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/smpboot.c            |  5 --
>  arch/x86/realmode/init.c             |  3 +
>  arch/x86/realmode/rm/trampoline_64.S | 27 +++++++++--
>  9 files changed, 125 insertions(+), 13 deletions(-)
>
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -55,6 +55,8 @@ extern int local_apic_timer_c2_ok;
>  extern int disable_apic;
>  extern unsigned int lapic_timer_period;
>
> +extern int cpuid_to_apicid[];
> +
>  extern enum apic_intr_mode_id apic_intr_mode;
>  enum apic_intr_mode_id {
>  	APIC_PIC,
> --- a/arch/x86/include/asm/realmode.h
> +++ b/arch/x86/include/asm/realmode.h
> @@ -52,6 +52,7 @@ struct trampoline_header {
>  	u64 efer;
>  	u32 cr4;
>  	u32 flags;
> +	u32 lock;
>  #endif
>  };
>
> @@ -64,6 +65,8 @@ extern unsigned long initial_stack;
>  extern unsigned long initial_vc_handler;
>  #endif
>
> +extern u32 *trampoline_lock;
> +
>  extern unsigned char real_mode_blob[];
>  extern unsigned char real_mode_relocs[];
>
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -198,4 +198,12 @@ extern unsigned int smpboot_control;
>
>  #endif /* !__ASSEMBLY__ */
>
> +/* Control bits for startup_64 */
> +#define STARTUP_APICID_CPUID_1F 0x80000000
> +#define STARTUP_APICID_CPUID_0B 0x40000000
> +#define STARTUP_APICID_CPUID_01 0x20000000
> +
> +/* Top 8 bits are reserved for control */
> +#define STARTUP_PARALLEL_MASK	0xFF000000
> +
>  #endif /* _ASM_X86_SMP_H */
> --- a/arch/x86/kernel/acpi/sleep.c
> +++ b/arch/x86/kernel/acpi/sleep.c
> @@ -16,6 +16,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include "../../realmode/rm/wakeup.h"
> @@ -127,7 +128,13 @@ int x86_acpi_suspend_lowlevel(void)
>  	 * value is in the actual %rsp register.
>  	 */
>  	current->thread.sp = (unsigned long)temp_stack + sizeof(temp_stack);
> -	smpboot_control = smp_processor_id();
> +	/*
> +	 * Ensure the CPU knows which one it is when it comes back, if
> +	 * it isn't in parallel mode and expected to work that out for
> +	 * itself.
> +	 */
> +	if (!(smpboot_control & STARTUP_PARALLEL_MASK))
> +		smpboot_control = smp_processor_id();
>  #endif
>  	initial_code = (unsigned long)wakeup_long64;
>  	saved_magic = 0x123456789abcdef0L;
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -2377,7 +2377,7 @@ static int nr_logical_cpuids = 1;
>  /*
>   * Used to store mapping between logical CPU IDs and APIC IDs.
>   */
> -static int cpuid_to_apicid[] = {
> +int cpuid_to_apicid[] = {
>  	[0 ... NR_CPUS - 1] = -1,
>  };
>
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -25,6 +25,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  /*
>   * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
> @@ -234,8 +235,70 @@ SYM_INNER_LABEL(secondary_startup_64_no_
>  	ANNOTATE_NOENDBR // above
>
>  #ifdef CONFIG_SMP
> +	/*
> +	 * For parallel boot, the APIC ID is retrieved from CPUID, and then
> +	 * used to look up the CPU number.  For booting a single CPU, the
> +	 * CPU number is encoded in smpboot_control.
> + * > + * Bit 31 STARTUP_APICID_CPUID_1F flag (use CPUID 0x1f) > + * Bit 30 STARTUP_APICID_CPUID_0B flag (use CPUID 0x0b) > + * Bit 29 STARTUP_APICID_CPUID_01 flag (use CPUID 0x01) > + * Bit 0-23 CPU# if STARTUP_APICID_CPUID_xx flags are not set > + */ > movl smpboot_control(%rip), %ecx > + testl $STARTUP_APICID_CPUID_1F, %ecx > + jnz .Luse_cpuid_1f > + testl $STARTUP_APICID_CPUID_0B, %ecx > + jnz .Luse_cpuid_0b > + testl $STARTUP_APICID_CPUID_01, %ecx > + jnz .Luse_cpuid_01 > + andl $(~STARTUP_PARALLEL_MASK), %ecx > + jmp .Lsetup_cpu > + > +.Luse_cpuid_01: > + mov $0x01, %eax > + cpuid > + mov %ebx, %edx > + shr $24, %edx > + jmp .Lsetup_AP > + > +.Luse_cpuid_0b: > + mov $0x0B, %eax > + xorl %ecx, %ecx > + cpuid > + jmp .Lsetup_AP > + > +.Luse_cpuid_1f: > + mov $0x1f, %eax > + xorl %ecx, %ecx > + cpuid > > +.Lsetup_AP: > + /* EDX contains the APIC ID of the current CPU */ > + xorq %rcx, %rcx > + leaq cpuid_to_apicid(%rip), %rbx > + > +.Lfind_cpunr: > + cmpl (%rbx,%rcx,4), %edx > + jz .Lsetup_cpu > + inc %ecx > +#ifdef CONFIG_FORCE_NR_CPUS > + cmpl $NR_CPUS, %ecx > +#else > + cmpl nr_cpu_ids(%rip), %ecx > +#endif > + jb .Lfind_cpunr > + > + /* APIC ID not found in the table. Drop the trampoline lock and = bail. */ > + movq trampoline_lock(%rip), %rax > + lock > + btrl $0, (%rax) > + > +1: cli > + hlt > + jmp 1b > + > +.Lsetup_cpu: > /* Get the per cpu offset for the given CPU# which is in ECX */ > movq __per_cpu_offset(,%rcx,8), %rdx > #else > @@ -248,10 +311,20 @@ SYM_INNER_LABEL(secondary_startup_64_no_ > * > * RDX contains the per-cpu offset > */ > - movq pcpu_hot + X86_current_task(%rdx), %rax > - movq TASK_threadsp(%rax), %rsp > + movq pcpu_hot + X86_top_of_stack(%rdx), %rsp Switching to using pcpu_hot.top_of_stack is ok, but it's not completely equivalent. top_of_stack points to the end of the pt_regs structure, while the kernel stack starts below pt_regs even for kernel threads. So you need to subtract PTREGS_SIZE from the stack pointer after this. 
This change should also be a separate patch.

--
Brian Gerst