2023-02-02 21:56:57

by Usama Arif

Subject: [PATCH v6 00/11] Parallel CPU bringup for x86_64

Doing INIT/SIPI/SIPI in parallel brings down the time for smpboot from ~700ms
to 100ms (85% improvement) on a server with 128 CPUs split across 2 NUMA nodes.
The parallel CPU bringup is disabled for all AMD CPUs in this version
(see the discussions at
https://lore.kernel.org/all/[email protected]/ and
https://lore.kernel.org/all/[email protected]/).

The last patch reuses the timer calibration from CPU 0 for secondary CPUs
which brings down the time for parallel smpboot from 100ms to 30ms. It is
missing a sign-off from the author, which hopefully Arjan will add.

Changes across versions:
v2: Cut it back to just INIT/SIPI/SIPI in parallel for now, nothing more
v3: Clean up x2apic patch, add MTRR optimisation, lock topology update
in preparation for more parallelisation.
v4: Fixes to the real mode parallelisation patch spotted by SeanC, to
avoid scribbling on initial_gs in common_cpu_up(), and to allow all
24 bits of the physical X2APIC ID to be used. That patch still needs
a Signed-off-by from its original author, who once claimed not to
remember writing it at all. But now that we've fixed it, hopefully
he'll admit to it :)
v5: Rebase to v6.1, remeasure performance, and disable parallel bringup
for AMD CPUs.
v6: Rebase to v6.2-rc6, disable parallel boot on AMD via a CPU bug flag,
and reuse the timer calibration for secondary CPUs.

Arjan van de Ven (1):
x86/smpboot: reuse timer calibration

David Woodhouse (9):
x86/apic/x2apic: Fix parallel handling of cluster_mask
cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h>
cpu/hotplug: Add dynamic parallel bringup states before
CPUHP_BRINGUP_CPU
x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
x86/smpboot: Split up native_cpu_up into separate phases and document
them
x86/smpboot: Disable parallel boot for AMD CPUs
x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel
x86/mtrr: Avoid repeated save of MTRRs on boot-time CPU bringup
x86/smpboot: Serialize topology updates for secondary bringup

Thomas Gleixner (1):
x86/smpboot: Support parallel startup of secondary CPUs

arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/realmode.h | 3 +
arch/x86/include/asm/smp.h | 13 +-
arch/x86/include/asm/topology.h | 2 -
arch/x86/kernel/acpi/sleep.c | 1 +
arch/x86/kernel/apic/apic.c | 2 +-
arch/x86/kernel/apic/x2apic_cluster.c | 108 +++++----
arch/x86/kernel/cpu/amd.c | 11 +
arch/x86/kernel/cpu/common.c | 6 +-
arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +
arch/x86/kernel/head_64.S | 75 ++++++
arch/x86/kernel/smpboot.c | 333 ++++++++++++++++++--------
arch/x86/kernel/tsc.c | 3 +
arch/x86/realmode/init.c | 3 +
arch/x86/realmode/rm/trampoline_64.S | 14 ++
arch/x86/xen/smp_pv.c | 4 +-
include/linux/cpuhotplug.h | 2 +
include/linux/smpboot.h | 7 +
kernel/cpu.c | 27 ++-
kernel/smpboot.c | 2 +-
kernel/smpboot.h | 2 -
21 files changed, 469 insertions(+), 159 deletions(-)

--
2.25.1



2023-02-02 21:57:06

by Usama Arif

Subject: [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask

From: David Woodhouse <[email protected]>

For each CPU being brought up, the alloc_clustermask() function
allocates a new struct cluster_mask just in case it's needed. Then the
target CPU actually runs, and in init_x2apic_ldr() it either uses a
cluster_mask from a previous CPU in the same cluster, or consumes the
"spare" one and sets the global pointer to NULL.

That isn't going to parallelise stunningly well.

Ditch the global variable, let alloc_clustermask() install the struct
*directly* in the per_cpu data for the CPU being brought up. As an
optimisation, actually make it do so for *all* present CPUs in the same
cluster, which means only one iteration over for_each_present_cpu()
instead of doing so repeatedly, once for each CPU.

Now, in fact, there's no point in the 'node' or 'clusterid' members of
the struct cluster_mask, so just kill it and use struct cpumask instead.

This was a harmless "bug" while CPU bringup wasn't actually happening in
parallel. It's about to become less harmless...
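To illustrate the race, here is a rough sketch of the old flow under
parallel bringup (the interleaving is hypothetical; the names are from
the code below):

    /*
     * BSP:  x2apic_prepare_cpu(A) -> cluster_hotplug_mask = new mask
     * BSP:  x2apic_prepare_cpu(B) -> spare exists, nothing allocated
     * AP A: init_x2apic_ldr()     -> consumes spare, sets pointer NULL
     * AP B: init_x2apic_ldr()     -> new cluster, but no spare left
     *
     * With serial bringup, prepare(B) runs only after AP A has come up
     * and so refills the spare; run the APs concurrently and AP B is
     * left empty-handed.
     */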

Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management")
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/x86/kernel/apic/x2apic_cluster.c | 108 +++++++++++++++-----------
1 file changed, 62 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index e696e22d0531..e116dfaf5922 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -9,11 +9,7 @@

#include "local.h"

-struct cluster_mask {
- unsigned int clusterid;
- int node;
- struct cpumask mask;
-};
+#define apic_cluster(apicid) ((apicid) >> 4)

/*
* __x2apic_send_IPI_mask() possibly needs to read
@@ -23,8 +19,7 @@ struct cluster_mask {
static u32 *x86_cpu_to_logical_apicid __read_mostly;

static DEFINE_PER_CPU(cpumask_var_t, ipi_mask);
-static DEFINE_PER_CPU_READ_MOSTLY(struct cluster_mask *, cluster_masks);
-static struct cluster_mask *cluster_hotplug_mask;
+static DEFINE_PER_CPU_READ_MOSTLY(struct cpumask *, cluster_masks);

static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
{
@@ -60,10 +55,10 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)

/* Collapse cpus in a cluster so a single IPI per cluster is sent */
for_each_cpu(cpu, tmpmsk) {
- struct cluster_mask *cmsk = per_cpu(cluster_masks, cpu);
+ struct cpumask *cmsk = per_cpu(cluster_masks, cpu);

dest = 0;
- for_each_cpu_and(clustercpu, tmpmsk, &cmsk->mask)
+ for_each_cpu_and(clustercpu, tmpmsk, cmsk)
dest |= x86_cpu_to_logical_apicid[clustercpu];

if (!dest)
@@ -71,7 +66,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)

__x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
/* Remove cluster CPUs from tmpmask */
- cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
+ cpumask_andnot(tmpmsk, tmpmsk, cmsk);
}

local_irq_restore(flags);
@@ -105,55 +100,76 @@ static u32 x2apic_calc_apicid(unsigned int cpu)

static void init_x2apic_ldr(void)
{
- struct cluster_mask *cmsk = this_cpu_read(cluster_masks);
- u32 cluster, apicid = apic_read(APIC_LDR);
- unsigned int cpu;
+ struct cpumask *cmsk = this_cpu_read(cluster_masks);

- x86_cpu_to_logical_apicid[smp_processor_id()] = apicid;
+ BUG_ON(!cmsk);

- if (cmsk)
- goto update;
-
- cluster = apicid >> 16;
- for_each_online_cpu(cpu) {
- cmsk = per_cpu(cluster_masks, cpu);
- /* Matching cluster found. Link and update it. */
- if (cmsk && cmsk->clusterid == cluster)
- goto update;
+ cpumask_set_cpu(smp_processor_id(), cmsk);
+}
+
+/*
+ * As an optimisation during boot, set the cluster_mask for *all*
+ * present CPUs at once, to prevent *each* of them having to iterate
+ * over the others to find the existing cluster_mask.
+ */
+static void prefill_clustermask(struct cpumask *cmsk, u32 cluster)
+{
+ int cpu;
+
+ for_each_present_cpu(cpu) {
+ u32 apicid = apic->cpu_present_to_apicid(cpu);
+ if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
+ struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu);
+
+ BUG_ON(*cpu_cmsk && *cpu_cmsk != cmsk);
+ *cpu_cmsk = cmsk;
+ }
}
- cmsk = cluster_hotplug_mask;
- cmsk->clusterid = cluster;
- cluster_hotplug_mask = NULL;
-update:
- this_cpu_write(cluster_masks, cmsk);
- cpumask_set_cpu(smp_processor_id(), &cmsk->mask);
}

-static int alloc_clustermask(unsigned int cpu, int node)
+static int alloc_clustermask(unsigned int cpu, u32 cluster, int node)
{
+ struct cpumask *cmsk = NULL;
+ unsigned int cpu_i;
+ u32 apicid;
+
if (per_cpu(cluster_masks, cpu))
return 0;
- /*
- * If a hotplug spare mask exists, check whether it's on the right
- * node. If not, free it and allocate a new one.
- */
- if (cluster_hotplug_mask) {
- if (cluster_hotplug_mask->node == node)
- return 0;
- kfree(cluster_hotplug_mask);
+
+ /* For the hotplug case, don't always allocate a new one */
+ if (system_state >= SYSTEM_RUNNING) {
+ for_each_present_cpu(cpu_i) {
+ apicid = apic->cpu_present_to_apicid(cpu_i);
+ if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
+ cmsk = per_cpu(cluster_masks, cpu_i);
+ if (cmsk)
+ break;
+ }
+ }
+ }
+ if (!cmsk) {
+ cmsk = kzalloc_node(sizeof(*cmsk), GFP_KERNEL, node);
+ if (!cmsk)
+ return -ENOMEM;
}

- cluster_hotplug_mask = kzalloc_node(sizeof(*cluster_hotplug_mask),
- GFP_KERNEL, node);
- if (!cluster_hotplug_mask)
- return -ENOMEM;
- cluster_hotplug_mask->node = node;
+ per_cpu(cluster_masks, cpu) = cmsk;
+
+ if (system_state < SYSTEM_RUNNING)
+ prefill_clustermask(cmsk, cluster);
+
return 0;
}

static int x2apic_prepare_cpu(unsigned int cpu)
{
- if (alloc_clustermask(cpu, cpu_to_node(cpu)) < 0)
+ u32 phys_apicid = apic->cpu_present_to_apicid(cpu);
+ u32 cluster = apic_cluster(phys_apicid);
+ u32 logical_apicid = (cluster << 16) | (1 << (phys_apicid & 0xf));
+
+ x86_cpu_to_logical_apicid[cpu] = logical_apicid;
+
+ if (alloc_clustermask(cpu, cluster, cpu_to_node(cpu)) < 0)
return -ENOMEM;
if (!zalloc_cpumask_var(&per_cpu(ipi_mask, cpu), GFP_KERNEL))
return -ENOMEM;
@@ -162,10 +178,10 @@ static int x2apic_prepare_cpu(unsigned int cpu)

static int x2apic_dead_cpu(unsigned int dead_cpu)
{
- struct cluster_mask *cmsk = per_cpu(cluster_masks, dead_cpu);
+ struct cpumask *cmsk = per_cpu(cluster_masks, dead_cpu);

if (cmsk)
- cpumask_clear_cpu(dead_cpu, &cmsk->mask);
+ cpumask_clear_cpu(dead_cpu, cmsk);
free_cpumask_var(per_cpu(ipi_mask, dead_cpu));
return 0;
}
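For a concrete feel of the logical ID layout used above (cluster number
in the upper 16 bits, one bit per CPU within the 16-CPU cluster), here
is a hypothetical worked example, not taken from the patch:

    u32 phys_apicid = 0x23;                  /* cluster 2, position 3 */
    u32 cluster = apic_cluster(phys_apicid); /* 0x23 >> 4 == 2        */
    u32 logical = (cluster << 16) | (1 << (phys_apicid & 0xf));
                                             /* == 0x20008            */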
--
2.25.1


2023-02-02 21:57:09

by Usama Arif

Subject: [PATCH v6 05/11] x86/smpboot: Split up native_cpu_up into separate phases and document them

From: David Woodhouse <[email protected]>

There are four logical parts to what native_cpu_up() does on the BSP (or
on the controlling CPU for a later hotplug).

First it actually wakes the AP by sending the INIT/SIPI/SIPI sequence.

Second, it waits for the AP to make it as far as wait_for_master_cpu()
which sets that CPU's bit in cpu_initialized_mask, then sets the bit in
cpu_callout_mask to let the AP proceed through cpu_init().

Then, it waits for the AP to finish cpu_init() and get as far as the
smp_callin() call, which sets that CPU's bit in cpu_callin_mask.

Finally, it does the TSC synchronization and waits for the AP to actually
mark itself online in cpu_online_mask.

This commit should have no behavioural change; it merely splits those
phases out into separate functions so that future commits can make them
happen in parallel for all APs, and adds some comments around them on
both the BSP and AP code paths.
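In outline, the resulting structure looks like this (a sketch of the
code added below; the warm-reset cleanup in the real function is
omitted):

    int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
    {
            int ret;

            ret = do_cpu_up(cpu, tidle);         /* 1: INIT/SIPI/SIPI  */
            if (ret)
                    return ret;
            ret = do_wait_cpu_initialized(cpu);  /* 2: release callout */
            if (ret)
                    return ret;
            ret = do_wait_cpu_callin(cpu);       /* 3: wait for callin */
            if (ret)
                    return ret;
            return do_wait_cpu_online(cpu);      /* 4: TSC sync/online */
    }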

[Usama Arif: fixed rebase conflict]
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Usama Arif <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/x86/kernel/smpboot.c | 182 +++++++++++++++++++++++++++-----------
1 file changed, 128 insertions(+), 54 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index a19eddcdccc2..fdcf7c08945f 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -206,6 +206,10 @@ static void smp_callin(void)

wmb();

+ /*
+ * This runs the AP through all the cpuhp states to its target
+ * state (CPUHP_ONLINE in the case of serial bringup).
+ */
notify_cpu_starting(cpuid);

/*
@@ -233,17 +237,33 @@ static void notrace start_secondary(void *unused)
load_cr3(swapper_pg_dir);
__flush_tlb_all();
#endif
+ /*
+ * Sync point with do_wait_cpu_initialized(). On boot, all secondary
+ * CPUs reach this stage after receiving INIT/SIPI from do_cpu_up()
+ * in the x86/cpu:kick cpuhp stage. At the start of cpu_init() they
+ * will wait for do_wait_cpu_initialized() to set their bit in
+ * cpu_callout_mask to release them.
+ */
cpu_init_secondary();
rcu_cpu_starting(raw_smp_processor_id());
x86_cpuinit.early_percpu_clock_init();
+
+ /*
+ * Sync point with do_wait_cpu_callin(). The AP doesn't wait here
+ * but just sets the bit to let the controlling CPU (BSP) know that
+ * it's got this far.
+ */
smp_callin();

enable_start_cpu0 = 0;

/* otherwise gcc will move up smp_processor_id before the cpu_init */
barrier();
+
/*
- * Check TSC synchronization with the boot CPU:
+ * Check TSC synchronization with the boot CPU (or whichever CPU
+ * is controlling the bringup). It will do its part of this from
+ * do_wait_cpu_online(), making it an implicit sync point.
*/
check_tsc_sync_target();

@@ -256,6 +276,7 @@ static void notrace start_secondary(void *unused)
* half valid vector space.
*/
lock_vector_lock();
+ /* Sync point with do_wait_cpu_online() */
set_cpu_online(smp_processor_id(), true);
lapic_online();
unlock_vector_lock();
@@ -1085,7 +1106,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
unsigned long start_ip = real_mode_header->trampoline_start;

unsigned long boot_error = 0;
- unsigned long timeout;

#ifdef CONFIG_X86_64
/* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */
@@ -1146,55 +1166,94 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
cpu0_nmi_registered);

- if (!boot_error) {
- /*
- * Wait 10s total for first sign of life from AP
- */
- boot_error = -1;
- timeout = jiffies + 10*HZ;
- while (time_before(jiffies, timeout)) {
- if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
- /*
- * Tell AP to proceed with initialization
- */
- cpumask_set_cpu(cpu, cpu_callout_mask);
- boot_error = 0;
- break;
- }
- schedule();
- }
- }
+ return boot_error;
+}

- if (!boot_error) {
- /*
- * Wait till AP completes initial initialization
- */
- while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
- /*
- * Allow other tasks to run while we wait for the
- * AP to come online. This also gives a chance
- * for the MTRR work(triggered by the AP coming online)
- * to be completed in the stop machine context.
- */
- schedule();
- }
+static int do_wait_cpu_cpumask(unsigned int cpu, const struct cpumask *mask)
+{
+ unsigned long timeout;
+
+ /*
+ * Wait up to 10s for the CPU to report in.
+ */
+ timeout = jiffies + 10*HZ;
+ while (time_before(jiffies, timeout)) {
+ if (cpumask_test_cpu(cpu, mask))
+ return 0;
+
+ schedule();
}
+ return -1;
+}

- if (x86_platform.legacy.warm_reset) {
- /*
- * Cleanup possible dangling ends...
- */
- smpboot_restore_warm_reset_vector();
+/*
+ * Bringup step two: Wait for the target AP to reach cpu_init_secondary()
+ * and thus wait_for_master_cpu(), then set cpu_callout_mask to allow it
+ * to proceed. The AP will then proceed past setting its 'callin' bit
+ * and end up waiting in check_tsc_sync_target() until we reach
+ * do_wait_cpu_online() to tend to it.
+ */
+static int do_wait_cpu_initialized(unsigned int cpu)
+{
+ /*
+ * Wait for first sign of life from AP.
+ */
+ if (do_wait_cpu_cpumask(cpu, cpu_initialized_mask))
+ return -1;
+
+ cpumask_set_cpu(cpu, cpu_callout_mask);
+ return 0;
+}
+
+/*
+ * Bringup step three: Wait for the target AP to reach smp_callin().
+ * The AP is not waiting for us here so we don't need to parallelise
+ * this step. Not entirely clear why we care about this, since we just
+ * proceed directly to TSC synchronization which is the next sync
+ * point with the AP anyway.
+ */
+static int do_wait_cpu_callin(unsigned int cpu)
+{
+ /*
+ * Wait till AP completes initial initialization.
+ */
+ return do_wait_cpu_cpumask(cpu, cpu_callin_mask);
+}
+
+/*
+ * Bringup step four: Synchronize the TSC and wait for the target AP
+ * to reach set_cpu_online() in start_secondary().
+ */
+static int do_wait_cpu_online(unsigned int cpu)
+{
+ unsigned long flags;
+
+ /*
+ * Check TSC synchronization with the AP (keep irqs disabled
+ * while doing so):
+ */
+ local_irq_save(flags);
+ check_tsc_sync_source(cpu);
+ local_irq_restore(flags);
+
+ /*
+ * Wait for the AP to mark itself online. Not entirely
+ * clear why we care, since the generic cpuhp code will
+ * wait for it to reach CPUHP_AP_ONLINE_IDLE before going
+ * ahead with the rest of the bringup anyway.
+ */
+ while (!cpu_online(cpu)) {
+ cpu_relax();
+ touch_nmi_watchdog();
}

- return boot_error;
+ return 0;
}

-int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
+static int do_cpu_up(unsigned int cpu, struct task_struct *tidle)
{
int apicid = apic->cpu_present_to_apicid(cpu);
int cpu0_nmi_registered = 0;
- unsigned long flags;
int err, ret = 0;

lockdep_assert_irqs_enabled();
@@ -1241,19 +1300,6 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
goto unreg_nmi;
}

- /*
- * Check TSC synchronization with the AP (keep irqs disabled
- * while doing so):
- */
- local_irq_save(flags);
- check_tsc_sync_source(cpu);
- local_irq_restore(flags);
-
- while (!cpu_online(cpu)) {
- cpu_relax();
- touch_nmi_watchdog();
- }
-
unreg_nmi:
/*
* Clean up the nmi handler. Do this after the callin and callout sync
@@ -1265,6 +1311,34 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
return ret;
}

+int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
+{
+ int ret;
+
+ ret = do_cpu_up(cpu, tidle);
+ if (ret)
+ return ret;
+
+ ret = do_wait_cpu_initialized(cpu);
+ if (ret)
+ return ret;
+
+ ret = do_wait_cpu_callin(cpu);
+ if (ret)
+ return ret;
+
+ ret = do_wait_cpu_online(cpu);
+
+ if (x86_platform.legacy.warm_reset) {
+ /*
+ * Cleanup possible dangling ends...
+ */
+ smpboot_restore_warm_reset_vector();
+ }
+
+ return ret;
+}
+
/**
* arch_disable_smp_support() - disables SMP support for x86 at runtime
*/
--
2.25.1


2023-02-02 21:57:13

by Usama Arif

Subject: [PATCH v6 10/11] x86/smpboot: Serialize topology updates for secondary bringup

From: David Woodhouse <[email protected]>

If we bring up secondaries in parallel they might get confused unless we
impose some ordering here:

[ 1.360149] x86: Booting SMP configuration:
[ 1.360221] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23
[ 1.366225] .... node #1, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
[ 1.370219] .... node #0, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71
[ 1.378226] .... node #1, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95
[ 1.382037] Brought 96 CPUs to x86/cpu:kick in 72232606 cycles
[ 0.104104] smpboot: CPU 26 Converting physical 0 to logical die 1
[ 0.104104] smpboot: CPU 27 Converting physical 1 to logical package 2
[ 0.104104] smpboot: CPU 24 Converting physical 1 to logical package 3
[ 0.104104] smpboot: CPU 27 Converting physical 0 to logical die 2
[ 0.104104] smpboot: CPU 25 Converting physical 1 to logical package 4
[ 1.385609] Brought 96 CPUs to x86/cpu:wait-init in 9269218 cycles
[ 1.395285] Brought CPUs online in 28930764 cycles
[ 1.395469] smp: Brought up 2 nodes, 96 CPUs
[ 1.395689] smpboot: Max logical packages: 2
[ 1.396222] smpboot: Total of 96 processors activated (576000.00 BogoMIPS)

Do the full topology update in smp_store_cpu_info() under a spinlock
to ensure that things remain consistent.

[Usama Arif: fixed rebase conflict]
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Usama Arif <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/x86/include/asm/smp.h | 4 +-
arch/x86/include/asm/topology.h | 2 -
arch/x86/kernel/cpu/common.c | 6 +--
arch/x86/kernel/smpboot.c | 73 ++++++++++++++++++++-------------
arch/x86/xen/smp_pv.c | 4 +-
5 files changed, 48 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 9b3c0cd35f5f..1760aa269673 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -48,8 +48,6 @@ struct smp_ops {
};

/* Globals due to paravirt */
-extern void set_cpu_sibling_map(int cpu);
-
#ifdef CONFIG_SMP
extern struct smp_ops smp_ops;

@@ -137,7 +135,7 @@ void native_send_call_func_single_ipi(int cpu);
void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);

void smp_store_boot_cpu_info(void);
-void smp_store_cpu_info(int id);
+void smp_store_cpu_info(int id, bool force_single_core);

asmlinkage __visible void smp_reboot_interrupt(void);
__visible void smp_reschedule_interrupt(struct pt_regs *regs);
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 458c891a8273..4bccbd949a99 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -136,8 +136,6 @@ static inline int topology_max_smt_threads(void)
return __max_smt_threads;
}

-int topology_update_package_map(unsigned int apicid, unsigned int cpu);
-int topology_update_die_map(unsigned int dieid, unsigned int cpu);
int topology_phys_to_logical_pkg(unsigned int pkg);
int topology_phys_to_logical_die(unsigned int die, unsigned int cpu);
bool topology_is_primary_thread(unsigned int cpu);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9cfca3d7d0e2..cf1a4eff7a76 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1766,7 +1766,7 @@ static void generic_identify(struct cpuinfo_x86 *c)
* Validate that ACPI/mptables have the same information about the
* effective APIC id and update the package map.
*/
-static void validate_apic_and_package_id(struct cpuinfo_x86 *c)
+static void validate_apic_id(struct cpuinfo_x86 *c)
{
#ifdef CONFIG_SMP
unsigned int apicid, cpu = smp_processor_id();
@@ -1777,8 +1777,6 @@ static void validate_apic_and_package_id(struct cpuinfo_x86 *c)
pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: %x\n",
cpu, apicid, c->initial_apicid);
}
- BUG_ON(topology_update_package_map(c->phys_proc_id, cpu));
- BUG_ON(topology_update_die_map(c->cpu_die_id, cpu));
#else
c->logical_proc_id = 0;
#endif
@@ -1969,7 +1967,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
#ifdef CONFIG_X86_32
enable_sep_cpu();
#endif
- validate_apic_and_package_id(c);
+ validate_apic_id(c);
x86_spec_ctrl_setup_ap();
update_srbds_msr();

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 77c509e95571..030eaa84aea0 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -182,16 +182,12 @@ static void smp_callin(void)
apic_ap_setup();

/*
- * Save our processor parameters. Note: this information
- * is needed for clock calibration.
- */
- smp_store_cpu_info(cpuid);
-
- /*
+ * Save our processor parameters and update topology.
+ * Note: this information is needed for clock calibration.
* The topology information must be up to date before
* calibrate_delay() and notify_cpu_starting().
*/
- set_cpu_sibling_map(raw_smp_processor_id());
+ smp_store_cpu_info(cpuid, false);

ap_init_aperfmperf();

@@ -246,6 +242,12 @@ static void notrace start_secondary(void *unused)
* cpu_callout_mask to release them.
*/
cpu_init_secondary();
+
+ /*
+ * Even though notify_cpu_starting() will do this, it does so too late
+ * as the AP may already have triggered lockdep splats by then. See
+ * commit 29368e093 ("x86/smpboot: Move rcu_cpu_starting() earlier").
+ */
rcu_cpu_starting(raw_smp_processor_id());
x86_cpuinit.early_percpu_clock_init();

@@ -354,7 +356,7 @@ EXPORT_SYMBOL(topology_phys_to_logical_die);
* @pkg: The physical package id as retrieved via CPUID
* @cpu: The cpu for which this is updated
*/
-int topology_update_package_map(unsigned int pkg, unsigned int cpu)
+static int topology_update_package_map(unsigned int pkg, unsigned int cpu)
{
int new;

@@ -377,7 +379,7 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
* @die: The die id as retrieved via CPUID
* @cpu: The cpu for which this is updated
*/
-int topology_update_die_map(unsigned int die, unsigned int cpu)
+static int topology_update_die_map(unsigned int die, unsigned int cpu)
{
int new;

@@ -408,25 +410,7 @@ void __init smp_store_boot_cpu_info(void)
c->initialized = true;
}

-/*
- * The bootstrap kernel entry code has set these up. Save them for
- * a given CPU
- */
-void smp_store_cpu_info(int id)
-{
- struct cpuinfo_x86 *c = &cpu_data(id);
-
- /* Copy boot_cpu_data only on the first bringup */
- if (!c->initialized)
- *c = boot_cpu_data;
- c->cpu_index = id;
- /*
- * During boot time, CPU0 has this setup already. Save the info when
- * bringing up AP or offlined CPU0.
- */
- identify_secondary_cpu(c);
- c->initialized = true;
-}
+static arch_spinlock_t topology_lock = __ARCH_SPIN_LOCK_UNLOCKED;

static bool
topology_same_node(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
@@ -632,7 +616,7 @@ static struct sched_domain_topology_level x86_topology[] = {
*/
static bool x86_has_numa_in_package;

-void set_cpu_sibling_map(int cpu)
+static void set_cpu_sibling_map(int cpu)
{
bool has_smt = smp_num_siblings > 1;
bool has_mp = has_smt || boot_cpu_data.x86_max_cores > 1;
@@ -711,6 +695,37 @@ void set_cpu_sibling_map(int cpu)
}
}

+/*
+ * The bootstrap kernel entry code has set these up. Save them for
+ * a given CPU
+ */
+void smp_store_cpu_info(int id, bool force_single_core)
+{
+ struct cpuinfo_x86 *c = &cpu_data(id);
+
+ /* Copy boot_cpu_data only on the first bringup */
+ if (!c->initialized)
+ *c = boot_cpu_data;
+ c->cpu_index = id;
+ /*
+ * During boot time, CPU0 has this setup already. Save the info when
+ * bringing up AP or offlined CPU0.
+ */
+ identify_secondary_cpu(c);
+
+ arch_spin_lock(&topology_lock);
+ BUG_ON(topology_update_package_map(c->phys_proc_id, id));
+ BUG_ON(topology_update_die_map(c->cpu_die_id, id));
+ c->initialized = true;
+
+ /* For Xen PV */
+ if (force_single_core)
+ c->x86_max_cores = 1;
+
+ set_cpu_sibling_map(id);
+ arch_spin_unlock(&topology_lock);
+}
+
/* maps the cpu to the sched domain representing multi-core */
const struct cpumask *cpu_coregroup_mask(int cpu)
{
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 6175f2c5c822..09f94f940689 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -71,9 +71,7 @@ static void cpu_bringup(void)
xen_enable_syscall();
}
cpu = smp_processor_id();
- smp_store_cpu_info(cpu);
- cpu_data(cpu).x86_max_cores = 1;
- set_cpu_sibling_map(cpu);
+ smp_store_cpu_info(cpu, true);

speculative_store_bypass_ht_init();

--
2.25.1


2023-02-02 21:57:16

by Usama Arif

Subject: [PATCH v6 08/11] x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel

From: David Woodhouse <[email protected]>

When the APs can find their own APIC ID without assistance, we can do
the AP bringup in parallel.

Register a CPUHP_BP_PARALLEL_DYN stage "x86/cpu:kick" which just calls
do_boot_cpu() to deliver INIT/SIPI/SIPI to each AP in turn before the
normal native_cpu_up() does the rest of the hand-holding.

The APs will then take turns through the real mode code (which has its
own bitlock for exclusion) until they make it to their own stack, then
proceed through the first few lines of start_secondary() and execute
these parts in parallel:

start_secondary()
-> cr4_init()
-> (some 32-bit only stuff so not in the parallel cases)
-> cpu_init_secondary()
-> cpu_init_exception_handling()
-> cpu_init()
-> wait_for_master_cpu()

At this point they wait for the BSP to set their bit in cpu_callout_mask
(from do_wait_cpu_initialized()), and release them to continue through
the rest of cpu_init() and beyond.

This reduces the time taken for bringup on my 28-thread Haswell system
from about 120ms to 80ms. On a 2-socket 96-thread Skylake it takes the
bringup time from 500ms to 100ms.

There is more speedup to be had by doing the remaining parts in parallel
too — especially notify_cpu_starting() in which the AP takes itself
through all the stages from CPUHP_BRINGUP_CPU to CPUHP_ONLINE. But those
require careful auditing to ensure they are reentrant, before we can go
that far.
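The effect on the bringup timeline, illustratively (three APs shown;
this is not a trace from a real system):

    serial:    kick(1) wait(1)  kick(2) wait(2)  kick(3) wait(3)
    parallel:  kick(1) kick(2) kick(3)  ->  wait(1) wait(2) wait(3)

The INIT/SIPI delivery delays and the APs' trips through the trampoline
now overlap instead of accumulating once per CPU.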

[Usama Arif: fixed rebase conflict]
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Usama Arif <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/x86/kernel/smpboot.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 656897b055f5..77c509e95571 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -57,6 +57,7 @@
#include <linux/pgtable.h>
#include <linux/overflow.h>
#include <linux/stackprotector.h>
+#include <linux/smpboot.h>

#include <asm/acpi.h>
#include <asm/cacheinfo.h>
@@ -1330,9 +1331,12 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
{
int ret;

- ret = do_cpu_up(cpu, tidle);
- if (ret)
- return ret;
+ /* If parallel AP bringup isn't enabled, perform the first steps now. */
+ if (!do_parallel_bringup) {
+ ret = do_cpu_up(cpu, tidle);
+ if (ret)
+ return ret;
+ }

ret = do_wait_cpu_initialized(cpu);
if (ret)
@@ -1354,6 +1358,12 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
return ret;
}

+/* Bringup step one: Send INIT/SIPI to the target AP */
+static int native_cpu_kick(unsigned int cpu)
+{
+ return do_cpu_up(cpu, idle_thread_get(cpu));
+}
+
/**
* arch_disable_smp_support() - disables SMP support for x86 at runtime
*/
@@ -1550,6 +1560,10 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
do_parallel_bringup = false;
}

+ if (do_parallel_bringup)
+ cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick",
+ native_cpu_kick, NULL);
+
snp_set_wakeup_secondary_cpu();
}

--
2.25.1


2023-02-02 21:57:29

by Usama Arif

Subject: [PATCH v6 04/11] x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()

From: David Woodhouse <[email protected]>

If we want to do parallel CPU bringup, we're going to need to set this up
and leave it until all CPUs are done. Might as well use the RTC spinlock
to protect the refcount, as we need to take it anyway.
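The shape of the change is the usual refcounted setup/teardown under a
lock; a minimal sketch with hypothetical helper names (the real hunk is
below):

    spin_lock_irqsave(&rtc_lock, flags);
    if (!vector_count++)
            write_warm_reset_vector(start_eip);  /* first user: set up */
    spin_unlock_irqrestore(&rtc_lock, flags);

    ...

    spin_lock_irqsave(&rtc_lock, flags);
    if (!--vector_count)
            clear_warm_reset_vector();           /* last user: restore */
    spin_unlock_irqrestore(&rtc_lock, flags);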

[Usama Arif: fixed rebase conflict]
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Usama Arif <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/x86/kernel/smpboot.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 55cad72715d9..a19eddcdccc2 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -121,17 +121,22 @@ int arch_update_cpu_topology(void)
return retval;
}

+
+static unsigned int smpboot_warm_reset_vector_count;
+
static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
{
unsigned long flags;

spin_lock_irqsave(&rtc_lock, flags);
- CMOS_WRITE(0xa, 0xf);
+ if (!smpboot_warm_reset_vector_count++) {
+ CMOS_WRITE(0xa, 0xf);
+ *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) =
+ start_eip >> 4;
+ *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) =
+ start_eip & 0xf;
+ }
spin_unlock_irqrestore(&rtc_lock, flags);
- *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) =
- start_eip >> 4;
- *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) =
- start_eip & 0xf;
}

static inline void smpboot_restore_warm_reset_vector(void)
@@ -143,10 +148,12 @@ static inline void smpboot_restore_warm_reset_vector(void)
* to default values.
*/
spin_lock_irqsave(&rtc_lock, flags);
- CMOS_WRITE(0, 0xf);
+ if (!--smpboot_warm_reset_vector_count) {
+ CMOS_WRITE(0, 0xf);
+ *((volatile u32 *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = 0;
+ }
spin_unlock_irqrestore(&rtc_lock, flags);

- *((volatile u32 *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = 0;
}

/*
--
2.25.1


2023-02-02 21:57:34

by Usama Arif

Subject: [PATCH v6 02/11] cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h>

From: David Woodhouse <[email protected]>

Instead of relying purely on the special-case wrapper in bringup_cpu()
to pass the idle thread to __cpu_up(), expose idle_thread_get() so that
the architecture code can obtain it directly when necessary.

This will be useful when the existing __cpu_up() is split into multiple
phases, only *one* of which will actually need the idle thread.

If the architecture code is to register its new pre-bringup states with
the cpuhp core, having a special-case wrapper to pass extra arguments is
non-trivial and it's easier just to let the arch register its function
pointer to be invoked with the standard API.
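A later patch in this series uses it in exactly that way; roughly
(a sketch of the hook added in patch 08):

    /*
     * Arch callback registered for a pre-bringup cpuhp state. It gets
     * the standard (cpu) signature and fetches the idle thread itself.
     */
    static int native_cpu_kick(unsigned int cpu)
    {
            return do_cpu_up(cpu, idle_thread_get(cpu));
    }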

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/smpboot.h | 7 +++++++
kernel/smpboot.h | 2 --
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/smpboot.h b/include/linux/smpboot.h
index 9d1bc65d226c..3862addcaa34 100644
--- a/include/linux/smpboot.h
+++ b/include/linux/smpboot.h
@@ -5,6 +5,13 @@
#include <linux/types.h>

struct task_struct;
+
+#ifdef CONFIG_GENERIC_SMP_IDLE_THREAD
+struct task_struct *idle_thread_get(unsigned int cpu);
+#else
+static inline struct task_struct *idle_thread_get(unsigned int cpu) { return NULL; }
+#endif
+
/* Cookie handed to the thread_fn*/
struct smpboot_thread_data;

diff --git a/kernel/smpboot.h b/kernel/smpboot.h
index 34dd3d7ba40b..60c609318ad6 100644
--- a/kernel/smpboot.h
+++ b/kernel/smpboot.h
@@ -5,11 +5,9 @@
struct task_struct;

#ifdef CONFIG_GENERIC_SMP_IDLE_THREAD
-struct task_struct *idle_thread_get(unsigned int cpu);
void idle_thread_set_boot_cpu(void);
void idle_threads_init(void);
#else
-static inline struct task_struct *idle_thread_get(unsigned int cpu) { return NULL; }
static inline void idle_thread_set_boot_cpu(void) { }
static inline void idle_threads_init(void) { }
#endif
--
2.25.1


2023-02-02 21:57:38

by Usama Arif

Subject: [PATCH v6 03/11] cpu/hotplug: Add dynamic parallel bringup states before CPUHP_BRINGUP_CPU

From: David Woodhouse <[email protected]>

If the platform registers these states, bring all CPUs to each registered
state in turn, before the final bringup to CPUHP_BRINGUP_CPU. This allows
the architecture to parallelise the slow asynchronous tasks like sending
INIT/SIPI and waiting for the AP to come to life.

There is a subtlety here: even with an empty CPUHP_BP_PARALLEL_DYN step,
this means that *all* CPUs are brought through the prepare states and to
CPUHP_BP_PREPARE_DYN before any of them are taken to CPUHP_BRINGUP_CPU
and then are allowed to run for themselves to CPUHP_ONLINE.

So any combination of prepare/start calls which depend on A-B ordering
for each CPU in turn, such as the X2APIC code which used to allocate a
cluster mask 'just in case' and store it in a global variable in the
prep stage, then potentially consume that preallocated structure from
the AP and set the global pointer to NULL to be reallocated in
CPUHP_X2APIC_PREPARE for the next CPU... would explode horribly.

We believe that X2APIC was the only such case, for x86. But this is why
it remains an architecture opt-in. For now.

Note that the new parallel stages do *not* yet bring each AP to the
CPUHP_BRINGUP_CPU state. The final loop in bringup_nonboot_cpus() is
untouched, bringing each AP in turn from the final PARALLEL_DYN state
(or all the way from CPUHP_OFFLINE) to CPUHP_BRINGUP_CPU and then
waiting for that AP to do its own processing and reach CPUHP_ONLINE
before releasing the next. Parallelising that part by bringing them all
to CPUHP_BRINGUP_CPU and then waiting for them all is an exercise for
the future.
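As a worked example (illustrative only): with two parallel states P1
and P2 registered and CPUs 1-3 present, the order becomes

    P1(1) P1(2) P1(3)      <- each CPU also passes all prepare states
    P2(1) P2(2) P2(3)
    BRINGUP..ONLINE(1)     <- the untouched serial loop
    BRINGUP..ONLINE(2)
    BRINGUP..ONLINE(3)

rather than each CPU running P1, P2, BRINGUP..ONLINE back to back.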

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/cpuhotplug.h | 2 ++
kernel/cpu.c | 27 +++++++++++++++++++++++++--
2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 6c6859bfc454..e5a73ae6ccc0 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -133,6 +133,8 @@ enum cpuhp_state {
CPUHP_MIPS_SOC_PREPARE,
CPUHP_BP_PREPARE_DYN,
CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20,
+ CPUHP_BP_PARALLEL_DYN,
+ CPUHP_BP_PARALLEL_DYN_END = CPUHP_BP_PARALLEL_DYN + 4,
CPUHP_BRINGUP_CPU,

/*
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6c0a92ca6bb5..5a8f1a93b57c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1505,6 +1505,24 @@ int bringup_hibernate_cpu(unsigned int sleep_cpu)
void bringup_nonboot_cpus(unsigned int setup_max_cpus)
{
unsigned int cpu;
+ int n = setup_max_cpus - num_online_cpus();
+
+ /* ∀ parallel pre-bringup state, bring N CPUs to it */
+ if (n > 0) {
+ enum cpuhp_state st = CPUHP_BP_PARALLEL_DYN;
+
+ while (st <= CPUHP_BP_PARALLEL_DYN_END &&
+ cpuhp_hp_states[st].name) {
+ int i = n;
+
+ for_each_present_cpu(cpu) {
+ cpu_up(cpu, st);
+ if (!--i)
+ break;
+ }
+ st++;
+ }
+ }

for_each_present_cpu(cpu) {
if (num_online_cpus() >= setup_max_cpus)
@@ -1882,6 +1900,10 @@ static int cpuhp_reserve_state(enum cpuhp_state state)
step = cpuhp_hp_states + CPUHP_BP_PREPARE_DYN;
end = CPUHP_BP_PREPARE_DYN_END;
break;
+ case CPUHP_BP_PARALLEL_DYN:
+ step = cpuhp_hp_states + CPUHP_BP_PARALLEL_DYN;
+ end = CPUHP_BP_PARALLEL_DYN_END;
+ break;
default:
return -EINVAL;
}
@@ -1906,14 +1928,15 @@ static int cpuhp_store_callbacks(enum cpuhp_state state, const char *name,
/*
* If name is NULL, then the state gets removed.
*
- * CPUHP_AP_ONLINE_DYN and CPUHP_BP_PREPARE_DYN are handed out on
+ * CPUHP_AP_ONLINE_DYN and CPUHP_BP_P*_DYN are handed out on
* the first allocation from these dynamic ranges, so the removal
* would trigger a new allocation and clear the wrong (already
* empty) state, leaving the callbacks of the to be cleared state
* dangling, which causes wreckage on the next hotplug operation.
*/
if (name && (state == CPUHP_AP_ONLINE_DYN ||
- state == CPUHP_BP_PREPARE_DYN)) {
+ state == CPUHP_BP_PREPARE_DYN ||
+ state == CPUHP_BP_PARALLEL_DYN)) {
ret = cpuhp_reserve_state(state);
if (ret < 0)
return ret;
--
2.25.1


2023-02-02 21:57:42

by Usama Arif

Subject: [PATCH v6 06/11] x86/smpboot: Support parallel startup of secondary CPUs

From: Thomas Gleixner <[email protected]>

To allow for parallel AP bringup, we need to avoid the use of global
variables for passing information to the APs, as well as preventing them
from all trying to use the same real-mode stack simultaneously.

So, introduce a 'lock' field in struct trampoline_header to use as a
simple bit-spinlock for the real-mode stack. That lock also protects
the global variables initial_gs, initial_stack and early_gdt_descr,
which can now be calculated...

So how do we calculate those addresses? Well, they can all be found
from the per_cpu data for this CPU. Simples! Except... how does it know
what its CPU# is? OK, we export the cpuid_to_apicid[] array and it can
search it to find its APIC ID in there.

But now you whine at me that it doesn't even know its APIC ID? Well, if
it's a relatively modern CPU then the APIC ID is in CPUID leaf 0x0B so
we can use that. Otherwise... erm... OK, otherwise it can't have parallel
CPU bringup for now. We'll still use a global variable for those CPUs and
bring them up one at a time.

So add a global 'smpboot_control' field which either contains the APIC
ID, or a flag indicating that it can be found in CPUID.

This adds the 'do_parallel_bringup' flag in preparation but doesn't
actually enable parallel bringup yet.
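In C terms, the new head_64.S path below does roughly the following
(pseudo-code sketch only; cpuid_edx() stands in for the inline CPUID
sequence and is not a real helper):

    u32 apicid, cpu;

    if (smpboot_control & STARTUP_PARALLEL)
            apicid = cpuid_edx(0x0b);   /* x2APIC ID: CPUID.0BH:EDX, ECX=0 */
    else
            apicid = smpboot_control & 0x0fffffff;  /* encoded by the BSP */

    for (cpu = 0; cpuid_to_apicid[cpu] != apicid; cpu++)
            ;                           /* linear search for our CPU#      */

    initial_gs = __per_cpu_offset[cpu];
    /* ...and from that per-CPU offset: early_gdt_descr_base (this CPU's
     * gdt_page) and initial_stack (the idle task's saved stack pointer). */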

[ dwmw2: Minor tweaks, write a commit message ]
[ seanc: Fix stray override of initial_gs in common_cpu_up() ]
Not-signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Usama Arif <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/x86/include/asm/realmode.h | 3 ++
arch/x86/include/asm/smp.h | 9 +++-
arch/x86/kernel/acpi/sleep.c | 1 +
arch/x86/kernel/apic/apic.c | 2 +-
arch/x86/kernel/head_64.S | 75 ++++++++++++++++++++++++++++
arch/x86/kernel/smpboot.c | 34 +++++++++++--
arch/x86/realmode/init.c | 3 ++
arch/x86/realmode/rm/trampoline_64.S | 14 ++++++
kernel/smpboot.c | 2 +-
9 files changed, 136 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index a336feef0af1..f0357cfe2fb0 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -52,6 +52,7 @@ struct trampoline_header {
u64 efer;
u32 cr4;
u32 flags;
+ u32 lock;
#endif
};

@@ -65,6 +66,8 @@ extern unsigned long initial_stack;
extern unsigned long initial_vc_handler;
#endif

+extern u32 *trampoline_lock;
+
extern unsigned char real_mode_blob[];
extern unsigned char real_mode_relocs[];

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index b4dbb20dab1a..9b3c0cd35f5f 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -199,5 +199,12 @@ extern void nmi_selftest(void);
#define nmi_selftest() do { } while (0)
#endif

-#endif /* __ASSEMBLY__ */
+extern unsigned int smpboot_control;
+
+#endif /* !__ASSEMBLY__ */
+
+/* Control bits for startup_64 */
+#define STARTUP_PARALLEL 0x80000000
+#define STARTUP_SECONDARY 0x40000000
+
#endif /* _ASM_X86_SMP_H */
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 3b7f4cdbf2e0..06adf340a0f1 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -115,6 +115,7 @@ int x86_acpi_suspend_lowlevel(void)
early_gdt_descr.address =
(unsigned long)get_cpu_gdt_rw(smp_processor_id());
initial_gs = per_cpu_offset(smp_processor_id());
+ smpboot_control = 0;
#endif
initial_code = (unsigned long)wakeup_long64;
saved_magic = 0x123456789abcdef0L;
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 20d9a604da7c..ac1d7e5da1f2 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2377,7 +2377,7 @@ static int nr_logical_cpuids = 1;
/*
* Used to store mapping between logical CPU IDs and APIC IDs.
*/
-static int cpuid_to_apicid[] = {
+int cpuid_to_apicid[] = {
[0 ... NR_CPUS - 1] = -1,
};

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 222efd4a09bc..d07f694691d2 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -25,6 +25,7 @@
#include <asm/export.h>
#include <asm/nospec-branch.h>
#include <asm/fixmap.h>
+#include <asm/smp.h>

/*
* We are not able to switch in one step to the final KERNEL ADDRESS SPACE
@@ -241,6 +242,68 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
UNWIND_HINT_EMPTY
ANNOTATE_NOENDBR // above

+#ifdef CONFIG_SMP
+ /*
+ * Is this the boot CPU coming up? If so everything is available
+ * in initial_gs, initial_stack and early_gdt_descr.
+ */
+ movl smpboot_control(%rip), %eax
+ testl %eax, %eax
+ jz .Lsetup_cpu
+
+ /*
+ * Secondary CPUs find out the offsets via the APIC ID. For parallel
+ * boot the APIC ID is retrieved from CPUID, otherwise it's encoded
+ * in smpboot_control:
+ * Bit 0-29 APIC ID if STARTUP_PARALLEL flag is not set
+ * Bit 30 STARTUP_SECONDARY flag
+ * Bit 31 STARTUP_PARALLEL flag (use CPUID 0x0b for APIC ID)
+ */
+ testl $STARTUP_PARALLEL, %eax
+ jnz .Luse_cpuid_0b
+ andl $0x0FFFFFFF, %eax
+ jmp .Lsetup_AP
+
+.Luse_cpuid_0b:
+ mov $0x0B, %eax
+ xorl %ecx, %ecx
+ cpuid
+ mov %edx, %eax
+
+.Lsetup_AP:
+ /* EAX contains the APICID of the current CPU */
+ xorl %ecx, %ecx
+ leaq cpuid_to_apicid(%rip), %rbx
+
+.Lfind_cpunr:
+ cmpl (%rbx), %eax
+ jz .Linit_cpu_data
+ addq $4, %rbx
+ addq $8, %rcx
+ jmp .Lfind_cpunr
+
+.Linit_cpu_data:
+ /* Get the per cpu offset */
+ leaq __per_cpu_offset(%rip), %rbx
+ addq %rcx, %rbx
+ movq (%rbx), %rbx
+ /* Save it for GS BASE setup */
+ movq %rbx, initial_gs(%rip)
+
+ /* Calculate the GDT address */
+ movq $gdt_page, %rcx
+ addq %rbx, %rcx
+ movq %rcx, early_gdt_descr_base(%rip)
+
+ /* Find the idle task stack */
+ movq $idle_threads, %rcx
+ addq %rbx, %rcx
+ movq (%rcx), %rcx
+ movq TASK_threadsp(%rcx), %rcx
+ movq %rcx, initial_stack(%rip)
+#endif /* CONFIG_SMP */
+
+.Lsetup_cpu:
/*
* We must switch to a new descriptor in kernel space for the GDT
* because soon the kernel won't have access anymore to the userspace
@@ -281,6 +344,14 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
*/
movq initial_stack(%rip), %rsp

+ /* Drop the realmode protection. For the boot CPU the pointer is NULL! */
+ movq trampoline_lock(%rip), %rax
+ testq %rax, %rax
+ jz .Lsetup_idt
+ lock
+ btrl $0, (%rax)
+
+.Lsetup_idt:
/* Setup and Load IDT */
pushq %rsi
call early_setup_idt
@@ -426,6 +497,7 @@ SYM_DATA(initial_vc_handler, .quad handle_vc_boot_ghcb)
* reliably detect the end of the stack.
*/
SYM_DATA(initial_stack, .quad init_thread_union + THREAD_SIZE - FRAME_SIZE)
+SYM_DATA(trampoline_lock, .quad 0);
__FINITDATA

__INIT
@@ -660,6 +732,9 @@ SYM_DATA_END(level1_fixmap_pgt)
SYM_DATA(early_gdt_descr, .word GDT_ENTRIES*8-1)
SYM_DATA_LOCAL(early_gdt_descr_base, .quad INIT_PER_CPU_VAR(gdt_page))

+ .align 16
+SYM_DATA(smpboot_control, .long 0)
+
.align 16
/* This must match the first entry in level2_kernel_pgt */
SYM_DATA(phys_base, .quad 0x0)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fdcf7c08945f..741da8d306a4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -800,6 +800,16 @@ static int __init cpu_init_udelay(char *str)
}
early_param("cpu_init_udelay", cpu_init_udelay);

+static bool do_parallel_bringup = true;
+
+static int __init no_parallel_bringup(char *str)
+{
+ do_parallel_bringup = false;
+
+ return 0;
+}
+early_param("no_parallel_bringup", no_parallel_bringup);
+
static void __init smp_quirk_init_udelay(void)
{
/* if cmdline changed it from default, leave it alone */
@@ -1087,8 +1097,6 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
#ifdef CONFIG_X86_32
/* Stack for startup_32 can be just as for start_secondary onwards */
per_cpu(pcpu_hot.top_of_stack, cpu) = task_top_of_stack(idle);
-#else
- initial_gs = per_cpu_offset(cpu);
#endif
return 0;
}
@@ -1113,9 +1121,16 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
start_ip = real_mode_header->trampoline_start64;
#endif
idle->thread.sp = (unsigned long)task_pt_regs(idle);
- early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
initial_code = (unsigned long)start_secondary;
- initial_stack = idle->thread.sp;
+
+ if (IS_ENABLED(CONFIG_X86_32)) {
+ early_gdt_descr.address = (unsigned long)get_cpu_gdt_rw(cpu);
+ initial_stack = idle->thread.sp;
+ } else if (do_parallel_bringup) {
+ smpboot_control = STARTUP_SECONDARY | STARTUP_PARALLEL;
+ } else {
+ smpboot_control = STARTUP_SECONDARY | apicid;
+ }

/* Enable the espfix hack for this CPU */
init_espfix_ap(cpu);
@@ -1515,6 +1530,17 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)

speculative_store_bypass_ht_init();

+ /*
+ * We can do 64-bit AP bringup in parallel if the CPU reports its
+ * APIC ID in CPUID leaf 0x0B. Otherwise it's too hard. And not
+ * for SEV-ES guests because they can't use CPUID that early.
+ * Also, some AMD CPUs crash when doing parallel cpu bringup, disable
+ * it for all AMD CPUs to be on the safe side.
+ */
+ if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 0x0B ||
+ cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
+ do_parallel_bringup = false;
+
snp_set_wakeup_secondary_cpu();
}

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index af565816d2ba..788e5559549f 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -154,6 +154,9 @@ static void __init setup_real_mode(void)

trampoline_header->flags = 0;

+ trampoline_lock = &trampoline_header->lock;
+ *trampoline_lock = 0;
+
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);

/* Map the real mode stub as virtual == physical */
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index e38d61d6562e..49ebc1636ffd 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -49,6 +49,19 @@ SYM_CODE_START(trampoline_start)
mov %ax, %es
mov %ax, %ss

+ /*
+ * Make sure only one CPU fiddles with the realmode stack
+ */
+.Llock_rm:
+ btl $0, tr_lock
+ jnc 2f
+ pause
+ jmp .Llock_rm
+2:
+ lock
+ btsl $0, tr_lock
+ jc .Llock_rm
+
# Setup stack
movl $rm_stack_end, %esp

@@ -241,6 +254,7 @@ SYM_DATA_START(trampoline_header)
SYM_DATA(tr_efer, .space 8)
SYM_DATA(tr_cr4, .space 4)
SYM_DATA(tr_flags, .space 4)
+ SYM_DATA(tr_lock, .space 4)
SYM_DATA_END(trampoline_header)

#include "trampoline_common.S"
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 2c7396da470c..a18a21dff9bc 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -25,7 +25,7 @@
* For the hotplug case we keep the task structs around and reuse
* them.
*/
-static DEFINE_PER_CPU(struct task_struct *, idle_threads);
+DEFINE_PER_CPU(struct task_struct *, idle_threads);

struct task_struct *idle_thread_get(unsigned int cpu)
{
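For reference, the real-mode lock in trampoline_64.S above is a plain
bit-spinlock with a test-and-test-and-set acquire; in C it would look
roughly like this (a sketch, not the actual code):

    /* Acquire, at trampoline entry: spin read-only, then lock btsl. */
    do {
            while (test_bit(0, &tr_lock))
                    cpu_relax();
    } while (test_and_set_bit(0, &tr_lock));

    /*
     * Release, from secondary_startup_64 once the AP is on its own
     * stack; trampoline_lock is NULL for the boot CPU, so skip it.
     */
    if (trampoline_lock)
            clear_bit(0, trampoline_lock);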
--
2.25.1


2023-02-02 21:57:46

by Usama Arif

Subject: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

From: David Woodhouse <[email protected]>

Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/amd.c | 11 +++++++++++
arch/x86/kernel/smpboot.c | 13 +++++++++++--
3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 61012476d66e..ed7f32354edc 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -466,5 +466,6 @@
#define X86_BUG_MMIO_UNKNOWN X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */
#define X86_BUG_RETBLEED X86_BUG(27) /* CPU is affected by RETBleed */
#define X86_BUG_EIBRS_PBRSB X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
+#define X86_BUG_NO_PARALLEL_BRINGUP X86_BUG(29) /* CPU has hardware issues with parallel AP bringup */

#endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f769d6d08b43..19b5c8342d7e 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -941,6 +941,17 @@ static void init_amd(struct cpuinfo_x86 *c)
case 0x19: init_amd_zn(c); break;
}

+ /*
+ * Various AMD CPUs appear not to cope with APs being brought up
+ * in parallel. In debugging, the AP doesn't even seem to reach an
+ * outb to port 0x3f8 right at the top of the startup trampoline.
+ * We don't *think* there are any remaining software issues which
+ * may contribute to this, although it's possible. So far, attempts
+ * to get AMD to investigate this have been to no avail. So just
+ * disable parallel bring up for all AMD CPUs for now.
+ */
+ set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP);
+
/*
* Enable workaround for FXSAVE leak on CPUs
* without a XSaveErPtr feature
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 741da8d306a4..656897b055f5 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1534,13 +1534,22 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
* We can do 64-bit AP bringup in parallel if the CPU reports its
* APIC ID in CPUID leaf 0x0B. Otherwise it's too hard. And not
* for SEV-ES guests because they can't use CPUID that early.
- * Also, some AMD CPUs crash when doing parallel cpu bringup, disable
- * it for all AMD CPUs to be on the safe side.
*/
if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 0x0B ||
cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
do_parallel_bringup = false;

+ /*
+ * Various AMD CPUs appear not to cope with APs being brought up
+ * in parallel, so just disable parallel bring up for all AMD CPUs
+ * for now.
+ */
+ if (do_parallel_bringup &&
+ boot_cpu_has_bug(X86_BUG_NO_PARALLEL_BRINGUP)) {
+ pr_info("Disabling parallel bringup due to CPU bugs\n");
+ do_parallel_bringup = false;
+ }
+
snp_set_wakeup_secondary_cpu();
}

--
2.25.1


2023-02-02 21:57:49

by Usama Arif

Subject: [PATCH v6 11/11] x86/smpboot: reuse timer calibration

From: Arjan van de Ven <[email protected]>

No point in recalibrating for a known-constant TSC.
When tested on a 128-core, 2-socket server, this reduces
the parallel smpboot time from 100ms to 30ms.
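For context, calibrate_delay() already consults this hook and skips the
measured calibration entirely when it returns non-zero; in outline
(a sketch of init/calibrate.c, not part of this patch):

    lpj = calibrate_delay_is_known();
    if (lpj) {
            loops_per_jiffy = lpj;   /* reuse, no measurement */
    } else {
            /* ...run the timer-based/delay-loop calibration... */
    }

So with the hunk below, every AP other than CPU 0 simply inherits the
boot CPU's loops_per_jiffy.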

Not-signed-off-by: Arjan van de Ven <[email protected]>
Signed-off-by: Usama Arif <[email protected]>
---
arch/x86/kernel/tsc.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a78e73da4a74..bab8a98080cf 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1569,6 +1569,9 @@ unsigned long calibrate_delay_is_known(void)
if (!constant_tsc || !mask)
return 0;

+ if (cpu != 0)
+ return cpu_data(0).loops_per_jiffy;
+
sibling = cpumask_any_but(mask, cpu);
if (sibling < nr_cpu_ids)
return cpu_data(sibling).loops_per_jiffy;
--
2.25.1


2023-02-02 21:57:52

by Usama Arif

Subject: [PATCH v6 09/11] x86/mtrr: Avoid repeated save of MTRRs on boot-time CPU bringup

From: David Woodhouse <[email protected]>

There's no need to repeatedly save the BSP's MTRRs for each AP we bring
up at boot time. And there's no need to use smp_call_function_single()
even for the one time we *do* want to do it.
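The boot-time path thus collapses to a once-only latch; a minimal
sketch of the idea (the real hunk is below):

    static bool saved;

    if (system_state < SYSTEM_RUNNING) {    /* boot-time bringup */
            if (!saved) {
                    mtrr_save_fixed_ranges(NULL);  /* BSP state, once */
                    saved = true;
            }
            return;
    }
    /* Runtime hotplug: read the state from a live CPU, as before. */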

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 783f3210d582..b6eae3ad4414 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -721,11 +721,20 @@ void __init mtrr_bp_init(void)
*/
void mtrr_save_state(void)
{
+ static bool mtrr_saved;
int first_cpu;

if (!mtrr_enabled())
return;

+ if (system_state < SYSTEM_RUNNING) {
+ if (!mtrr_saved) {
+ mtrr_save_fixed_ranges(NULL);
+ mtrr_saved = true;
+ }
+ return;
+ }
+
first_cpu = cpumask_first(cpu_online_mask);
smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
}
--
2.25.1


2023-02-02 22:32:08

by David Woodhouse

Subject: Re: [PATCH v6 06/11] x86/smpboot: Support parallel startup of secondary CPUs

On Thu, 2023-02-02 at 21:56 +0000, Usama Arif wrote:
> @@ -1515,6 +1530,17 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>  
>         speculative_store_bypass_ht_init();
>  
> +       /*
> +        * We can do 64-bit AP bringup in parallel if the CPU reports its
> +        * APIC ID in CPUID leaf 0x0B. Otherwise it's too hard. And not
> +        * for SEV-ES guests because they can't use CPUID that early.
> +        * Also, some AMD CPUs crash when doing parallel cpu bringup, disable
> +        * it for all AMD CPUs to be on the safe side.
> +        */
> +       if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 0x0B ||
> +           cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
> +               do_parallel_bringup = false;
> +
>         snp_set_wakeup_secondary_cpu();
>  }

Oops, I thought I'd removed those two lines about AMD in the comment
when I removed the actual code that did so. Turns out we remove those
lines again in the *next* patch instead.



2023-02-02 22:51:13

by Usama Arif

Subject: Re: [External] Re: [PATCH v6 06/11] x86/smpboot: Support parallel startup of secondary CPUs



On 02/02/2023 22:30, David Woodhouse wrote:
> On Thu, 2023-02-02 at 21:56 +0000, Usama Arif wrote:
>> @@ -1515,6 +1530,17 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>>
>>         speculative_store_bypass_ht_init();
>>
>> +       /*
>> +        * We can do 64-bit AP bringup in parallel if the CPU reports its
>> +        * APIC ID in CPUID leaf 0x0B. Otherwise it's too hard. And not
>> +        * for SEV-ES guests because they can't use CPUID that early.
>> +        * Also, some AMD CPUs crash when doing parallel cpu bringup, disable
>> +        * it for all AMD CPUs to be on the safe side.
>> +        */
>> +       if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 0x0B ||
>> +           cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
>> +               do_parallel_bringup = false;
>> +
>>         snp_set_wakeup_secondary_cpu();
>>  }
>
> Oops, I thought I'd removed those two lines about AMD in the comment
> when I removed the actual code that did so. Turns out we remove those
> lines again in the *next* patch instead.

Ah, sorry about that, should have caught it while reviewing before
posting. Come to think of it, it might be better to squash this and the
next AMD-disabling patch: if anyone testing on AMD ever happens to check
out at this specific patch, their machine might not boot :). Will repost
only after getting the OK for the rest of the patches, along with
addressing any other comments that come up.

Thanks,
Usama


2023-02-03 08:15:14

by David Woodhouse

Subject: Re: [External] Re: [PATCH v6 06/11] x86/smpboot: Support parallel startup of secondary CPUs

On Thu, 2023-02-02 at 22:50 +0000, Usama Arif wrote:
>
> Ah, sorry about that, I should have caught it while reviewing before
> posting. Come to think of it, it might be better to squash this and the
> next AMD-disabling patch; if anyone testing on AMD ever happens to check
> out at this specific patch, their machine might not boot :).
>

Nah, we don't actually *enable* parallel boot until the next patch
anyway. At this point we're only laying the groundwork. I think it's
best separate.

> Will repost only after getting the OK for the rest of the patches
> along with addressing any other comments that come up.

Ack. And hopefully the missing SoB from both Thomas and Arjan for their
parts. Part of the reasoning for the lack of SoB from Arjan might have
been that we haven't finished doing the thinking about how this works
in all cases, with non-constant TSC and other stuff. Let's make sure we
give that a long hard stare.



2023-02-03 14:42:12

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v6 06/11] x86/smpboot: Support parallel startup of secondary CPUs

On 2/3/2023 12:14 AM, David Woodhouse wrote:
> On Thu, 2023-02-02 at 22:50 +0000, Usama Arif wrote:
>>
>> Ah, sorry about that, I should have caught it while reviewing before
>> posting. Come to think of it, it might be better to squash this and the
>> next AMD-disabling patch; if anyone testing on AMD ever happens to check
>> out at this specific patch, their machine might not boot :).
>>
>
> Nah, we don't actually *enable* parallel boot until the next patch
> anyway. At this point we're only laying the groundwork. I think it's
> best separate.
>
>> Will repost only after getting the OK for the rest of the patches
>> along with addressing any other comments that come up.
>
> Ack. And hopefully the missing SoB from both Thomas and Arjan for their
> parts. Part of the reasoning for the lack of SoB from Arjan might have
> been that we haven't finished doing the thinking about how this works
> in all cases, with non-constant TSC and other stuff. Let's make sure we
> give that a long hard stare.

most of it is that I need to sit down for some real time and think it through ;)

for all the common cases / modern HW it will be ok... but all the rest???

maybe we need a strict(er) safety valve on where this is used and then over time
widen the usage?

(after all, a regression means revert, and maybe it's better to only revert "add CPU type X" than
revert the whole series)


2023-02-03 18:17:14

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 06/11] x86/smpboot: Support parallel startup of secondary CPUs

On Thu, Feb 02, 2023, Usama Arif wrote:
> @@ -1515,6 +1530,17 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>
> speculative_store_bypass_ht_init();
>
> + /*
> + * We can do 64-bit AP bringup in parallel if the CPU reports its
> + * APIC ID in CPUID leaf 0x0B. Otherwise it's too hard. And not
> + * for SEV-ES guests because they can't use CPUID that early.
> + * Also, some AMD CPUs crash when doing parallel cpu bringup, disable
> + * it for all AMD CPUs to be on the safe side.
> + */
> + if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 0x0B ||
> + cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))

Obviously not your fault, but CC_ATTR_GUEST_STATE_ENCRYPT is a terrible name.
TDX guest state is also encrypted, but TDX doesn't return true for CC_ATTR_GUEST_STATE_ENCRYPT.
I.e. this looks wrong, but is correct, because CC_ATTR_GUEST_STATE_ENCRYPT really
just means "SEV-ES guest".

> + do_parallel_bringup = false;
> +
> snp_set_wakeup_secondary_cpu();
> }

2023-02-03 19:48:53

by Kim Phillips

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

+Mario

Hi,

On 2/2/23 3:56 PM, Usama Arif wrote:
> From: David Woodhouse <[email protected]>
>
> Signed-off-by: David Woodhouse <[email protected]>
> ---

I'd like to nack this, but can't (and not because it doesn't have
commit text):

If I:

- take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
- remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c

Then:

- a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
- Zen 2,3,4-based servers boot fine
- a Zen1-based server doesn't boot.

This is what's left on its serial port:

[ 3.199633] smp: Bringing up secondary CPUs ...
[ 3.200732] x86: Booting SMP configuration:
[ 3.204242] .... node #0, CPUs: #1
[ 3.204301] CPU 1 to 93/x86/cpu:kick in 63 21 -114014307645 0 . 0 0 0 0 . 0 114025055970
[ 3.204478] ------------[ cut here ]------------
[ 3.204481] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.204490] Modules linked in:
[ 3.204493] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
[ 3.204496] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.204498] RIP: 0010:cpu_init+0x2d/0x1f0
[ 3.204502] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[ 3.204504] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[ 3.204506] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[ 3.204508] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[ 3.204509] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[ 3.204510] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[ 3.204511] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3.204512] FS: 0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[ 3.204514] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.204515] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[ 3.204517] Call Trace:
[ 3.204519] ---[ end trace 0000000000000000 ]---
[ 3.204580] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2
[ 3.288686] #2
[ 3.288735] CPU 2 to 93/x86/cpu:kick in 210 42 -114355248756 0 . 0 0 0 0 . 0 114356192013
[ 3.288798] ------------[ cut here ]------------
[ 3.288804] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.288815] Modules linked in:
[ 3.288819] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc6+ #19
[ 3.288823] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.288826] RIP: 0010:cpu_init+0x2d/0x1f0
[ 3.288831] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[ 3.288835] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[ 3.288838] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[ 3.288841] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[ 3.288844] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[ 3.288845] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[ 3.288848] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3.288850] FS: 0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[ 3.288852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.288855] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[ 3.288857] Call Trace:
[ 3.288859] ---[ end trace 0000000000000000 ]---
[ 3.288925] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 8
6.36[ [ 3. 68 33]3 [ #3[ [ #
[ 3.368623[ 3
[ 3.368623] #3
[ 3.368662] ------------[ cut here ]------------
[ 3.368673] CPU 3 to 93/x86/cpu:kick in 504 315 -114684508974 0 . 0 0 0 0 . 0 114685353594
[ 3.368705] BUG: scheduling while atomic: swapper/0/1/0x00000003
[ 3.368708] 7 locks held by swapper/0/1:
[ 3.368710] #0: ffffffffacbff920 (console_lock){....}-{0:0}, at: vprintk_emit+0x13a/0x2e0
[ 3.368721] #1: ffffffffacbffd48 (console_srcu){....}-{0:0}, at: console_flush_all+0x2d/0x250
[ 3.368728] #2: ffffffffac87f540 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.22+0x189/0x350
[ 3.368735] #3: ffffffffadaae838 (&port_lock_key){....}-{2:2}, at: serial8250_console_write+0x88/0x3c0
[ 3.368745] #4: ffffffffac86aa50 (cpu_add_remove_lock){....}-{3:3}, at: cpu_up+0x6a/0xd0
[ 3.368753] #5: ffffffffac86a9a0 (cpu_hotplug_lock){....}-{0:0}, at: _cpu_up+0x3d/0x2f0
[ 3.368760] #6: ffffffffac8763b0 (smpboot_threads_lock){....}-{3:3}, at: smpboot_create_threads+0x21/0x80
[ 3.368769] Modules linked in:
[ 3.368770] Preemption disabled at:
[ 3.368771] [<ffffffffaae717a4>] do_cpu_up+0x3e4/0x780
[ 3.368777] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc6+ #19
[ 3.368781] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.368782] Call Trace:
[ 3.368783] <TASK>
[ 3.368789] dump_stack_lvl+0x49/0x63
[ 3.368795] ? do_cpu_up+0x3e4/0x780
[ 3.368799] dump_stack+0x10/0x16
[ 3.368802] __schedule_bug+0xad/0xd0
[ 3.368808] __schedule+0x76/0x8a0
[ 3.368812] ? sched_clock+0x9/0x10
[ 3.368817] ? sched_clock_local+0x17/0x90
[ 3.368826] ? sort_range+0x30/0x30
[ 3.368830] schedule+0x88/0xd0
[ 3.368833] schedule_timeout+0x40/0x320
[ 3.368840] ? __this_cpu_preempt_check+0x13/0x20
[ 3.368844] ? lock_release+0x353/0x3c0
[ 3.368852] ? sort_range+0x30/0x30
[ 3.368856] wait_for_completion_killable+0xe0/0x1c0
[ 3.368864] __kthread_create_on_node+0xfe/0x1e0
[ 3.368876] ? wait_for_completion_killable+0x38/0x1c0
[ 3.368884] kthread_create_on_node+0x46/0x70
[ 3.368894] kthread_create_on_cpu+0x2c/0x90
[ 3.368899] __smpboot_create_thread+0x87/0x140
[ 3.368905] smpboot_create_threads+0x3f/0x80
[ 3.368909] ? idle_thread_get+0x40/0x40
[ 3.368913] cpuhp_invoke_callback+0x13c/0x5d0
[ 3.368921] __cpuhp_invoke_callback_range+0x69/0xf0
[ 3.368929] _cpu_up+0x12a/0x2f0
[ 3.368937] cpu_up+0x8f/0xd0
[ 3.368942] bringup_nonboot_cpus+0x7c/0x160
[ 3.368950] smp_init+0x2a/0x83
[ 3.368957] kernel_init_freeable+0x1a1/0x309
[ 3.368961] ? lock_release+0x353/0x3c0
[ 3.368972] ? rest_init+0x140/0x140
[ 3.368977] kernel_init+0x1a/0x130
[ 3.368980] ret_from_fork+0x22/0x30
[ 3.368996] </TASK>
[ 3.369419]
[ 3.369420] .... node #1, CPUs: #4
[ 3.369466] ------------[ cut here ]------------
[ 3.369469] CPU 4 to 93/x86/cpu:kick in 378 42 -114685407543 0 . 0 0 0 0 . 0 114687022569
[ 3.369474] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.369487] Modules linked in:
[ 3.369491] ------------[ cut here ]------------
[ 3.369494] DEBUG_LOCKS_WARN_ON(val > preempt_count())
[ 3.369493] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc6+ #19
[ 3.369499] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018


...which points to the WARN_ON here:

static void wait_for_master_cpu(int cpu)
{
#ifdef CONFIG_SMP
/*
* wait for ACK from master CPU before continuing
* with AP initialization
*/
WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
while (!cpumask_test_cpu(cpu, cpu_callout_mask))
cpu_relax();
#endif
}

Let me know if you'd like me to test any changes.

Thanks,

Kim

2023-02-03 21:45:36

by Kim Phillips

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On 2/3/23 2:19 PM, Woodhouse, David wrote:
> Would be interesting to know if that goes away if you test only the
> first part of the tree. My first inclination is to suspect that's a bug
> in the later patches... although the APIC ID mismatch is interesting.
> That part (with each AP getting its own APIC ID from CPUID) is in the
> *first* part of the series....

dwmw2/parallel-6.2-rc6-part1 (commit 942b3faa258c) re-enabled for
AMD gets the same splat(s):

[ 3.195498] smp: Bringing up secondary CPUs ...
[ 3.196670] x86: Booting SMP configuration:
[ 3.200189] .... node #0, CPUs: #1
[ 3.200247] CPU 1 to 93/x86/cpu:kick in 315 42 -155206575216 0 . 0 0 0 0 . 0 155217324423
[ 3.200418] ------------[ cut here ]------------
[ 3.200420] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.200428] Modules linked in:
[ 3.200430] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
[ 3.200433] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.200435] RIP: 0010:cpu_init+0x2d/0x1f0
[ 3.200438] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 99 47 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[ 3.200440] RSP: 0000:ffffffffba203d70 EFLAGS: 00010083
[ 3.200443] RAX: ffff8e027eef6e40 RBX: ffff8dfe80064b80 RCX: 0000000000000008
[ 3.200444] RDX: 0000000000000000 RSI: ffff8df65c40b048 RDI: ffffffffb9f66418
[ 3.200445] RBP: ffffffffba203d90 R08: 00000000fffffe4d R09: ffff8df65c406078
[ 3.200446] R10: ffffffffba203dc0 R11: 0000000000000000 R12: 0000000000000000
[ 3.200447] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3.200448] FS: 0000000000000000(0000) GS:ffff8df65c400000(0000) knlGS:0000000000000000
[ 3.200450] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.200451] CR2: 0000000000000000 CR3: 0000800307e12000 CR4: 00000000003100a0
[ 3.200453] Call Trace:
[ 3.200454] ---[ end trace 0000000000000000 ]---
[ 3.200509] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2
[ 3.284620] #2
[ 3.284669] CPU 2 to 93/x86/cpu:kick in 252 42 -155547668514 0 . 0 0 0 0 . 0 155548597197
[ 3.284727] ------------[ cut here ]------------
[ 3.284732] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.284741] Modules linked in:
[ 3.284745] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc6+ #19
[ 3.284749] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.284752] RIP: 0010:cpu_init+0x2d/0x1f0
[ 3.284756] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 99 47 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[ 3.284760] RSP: 0000:ffffffffba203d70 EFLAGS: 00010083
[ 3.284764] RAX: ffff8e027eef6e40 RBX: ffff8dfe80064b80 RCX: 0000000000000008
[ 3.284766] RDX: 0000000000000000 RSI: ffff8df65c40b048 RDI: ffffffffb9f66418
[ 3.284769] RBP: ffffffffba203d90 R08: 00000000fffffe4d R09: ffff8df65c406078
[ 3.284771] R10: ffffffffba203dc0 R11: 0000000000000000 R12: 0000000000000000
[ 3.284773] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3.284775] FS: 0000000000000000(0000) GS:ffff8df65c400000(0000) knlGS:0000000000000000
[ 3.284778] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.284779] CR2: 0000000000000000 CR3: 0000800307e12000 CR4: 00000000003100a0
[ 3.284781] Call Trace:
[ 3.284783] ---[ end trace 0000000000000000 ]---
[ 3.284841] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 8
[ 3.364[[ [. 343 35 5 5] [ 3 364575 [
3 [ [ [3 3 [ [ 3

Thanks,

Kim

2023-02-03 22:25:35

by David Woodhouse

[permalink] [raw]
Subject: Re: [EXTERNAL][PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On Fri, 2023-02-03 at 15:45 -0600, Kim Phillips wrote:
> On 2/3/23 2:19 PM, Woodhouse, David wrote:
> > Would be interesting to know if that goes away if you test only the
> > first part of the tree. My first inclination is to suspect that's a bug
> > in the later patches... although the APIC ID mismatch is interesting.
> > That part (with each AP getting its own APIC ID from CPUID) is in the
> > *first* part of the series....
>
> dwmw2/parallel-6.2-rc6-part1 (commit 942b3faa258c) re-enabled for
> AMD gets the same splat(s):

Sure that's the right image? Looks like it still has the timing debug
patch from the last commit in the tree?


> [    3.195498] smp: Bringing up secondary CPUs ...
> [    3.196670] x86: Booting SMP configuration:
> [    3.200189] .... node  #0, CPUs:          #1
> [    3.200247] CPU 1 to 93/x86/cpu:kick in 315 42 -155206575216 0 . 0 0 0 0 . 0 155217324423
> [    3.200418] ------------[ cut here ]------------
> [    3.200420] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
> [    3.200428] Modules linked in:
> [    3.200430] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
> [    3.200433] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
> [    3.200435] RIP: 0010:cpu_init+0x2d/0x1f0
> [    3.200438] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 99 47 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
> [    3.200440] RSP: 0000:ffffffffba203d70 EFLAGS: 00010083
> [    3.200443] RAX: ffff8e027eef6e40 RBX: ffff8dfe80064b80 RCX: 0000000000000008
> [    3.200444] RDX: 0000000000000000 RSI: ffff8df65c40b048 RDI: ffffffffb9f66418
> [    3.200445] RBP: ffffffffba203d90 R08: 00000000fffffe4d R09: ffff8df65c406078
> [    3.200446] R10: ffffffffba203dc0 R11: 0000000000000000 R12: 0000000000000000
> [    3.200447] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    3.200448] FS:  0000000000000000(0000) GS:ffff8df65c400000(0000) knlGS:0000000000000000
> [    3.200450] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.200451] CR2: 0000000000000000 CR3: 0000800307e12000 CR4: 00000000003100a0
> [    3.200453] Call Trace:
> [    3.200454] ---[ end trace 0000000000000000 ]---
> [    3.200509] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2
> [    3.284620]    #2
> [    3.284669] CPU 2 to 93/x86/cpu:kick in 252 42 -155547668514 0 . 0 0 0 0 . 0 155548597197
> [    3.284727] ------------[ cut here ]------------
> [    3.284732] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
> [    3.284741] Modules linked in:
> [    3.284745] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
> [    3.284749] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
> [    3.284752] RIP: 0010:cpu_init+0x2d/0x1f0
> [    3.284756] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 99 47 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
> [    3.284760] RSP: 0000:ffffffffba203d70 EFLAGS: 00010083
> [    3.284764] RAX: ffff8e027eef6e40 RBX: ffff8dfe80064b80 RCX: 0000000000000008
> [    3.284766] RDX: 0000000000000000 RSI: ffff8df65c40b048 RDI: ffffffffb9f66418
> [    3.284769] RBP: ffffffffba203d90 R08: 00000000fffffe4d R09: ffff8df65c406078
> [    3.284771] R10: ffffffffba203dc0 R11: 0000000000000000 R12: 0000000000000000
> [    3.284773] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    3.284775] FS:  0000000000000000(0000) GS:ffff8df65c400000(0000) knlGS:0000000000000000
> [    3.284778] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.284779] CR2: 0000000000000000 CR3: 0000800307e12000 CR4: 00000000003100a0
> [    3.284781] Call Trace:
> [    3.284783] ---[ end trace 0000000000000000 ]---
> [    3.284841] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 8
> [    3.364[[   [. 343 35     5 5] [  3 364575  [
>   3 [    [   [3 3 [ [  3
>
> Thanks,
>
> Kim



2023-02-04 09:08:18

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
> +Mario
>
> Hi,
>
> On 2/2/23 3:56 PM, Usama Arif wrote:
> > From: David Woodhouse <[email protected]>
> >
> > Signed-off-by: David Woodhouse <[email protected]>
> > ---
>
> I'd like to nack this, but can't (and not because it doesn't have
> commit text):
>
> If I:
>
>   - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
>   - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c
>
> Then:
>
>   - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
>   - Zen 2,3,4-based servers boot fine
>   - a Zen1-based server doesn't boot.
>
> This is what's left on its serial port:
>
> [    3.199633] smp: Bringing up secondary CPUs ...
> [    3.200732] x86: Booting SMP configuration:
> [    3.204242] .... node  #0, CPUs:          #1
> [    3.204301] CPU 1 to 93/x86/cpu:kick in 63 21 -114014307645 0 . 0 0 0 0 . 0 114025055970
> [    3.204478] ------------[ cut here ]------------
> [    3.204481] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
> [    3.204490] Modules linked in:
> [    3.204493] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
> [    3.204496] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
> [    3.204498] RIP: 0010:cpu_init+0x2d/0x1f0
> [    3.204502] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
> [    3.204504] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
> [    3.204506] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
> [    3.204508] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
> [    3.204509] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
> [    3.204510] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
> [    3.204511] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    3.204512] FS:  0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
> [    3.204514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.204515] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
> [    3.204517] Call Trace:
> [    3.204519] ---[ end trace 0000000000000000 ]---
> [    3.204580] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2


So this is where it all starts to go wrong, and we didn't really change
this in the less-tested later part of the series. So even though I'd
like to be sure you've tested with just '-part1', I suspect that isn't
the issue.

The warning you highlighted at the end of the log is just complaining
that a given AP is *already* marked as initialized when it's trying to
come up, and that's just fallout of the fact that they don't know which
CPU they are. They *all* come up thinking they're CPU#0.

So that's weird. Now, what we do in this series is stop *telling* the
AP which CPU# it is in a global variable as we bring them up one at a
time, and instead we let them get their own APIC ID from CPUID leaf 0xb
and look up their per-cpu data that way.

The commit message for 'Support parallel startup of secondary CPUs'
(currently commit 0f52d4eaaf0c) explains that in a bit more detail.
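
For reference, what each AP now does in head_64.S looks roughly like
this in C (an illustrative sketch only; the real code is assembly,
running well before the normal C environment is up, and find_cpunr here
just mirrors the .Lfind_cpunr label):

	static unsigned int find_cpunr(void)
	{
		unsigned int eax, ebx, ecx, edx, cpu;

		/* CPUID leaf 0x0B, subleaf 0: EDX is our x2APIC ID */
		cpuid_count(0x0b, 0, &eax, &ebx, &ecx, &edx);

		/* Scan the cpuid_to_apicid[] table the boot CPU filled in */
		for (cpu = 0; cpu < nr_cpu_ids; cpu++)
			if (cpuid_to_apicid[cpu] == edx)
				return cpu;	/* per-CPU data: __per_cpu_offset[cpu] */

		return 0;	/* shouldn't happen if the table is sane */
	}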

Please could you try parallel-6.2-rc6-part1 with something like this?
If I boot this in qemu with a weird topology to stop it being a 1:1
mapping, I do see a series of BbCcDdDeEeFfIgJh... showing the APIC ID
and CPU# that each AP finds as it starts up.

qemu-system-x86_64 -kernel arch/x86/boot/bzImage -smp 24,sockets=4,cores=3,threads=2 -append "console=ttyS0,115200 lpj=16528321 earlyprintk=ttyS0" -serial mon:stdio -display none -m 1G -accel kvm,kernel-irqchip=split

...
[ 0.570022] x86: Booting SMP configuration:
BbCc[ 0.570923] .... node #0, CPUs: #1 #2
[ 0.711468] Callback from call_rcu_tasks_rude() invoked.
DdEe[ 0.713316] #3 #4
[ 0.854459] Callback from call_rcu_tasks() invoked.
FfIgJhKiLjMkNlQmRnSoTpUqVrYsZt[u\v]w^x[ 0.856289] #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23



diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d07f694691d2..c3219dc2a201 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -247,8 +247,8 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
* Is this the boot CPU coming up? If so everything is available
* in initial_gs, initial_stack and early_gdt_descr.
*/
- movl smpboot_control(%rip), %eax
- testl %eax, %eax
+ movl smpboot_control(%rip), %edx
+ testl %edx, %edx
jz .Lsetup_cpu

/*
@@ -259,30 +259,45 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
* Bit 30 STARTUP_SECONDARY flag
* Bit 31 STARTUP_PARALLEL flag (use CPUID 0x0b for APIC ID)
*/
- testl $STARTUP_PARALLEL, %eax
+ testl $STARTUP_PARALLEL, %edx
jnz .Luse_cpuid_0b
- andl $0x0FFFFFFF, %eax
+ andl $0x0FFFFFFF, %edx
jmp .Lsetup_AP

.Luse_cpuid_0b:
mov $0x0B, %eax
xorl %ecx, %ecx
cpuid
- mov %edx, %eax
+
+/* test hack: print 'A' + APIC ID, then 'a' + CPU# */

.Lsetup_AP:
- /* EAX contains the APICID of the current CPU */
+ /* EDX contains the APICID of the current CPU */
+
+ /* Test hack: Print APIC ID and then CPU# when we find it. */
+ mov %edx, %ecx
+ mov %edx, %eax
+ addb $'A', %al
+ mov $0x3f8, %dx
+ outb %al, %dx
+ mov %ecx, %edx
+ mov $'a', %al
+
xorl %ecx, %ecx
leaq cpuid_to_apicid(%rip), %rbx

.Lfind_cpunr:
- cmpl (%rbx), %eax
+ cmpl (%rbx), %edx
jz .Linit_cpu_data
addq $4, %rbx
addq $8, %rcx
+ addb $1, %al
jmp .Lfind_cpunr

.Linit_cpu_data:
+ mov $0x3f8, %dx
+ outb %al, %dx
+
/* Get the per cpu offset */
leaq __per_cpu_offset(%rip), %rbx
addq %rcx, %rbx






2023-02-04 10:10:19

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
> [    3.204580] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0
> APIC: 2


Could it just be that some processors have CPUID leaves above 0xb but
don't actually support 0xb?

What does this do?

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1578,6 +1578,18 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
do_parallel_bringup = false;

+ if (do_parallel_bringup) {
+ /* Check that CPUID 0x0B really does look sane. */
+ unsigned int eax, ebx, ecx, edx;
+
+ cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx);
+ printk("CPUID 0xB level 0: %x %x %x %x\n", eax, ebx, ecx, edx);
+ if (!eax && !ebx) {
+ pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n");
+ do_parallel_bringup = false;
+ }
+ }
+
if (do_parallel_bringup &&
boot_cpu_has_bug(X86_BUG_NO_PARALLEL_BRINGUP)) {
pr_info("Disabling parallel bringup due to CPU bugs\n");
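
For comparison, here's what leaf 0xB reports when probed from userspace
(a standalone sketch using GCC's <cpuid.h>, independent of the patch
above; it assumes the reported max leaf is >= 0xB):

	#include <stdio.h>
	#include <cpuid.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		__cpuid_count(0x0b, 0, eax, ebx, ecx, edx);
		printf("CPUID 0xB.0: eax=%x ebx=%x ecx=%x edx=%x\n",
		       eax, ebx, ecx, edx);

		/* A CPU that really implements the leaf reports a non-zero
		 * EBX (logical CPUs at this topology level) and its x2APIC
		 * ID in EDX; EAX == EBX == 0 is the untrustworthy case the
		 * patch above bails out on. */
		return 0;
	}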



2023-02-04 15:42:04

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
>
> I'd like to nack this, but can't (and not because it doesn't have
> commit text)
>


So, I *think* that worked precisely as designed; thanks for letting it
grab your attention :)

> If I:
>
>   - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
>   - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c
>
> Then:
>
>   - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
>   - Zen 2,3,4-based servers boot fine
>   - a Zen1-based server doesn't boot.

I've changed it to use CPUID 0xb only if we're actually in x2apic mode,
which Boris tells me won't be the case on Zen1 because that doesn't
support X2APIC.

When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
of APIC ID we find there are perfectly sufficient.
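
(For illustration, that fallback amounts to something like the
following; the initial APIC ID is architecturally defined as CPUID
leaf 1, EBX bits 31:24:)

	static unsigned int initial_apicid_from_leaf1(void)
	{
		unsigned int eax, ebx, ecx, edx;

		cpuid(1, &eax, &ebx, &ecx, &edx);
		return ebx >> 24;	/* 8-bit initial APIC ID */
	}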

New tree in the same place as before, commit ce7e2d1e046a for the
parallel-6.2-rc6-part1 tag and 17bbd12ee03 for parallel-6.2-rc6.

However...

Even though we *can* support non-X2APIC processors, we *might* want to
play it safe and not go back that far; only enabling parallel bringup
on machines with X2APIC which roughly correlates with "lots of CPUs"
since that's where the benefit is.





2023-02-04 18:19:04

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

>
> However...
>
> Even though we *can* support non-X2APIC processors, we *might* want to
> play it safe and not go back that far; only enabling parallel bringup
> on machines with X2APIC which roughly correlates with "lots of CPUs"
> since that's where the benefit is.

I think this is the right approach, at least for the initial patch series.
KISS principle: do all the easy-but-important cases first, get it stable and
working, and in later series/kernels the range can be expanded... if it matters.




2023-02-04 22:32:29

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs



On 4 February 2023 18:18:55 GMT, Arjan van de Ven <[email protected]> wrote:
>>
>> However...
>>
>> Even though we *can* support non-X2APIC processors, we *might* want to
>> play it safe and not go back that far; only enabling parallel bringup
>> on machines with X2APIC which roughly correlates with "lots of CPUs"
>> since that's where the benefit is.
>
>I think that this is the right approach, at least on the initial patch series.
>KISS principle; do all the easy-but-important cases first, get it stable and working
>and in later series/kernels the range can be expanded.... if it matters.

Agreed. I'll split it so the initial series does this only with X2APIC, and hold the CPUID 0x1 part back for the next phase.
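
(As a sketch, the extra gate for the initial series would be little more
than the following, with the CPUID 0x1 fallback deferred; x2apic_mode is
the existing kernel flag:)

	if (do_parallel_bringup && !x2apic_mode)
		do_parallel_bringup = false;	/* CPUID 0x1 path held back for now */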

2023-02-05 19:19:52

by Russ Anderson

[permalink] [raw]
Subject: Re: [PATCH v6 00/11] Parallel CPU bringup for x86_64

On Thu, Feb 02, 2023 at 09:56:14PM +0000, Usama Arif wrote:
> Doing INIT/SIPI/SIPI in parallel brings down the time for smpboot from ~700ms
> to 100ms (85% improvement) on a server with 128 CPUs split across 2 NUMA nodes.
> The parallel CPU bringup is disabled for all AMD CPUs in this version:
> (see discussions: https://lore.kernel.org/all/[email protected]/ and
> https://lore.kernel.org/all/[email protected]/).
>
> The last patch reuses the timer calibration from CPU 0 for secondary CPUs
> which brings down the time for parallel smpboot from 100ms to 30ms. It is
> missing a sign-off from the author, which hopefully Arjan will add.
>
> Changes across versions:
> v2: Cut it back to just INIT/SIPI/SIPI in parallel for now, nothing more
> v3: Clean up x2apic patch, add MTRR optimisation, lock topology update
> in preparation for more parallelisation.
> v4: Fixes to the real mode parallelisation patch spotted by SeanC, to
> avoid scribbling on initial_gs in common_cpu_up(), and to allow all
> 24 bits of the physical X2APIC ID to be used. That patch still needs
> a Signed-off-by from its original author, who once claimed not to
> remember writing it at all. But now we've fixed it, hopefully he'll
> admit it now :)
> v5: rebase to v6.1 and remeasure performance, disable parallel bringup
> for AMD CPUs.
> v6: rebase to v6.2-rc6, disabled parallel boot on amd as a cpu bug and
> reused timer calibration for secondary CPUs.
>
> Arjan van de Ven (1):
> x86/smpboot: reuse timer calibration
>
> David Woodhouse (9):
> x86/apic/x2apic: Fix parallel handling of cluster_mask
> cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h>
> cpu/hotplug: Add dynamic parallel bringup states before
> CPUHP_BRINGUP_CPU
> x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
> x86/smpboot: Split up native_cpu_up into separate phases and document
> them
> x86/smpboot: Disable parallel boot for AMD CPUs
> x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel
> x86/mtrr: Avoid repeated save of MTRRs on boot-time CPU bringup
> x86/smpboot: Serialize topology updates for secondary bringup
>
> Thomas Gleixner (1):
> x86/smpboot: Support parallel startup of secondary CPUs
>
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/realmode.h | 3 +
> arch/x86/include/asm/smp.h | 13 +-
> arch/x86/include/asm/topology.h | 2 -
> arch/x86/kernel/acpi/sleep.c | 1 +
> arch/x86/kernel/apic/apic.c | 2 +-
> arch/x86/kernel/apic/x2apic_cluster.c | 108 +++++----
> arch/x86/kernel/cpu/amd.c | 11 +
> arch/x86/kernel/cpu/common.c | 6 +-
> arch/x86/kernel/cpu/mtrr/mtrr.c | 9 +
> arch/x86/kernel/head_64.S | 75 ++++++
> arch/x86/kernel/smpboot.c | 333 ++++++++++++++++++--------
> arch/x86/kernel/tsc.c | 3 +
> arch/x86/realmode/init.c | 3 +
> arch/x86/realmode/rm/trampoline_64.S | 14 ++
> arch/x86/xen/smp_pv.c | 4 +-
> include/linux/cpuhotplug.h | 2 +
> include/linux/smpboot.h | 7 +
> kernel/cpu.c | 27 ++-
> kernel/smpboot.c | 2 +-
> kernel/smpboot.h | 2 -
> 21 files changed, 469 insertions(+), 159 deletions(-)
>
> --
> 2.25.1

Gave the v6 patchset a try on a system with 1920 logical CPUs (sixteen 60-core Sapphire Rapids sockets with Hyper-Threading enabled).

Without the patchset it took 71 seconds to start all the CPUs.
With the v6 patchset it took 14 seconds to start all the CPUs,
a reduction of 57 seconds. That is impressive.

Full boot, to root login prompt, without patches takes 223 seconds.
This patchset reduces the full boot time by 57 seconds, a 25% reduction.

Well done.


Output with the patchset:
--------------------------------------------------------------------------
2023-02-05 18:12:19Z [ 1.172564][ T1] smp: Bringing up secondary CPUs ...
2023-02-05 18:12:19Z [ 1.179316][ T1] x86: Booting SMP configuration:
2023-02-05 18:12:19Z [ 1.179634][ T1] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59
2023-02-05 18:12:20Z [ 1.267650][ T1] .... node #1, CPUs: #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95 #96 #97 #98 #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111 #112 #113 #114 #115 #116 #117 #118 #119
2023-02-05 18:12:20Z [ 1.315633][ T1] .... node #2, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141 #142 #143 #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154 #155 #156 #157 #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168 #169 #170 #171 #172 #173 #174 #175 #176 #177 #178 #179
2023-02-05 18:12:20Z [ 1.363634][ T1] .... node #3, CPUs: #180 #181 #182 #183 #184 #185 #186 #187 #188 #189 #190 #191 #192 #193 #194 #195 #196 #197 #198 #199 #200 #201 #202 #203 #204 #205 #206 #207 #208 #209 #210 #211 #212 #213 #214 #215 #216 #217 #218 #219 #220 #221 #222 #223 #224 #225 #226 #227 #228 #229 #230 #231 #232 #233 #234 #235 #236 #237 #238 #239
2023-02-05 18:12:20Z [ 1.411634][ T1] .... node #4, CPUs: #240 #241 #242 #243 #244 #245 #246 #247 #248 #249 #250 #251 #252 #253 #254 #255 #256 #257 #258 #259 #260 #261 #262 #263 #264 #265 #266 #267 #268 #269 #270 #271 #272 #273 #274 #275 #276 #277 #278 #279 #280 #281 #282 #283 #284 #285 #286 #287 #288 #289 #290 #291 #292 #293 #294 #295 #296 #297 #298 #299
2023-02-05 18:12:20Z [ 1.459634][ T1] .... node #5, CPUs: #300 #301 #302 #303 #304 #305 #306 #307 #308 #309 #310 #311 #312 #313 #314 #315 #316 #317 #318 #319 #320 #321 #322 #323 #324 #325 #326 #327 #328 #329 #330 #331 #332 #333 #334 #335 #336 #337 #338 #339 #340 #341 #342 #343 #344 #345 #346 #347 #348 #349 #350 #351 #352 #353 #354 #355 #356 #357 #358 #359
2023-02-05 18:12:20Z [ 1.507634][ T1] .... node #6, CPUs: #360 #361 #362 #363 #364 #365 #366 #367 #368 #369 #370 #371 #372 #373 #374 #375 #376 #377 #378 #379 #380 #381 #382 #383 #384 #385 #386 #387 #388 #389 #390 #391 #392 #393 #394 #395 #396 #397 #398 #399 #400 #401 #402 #403 #404 #405 #406 #407 #408 #409 #410 #411 #412 #413 #414 #415 #416 #417 #418 #419
2023-02-05 18:12:20Z [ 1.555634][ T1] .... node #7, CPUs: #420 #421 #422 #423 #424 #425 #426 #427 #428 #429 #430 #431 #432 #433 #434 #435 #436 #437 #438 #439 #440 #441 #442 #443 #444 #445 #446 #447 #448 #449 #450 #451 #452 #453 #454 #455 #456 #457 #458 #459 #460 #461 #462 #463 #464 #465 #466 #467 #468 #469 #470 #471 #472 #473 #474 #475 #476 #477 #478 #479
2023-02-05 18:12:20Z [ 1.603634][ T1] .... node #8, CPUs: #480 #481 #482 #483 #484 #485 #486 #487 #488 #489 #490 #491 #492 #493 #494 #495 #496 #497 #498 #499 #500 #501 #502 #503 #504 #505 #506 #507 #508 #509 #510 #511 #512 #513 #514 #515 #516 #517 #518 #519 #520 #521 #522 #523 #524 #525 #526 #527 #528 #529 #530 #531 #532 #533 #534 #535 #536 #537 #538 #539
2023-02-05 18:12:20Z [ 1.655634][ T1] .... node #9, CPUs: #540 #541 #542 #543 #544 #545 #546 #547 #548 #549 #550 #551 #552 #553 #554 #555 #556 #557 #558 #559 #560 #561 #562 #563 #564 #565 #566 #567 #568 #569 #570 #571 #572 #573 #574 #575 #576 #577 #578 #579 #580 #581 #582 #583 #584 #585 #586 #587 #588 #589 #590 #591 #592 #593 #594 #595 #596 #597 #598 #599
2023-02-05 18:12:20Z [ 1.707633][ T1] .... node #10, CPUs: #600 #601 #602 #603 #604 #605 #606 #607 #608 #609 #610 #611 #612 #613 #614 #615 #616 #617 #618 #619 #620 #621 #622 #623 #624 #625 #626 #627 #628 #629 #630 #631 #632 #633 #634 #635 #636 #637 #638 #639 #640 #641 #642 #643 #644 #645 #646 #647 #648 #649 #650 #651 #652 #653 #654 #655 #656 #657 #658 #659
2023-02-05 18:12:20Z [ 1.755634][ T1] .... node #11, CPUs: #660 #661 #662 #663 #664 #665 #666 #667 #668 #669 #670 #671 #672 #673 #674 #675 #676 #677 #678 #679 #680 #681 #682 #683 #684 #685 #686 #687 #688 #689 #690 #691 #692 #693 #694 #695 #696 #697 #698 #699 #700 #701 #702 #703 #704 #705 #706 #707 #708 #709 #710 #711 #712 #713 #714 #715 #716 #717 #718 #719
2023-02-05 18:12:20Z [ 1.803634][ T1] .... node #12, CPUs: #720 #721 #722 #723 #724 #725 #726 #727 #728 #729 #730 #731 #732 #733 #734 #735 #736 #737 #738 #739 #740 #741 #742 #743 #744 #745 #746 #747 #748 #749 #750 #751 #752 #753 #754 #755 #756 #757 #758 #759 #760 #761 #762 #763 #764 #765 #766 #767 #768 #769 #770 #771 #772 #773 #774 #775 #776 #777 #778 #779
2023-02-05 18:12:21Z [ 1.855634][ T1] .... node #13, CPUs: #780 #781 #782 #783 #784 #785 #786 #787 #788 #789 #790 #791 #792 #793 #794 #795 #796 #797 #798 #799 #800 #801 #802 #803 #804 #805 #806 #807 #808 #809 #810 #811 #812 #813 #814 #815 #816 #817 #818 #819 #820 #821 #822 #823 #824 #825 #826 #827 #828 #829 #830 #831 #832 #833 #834 #835 #836 #837 #838 #839
2023-02-05 18:12:21Z [ 1.903641][ T1] .... node #14, CPUs: #840 #841 #842 #843 #844 #845 #846 #847 #848 #849 #850 #851 #852 #853 #854 #855 #856 #857 #858 #859 #860 #861 #862 #863 #864 #865 #866 #867 #868 #869 #870 #871 #872 #873 #874 #875 #876 #877 #878 #879 #880 #881 #882 #883 #884 #885 #886 #887 #888 #889 #890 #891 #892 #893 #894 #895 #896 #897 #898 #899
2023-02-05 18:12:21Z [ 1.951634][ T1] .... node #15, CPUs: #900 #901 #902 #903 #904 #905 #906 #907 #908 #909 #910 #911 #912 #913 #914 #915 #916 #917 #918 #919 #920 #921 #922 #923 #924 #925 #926 #927 #928 #929 #930 #931 #932 #933 #934 #935 #936 #937 #938 #939 #940 #941 #942 #943 #944 #945 #946 #947 #948 #949 #950 #951 #952 #953 #954 #955 #956 #957 #958 #959
2023-02-05 18:12:21Z [ 1.999634][ T1] .... node #0, CPUs: #960 #961 #962 #963 #964 #965 #966 #967 #968 #969 #970 #971 #972 #973 #974 #975 #976 #977 #978 #979 #980 #981 #982 #983 #984 #985 #986 #987 #988 #989 #990 #991 #992 #993 #994 #995 #996 #997 #998 #999 #1000 #1001 #1002 #1003 #1004 #1005 #1006 #1007 #1008 #1009 #1010 #1011 #1012 #1013 #1014 #1015 #1016 #1017 #1018 #1019
2023-02-05 18:12:21Z [ 2.079642][ T1] .... node #1, CPUs: #1020 #1021 #1022 #1023 #1024 #1025 #1026 #1027 #1028 #1029 #1030 #1031 #1032 #1033 #1034 #1035 #1036 #1037 #1038 #1039 #1040 #1041 #1042 #1043 #1044 #1045 #1046 #1047 #1048 #1049 #1050 #1051 #1052 #1053 #1054 #1055 #1056 #1057 #1058 #1059 #1060 #1061 #1062 #1063 #1064 #1065 #1066 #1067 #1068 #1069 #1070 #1071 #1072 #1073 #1074 #1075 #1076 #1077 #1078 #1079
2023-02-05 18:12:21Z [ 2.119634][ T1] .... node #2, CPUs: #1080 #1081 #1082 #1083 #1084 #1085 #1086 #1087 #1088 #1089 #1090 #1091 #1092 #1093 #1094 #1095 #1096 #1097 #1098 #1099 #1100 #1101 #1102 #1103 #1104 #1105 #1106 #1107 #1108 #1109 #1110 #1111 #1112 #1113 #1114 #1115 #1116 #1117 #1118 #1119 #1120 #1121 #1122 #1123 #1124 #1125 #1126 #1127 #1128 #1129 #1130 #1131 #1132 #1133 #1134 #1135 #1136 #1137 #1138 #1139
2023-02-05 18:12:21Z [ 2.155634][ T1] .... node #3, CPUs: #1140 #1141 #1142 #1143 #1144 #1145 #1146 #1147 #1148 #1149 #1150 #1151 #1152 #1153 #1154 #1155 #1156 #1157 #1158 #1159 #1160 #1161 #1162 #1163 #1164 #1165 #1166 #1167 #1168 #1169 #1170 #1171 #1172 #1173 #1174 #1175 #1176 #1177 #1178 #1179 #1180 #1181 #1182 #1183 #1184 #1185 #1186 #1187 #1188 #1189 #1190 #1191 #1192 #1193 #1194 #1195 #1196 #1197 #1198 #1199
2023-02-05 18:12:21Z [ 2.195634][ T1] .... node #4, CPUs: #1200 #1201 #1202 #1203 #1204 #1205 #1206 #1207 #1208 #1209 #1210 #1211 #1212 #1213 #1214 #1215 #1216 #1217 #1218 #1219 #1220 #1221 #1222 #1223 #1224 #1225 #1226 #1227 #1228 #1229 #1230 #1231 #1232 #1233 #1234 #1235 #1236 #1237 #1238 #1239 #1240 #1241 #1242 #1243 #1244 #1245 #1246 #1247 #1248 #1249 #1250 #1251 #1252 #1253 #1254 #1255 #1256 #1257 #1258 #1259
2023-02-05 18:12:21Z [ 2.235635][ T1] .... node #5, CPUs: #1260 #1261 #1262 #1263 #1264 #1265 #1266 #1267 #1268 #1269 #1270 #1271 #1272 #1273 #1274 #1275 #1276 #1277 #1278 #1279 #1280 #1281 #1282 #1283 #1284 #1285 #1286 #1287 #1288 #1289 #1290 #1291 #1292 #1293 #1294 #1295 #1296 #1297 #1298 #1299 #1300 #1301 #1302 #1303 #1304 #1305 #1306 #1307 #1308 #1309 #1310 #1311 #1312 #1313 #1314 #1315 #1316 #1317 #1318 #1319
2023-02-05 18:12:21Z [ 2.275634][ T1] .... node #6, CPUs: #1320 #1321 #1322 #1323 #1324 #1325 #1326 #1327 #1328 #1329 #1330 #1331 #1332 #1333 #1334 #1335 #1336 #1337 #1338 #1339 #1340 #1341 #1342 #1343 #1344 #1345 #1346 #1347 #1348 #1349 #1350 #1351 #1352 #1353 #1354 #1355 #1356 #1357 #1358 #1359 #1360 #1361 #1362 #1363 #1364 #1365 #1366 #1367 #1368 #1369 #1370 #1371 #1372 #1373 #1374 #1375 #1376 #1377 #1378 #1379
2023-02-05 18:12:21Z [ 2.320561][ T1] .... node #7, CPUs: #1380 #1381 #1382 #1383 #1384 #1385 #1386 #1387 #1388 #1389 #1390 #1391 #1392 #1393 #1394 #1395 #1396 #1397 #1398 #1399 #1400 #1401 #1402 #1403 #1404 #1405 #1406 #1407 #1408 #1409 #1410 #1411 #1412 #1413 #1414 #1415 #1416 #1417 #1418 #1419 #1420 #1421 #1422 #1423 #1424 #1425 #1426 #1427 #1428 #1429 #1430 #1431 #1432 #1433 #1434 #1435 #1436 #1437 #1438 #1439
2023-02-05 18:12:21Z [ 2.363647][ T1] .... node #8, CPUs: #1440 #1441 #1442 #1443 #1444 #1445 #1446 #1447 #1448 #1449 #1450 #1451 #1452 #1453 #1454 #1455 #1456 #1457 #1458 #1459 #1460 #1461 #1462 #1463 #1464 #1465 #1466 #1467 #1468 #1469 #1470 #1471 #1472 #1473 #1474 #1475 #1476 #1477 #1478 #1479 #1480 #1481 #1482 #1483 #1484 #1485 #1486 #1487 #1488 #1489 #1490 #1491 #1492 #1493 #1494 #1495 #1496 #1497 #1498 #1499
2023-02-05 18:12:22Z [ 2.407636][ T1] .... node #9, CPUs: #1500 #1501 #1502 #1503 #1504 #1505 #1506 #1507 #1508 #1509 #1510 #1511 #1512 #1513 #1514 #1515 #1516 #1517 #1518 #1519 #1520 #1521 #1522 #1523 #1524 #1525 #1526 #1527 #1528 #1529 #1530 #1531 #1532 #1533 #1534 #1535 #1536 #1537 #1538 #1539 #1540 #1541 #1542 #1543 #1544 #1545 #1546 #1547 #1548 #1549 #1550 #1551 #1552 #1553 #1554 #1555 #1556 #1557 #1558 #1559
2023-02-05 18:12:22Z [ 2.451644][ T1] .... node #10, CPUs: #1560 #1561 #1562 #1563 #1564 #1565 #1566 #1567 #1568 #1569 #1570 #1571 #1572 #1573 #1574 #1575 #1576 #1577 #1578 #1579 #1580 #1581 #1582 #1583 #1584 #1585 #1586 #1587 #1588 #1589 #1590 #1591 #1592 #1593 #1594 #1595 #1596 #1597 #1598 #1599 #1600 #1601 #1602 #1603 #1604 #1605 #1606 #1607 #1608 #1609 #1610 #1611 #1612 #1613 #1614 #1615 #1616 #1617 #1618 #1619
2023-02-05 18:12:22Z [ 2.495648][ T1] .... node #11, CPUs: #1620 #1621 #1622 #1623 #1624 #1625 #1626 #1627 #1628 #1629 #1630 #1631 #1632 #1633 #1634 #1635 #1636 #1637 #1638 #1639 #1640 #1641 #1642 #1643 #1644 #1645 #1646 #1647 #1648 #1649 #1650 #1651 #1652 #1653 #1654 #1655 #1656 #1657 #1658 #1659 #1660 #1661 #1662 #1663 #1664 #1665 #1666 #1667 #1668 #1669 #1670 #1671 #1672 #1673 #1674 #1675 #1676 #1677 #1678 #1679
2023-02-05 18:12:22Z [ 2.539634][ T1] .... node #12, CPUs: #1680 #1681 #1682 #1683 #1684 #1685 #1686 #1687 #1688 #1689 #1690 #1691 #1692 #1693 #1694 #1695 #1696 #1697 #1698 #1699 #1700 #1701 #1702 #1703 #1704 #1705 #1706 #1707 #1708 #1709 #1710 #1711 #1712 #1713 #1714 #1715 #1716 #1717 #1718 #1719 #1720 #1721 #1722 #1723 #1724 #1725 #1726 #1727 #1728 #1729 #1730 #1731 #1732 #1733 #1734 #1735 #1736 #1737 #1738 #1739
2023-02-05 18:12:22Z [ 2.583644][ T1] .... node #13, CPUs: #1740 #1741 #1742 #1743 #1744 #1745 #1746 #1747 #1748 #1749 #1750 #1751 #1752 #1753 #1754 #1755 #1756 #1757 #1758 #1759 #1760 #1761 #1762 #1763 #1764 #1765 #1766 #1767 #1768 #1769 #1770 #1771 #1772 #1773 #1774 #1775 #1776 #1777 #1778 #1779 #1780 #1781 #1782 #1783 #1784 #1785 #1786 #1787 #1788 #1789 #1790 #1791 #1792 #1793 #1794 #1795 #1796 #1797 #1798 #1799
2023-02-05 18:12:22Z [ 2.627635][ T1] .... node #14, CPUs: #1800 #1801 #1802 #1803 #1804 #1805 #1806 #1807 #1808 #1809 #1810 #1811 #1812 #1813 #1814 #1815 #1816 #1817 #1818 #1819 #1820 #1821 #1822 #1823 #1824 #1825 #1826 #1827 #1828 #1829 #1830 #1831 #1832 #1833 #1834 #1835 #1836 #1837 #1838 #1839 #1840 #1841 #1842 #1843 #1844 #1845 #1846 #1847 #1848 #1849 #1850 #1851 #1852 #1853 #1854 #1855 #1856 #1857 #1858 #1859
2023-02-05 18:12:22Z [ 2.675650][ T1] .... node #15, CPUs: #1860 #1861 #1862 #1863 #1864 #1865 #1866 #1867 #1868 #1869 #1870 #1871 #1872 #1873 #1874 #1875 #1876 #1877 #1878 #1879 #1880 #1881 #1882 #1883 #1884 #1885 #1886 #1887 #1888 #1889 #1890 #1891 #1892 #1893 #1894 #1895 #1896 #1897 #1898 #1899 #1900 #1901 #1902 #1903 #1904 #1905 #1906 #1907 #1908 #1909 #1910 #1911 #1912 #1913 #1914 #1915 #1916 #1917 #1918 #1919
2023-02-05 18:12:22Z [ 0.000000][ T0] smpboot: CPU 60 Converting physical 0 to logical die 1
2023-02-05 18:12:22Z [ 0.000000][ T0] smpboot: CPU 120 Converting physical 0 to logical die 2
2023-02-05 18:12:22Z [ 0.000000][ T0] smpboot: CPU 180 Converting physical 0 to logical die 3
2023-02-05 18:12:23Z [ 0.000000][ T0] smpboot: CPU 240 Converting physical 0 to logical die 4
2023-02-05 18:12:23Z [ 0.000000][ T0] smpboot: CPU 300 Converting physical 0 to logical die 5
2023-02-05 18:12:23Z [ 0.000000][ T0] smpboot: CPU 360 Converting physical 0 to logical die 6
2023-02-05 18:12:23Z [ 0.000000][ T0] smpboot: CPU 420 Converting physical 0 to logical die 7
2023-02-05 18:12:23Z [ 0.000000][ T0] smpboot: CPU 480 Converting physical 0 to logical die 8
2023-02-05 18:12:23Z [ 0.000000][ T0] smpboot: CPU 540 Converting physical 0 to logical die 9
2023-02-05 18:12:24Z [ 0.000000][ T0] smpboot: CPU 600 Converting physical 0 to logical die 10
2023-02-05 18:12:24Z [ 0.000000][ T0] smpboot: CPU 660 Converting physical 0 to logical die 11
2023-02-05 18:12:24Z [ 0.000000][ T0] smpboot: CPU 720 Converting physical 0 to logical die 12
2023-02-05 18:12:25Z [ 0.000000][ T0] smpboot: CPU 780 Converting physical 0 to logical die 13
2023-02-05 18:12:25Z [ 0.000000][ T0] smpboot: CPU 840 Converting physical 0 to logical die 14
2023-02-05 18:12:25Z [ 0.000000][ T0] smpboot: CPU 900 Converting physical 0 to logical die 15
2023-02-05 18:12:35Z [ 15.213189][ T1] smp: Brought up 16 nodes, 1920 CPUs
2023-02-05 18:12:35Z [ 15.215637][ T1] smpboot: Max logical packages: 16
2023-02-05 18:12:35Z [ 15.219750][ T1] smpboot: Total of 1920 processors activated (7296000.00 BogoMIPS)
--------------------------------------------------------------------------


Output without the patchset:
--------------------------------------------------------------------------
2023-02-03 11:06:49Z [ 1.157637][ T1] smp: Bringing up secondary CPUs ...
2023-02-03 11:06:49Z [ 1.162807][ T1] x86: Booting SMP configuration:
2023-02-03 11:06:51Z [ 1.164299][ T1] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59
2023-02-03 11:06:51Z [ 3.052299][ T1] .... node #1, CPUs: #60
2023-02-03 11:06:51Z [ 0.000000][ T0] smpboot: CPU 60 Converting physical 0 to logical die 1
2023-02-03 11:06:53Z [ 3.133499][ T1] #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95 #96 #97 #98 #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111 #112 #113 #114 #115 #116 #117 #118 #119
2023-02-03 11:06:53Z [ 4.984298][ T1] .... node #2, CPUs: #120
2023-02-03 11:06:53Z [ 0.000000][ T0] smpboot: CPU 120 Converting physical 0 to logical die 2
2023-02-03 11:06:55Z [ 5.065813][ T1] #121 #122 #123 #124 #125 #126 #127 #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141 #142 #143 #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154 #155 #156 #157 #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168 #169 #170 #171 #172 #173 #174 #175 #176 #177 #178 #179
2023-02-03 11:06:55Z [ 6.920298][ T1] .... node #3, CPUs: #180
2023-02-03 11:06:55Z [ 0.000000][ T0] smpboot: CPU 180 Converting physical 0 to logical die 3
2023-02-03 11:06:57Z [ 7.002019][ T1] #181 #182 #183 #184 #185 #186 #187 #188 #189 #190 #191 #192 #193 #194 #195 #196 #197 #198 #199 #200 #201 #202 #203 #204 #205 #206 #207 #208 #209 #210 #211 #212 #213 #214 #215 #216 #217 #218 #219 #220 #221 #222 #223 #224 #225 #226 #227 #228 #229 #230 #231 #232 #233 #234 #235 #236 #237 #238 #239
2023-02-03 11:06:57Z [ 8.880299][ T1] .... node #4, CPUs: #240
2023-02-03 11:06:57Z [ 0.000000][ T0] smpboot: CPU 240 Converting physical 0 to logical die 4
2023-02-03 11:06:59Z [ 8.991204][ T1] #241 #242 #243 #244 #245 #246 #247 #248 #249 #250 #251 #252 #253 #254 #255 #256 #257 #258 #259 #260 #261 #262 #263 #264 #265 #266 #267 #268 #269 #270 #271 #272 #273 #274 #275 #276 #277 #278 #279 #280 #281 #282 #283 #284 #285 #286 #287 #288 #289 #290 #291 #292 #293 #294 #295 #296 #297 #298 #299
2023-02-03 11:06:59Z [ 10.944299][ T1] .... node #5, CPUs: #300
2023-02-03 11:06:59Z [ 0.000000][ T0] smpboot: CPU 300 Converting physical 0 to logical die 5
2023-02-03 11:07:01Z [ 11.027023][ T1] #301 #302 #303 #304 #305 #306 #307 #308 #309 #310 #311 #312 #313 #314 #315 #316 #317 #318 #319 #320 #321 #322 #323 #324 #325 #326 #327 #328 #329 #330 #331 #332 #333 #334 #335 #336 #337 #338 #339 #340 #341 #342 #343 #344 #345 #346 #347 #348 #349 #350 #351 #352 #353 #354 #355 #356 #357 #358 #359
2023-02-03 11:07:01Z [ 12.964299][ T1] .... node #6, CPUs: #360
2023-02-03 11:07:01Z [ 0.000000][ T0] smpboot: CPU 360 Converting physical 0 to logical die 6
2023-02-03 11:07:03Z [ 13.047169][ T1] #361 #362 #363 #364 #365 #366 #367 #368 #369 #370 #371 #372 #373 #374 #375 #376 #377 #378 #379 #380 #381 #382 #383 #384 #385 #386 #387 #388 #389 #390 #391 #392 #393 #394 #395 #396 #397 #398 #399 #400 #401 #402 #403 #404 #405 #406 #407 #408 #409 #410 #411 #412 #413 #414 #415 #416 #417 #418 #419
2023-02-03 11:07:03Z [ 14.996299][ T1] .... node #7, CPUs: #420
2023-02-03 11:07:03Z [ 0.000000][ T0] smpboot: CPU 420 Converting physical 0 to logical die 7
2023-02-03 11:07:05Z [ 15.079323][ T1] #421 #422 #423 #424 #425 #426 #427 #428 #429 #430 #431 #432 #433 #434 #435 #436 #437 #438 #439 #440 #441 #442 #443 #444 #445 #446 #447 #448 #449 #450 #451 #452 #453 #454 #455 #456 #457 #458 #459 #460 #461 #462 #463 #464 #465 #466 #467 #468 #469 #470 #471 #472 #473 #474 #475 #476 #477 #478 #479
2023-02-03 11:07:06Z [ 17.048299][ T1] .... node #8, CPUs: #480
2023-02-03 11:07:06Z [ 0.000000][ T0] smpboot: CPU 480 Converting physical 0 to logical die 8
2023-02-03 11:07:08Z [ 17.161077][ T1] #481 #482 #483 #484 #485 #486 #487 #488 #489 #490 #491 #492 #493 #494 #495 #496 #497 #498 #499 #500 #501 #502 #503 #504 #505 #506 #507 #508 #509 #510 #511 #512 #513 #514 #515 #516 #517 #518 #519 #520 #521 #522 #523 #524 #525 #526 #527 #528 #529 #530 #531 #532 #533 #534 #535 #536 #537 #538 #539
2023-02-03 11:07:08Z [ 19.196298][ T1] .... node #9, CPUs: #540
2023-02-03 11:07:08Z [ 0.000000][ T0] smpboot: CPU 540 Converting physical 0 to logical die 9
2023-02-03 11:07:10Z [ 19.281098][ T1] #541 #542 #543 #544 #545 #546 #547 #548 #549 #550 #551 #552 #553 #554 #555 #556 #557 #558 #559 #560 #561 #562 #563 #564 #565 #566 #567 #568 #569 #570 #571 #572 #573 #574 #575 #576 #577 #578 #579 #580 #581 #582 #583 #584 #585 #586 #587 #588 #589 #590 #591 #592 #593 #594 #595 #596 #597 #598 #599
2023-02-03 11:07:10Z [ 21.324298][ T1] .... node #10, CPUs: #600
2023-02-03 11:07:10Z [ 0.000000][ T0] smpboot: CPU 600 Converting physical 0 to logical die 10
2023-02-03 11:07:12Z [ 21.409074][ T1] #601 #602 #603 #604 #605 #606 #607 #608 #609 #610 #611 #612 #613 #614 #615 #616 #617 #618 #619 #620 #621 #622 #623 #624 #625 #626 #627 #628 #629 #630 #631 #632 #633 #634 #635 #636 #637 #638 #639 #640 #641 #642 #643 #644 #645 #646 #647 #648 #649 #650 #651 #652 #653 #654 #655 #656 #657 #658 #659
2023-02-03 11:07:12Z [ 23.468298][ T1] .... node #11, CPUs: #660
2023-02-03 11:07:12Z [ 0.000000][ T0] smpboot: CPU 660 Converting physical 0 to logical die 11
2023-02-03 11:07:14Z [ 23.553115][ T1] #661 #662 #663 #664 #665 #666 #667 #668 #669 #670 #671 #672 #673 #674 #675 #676 #677 #678 #679 #680 #681 #682 #683 #684 #685 #686 #687 #688 #689 #690 #691 #692 #693 #694 #695 #696 #697 #698 #699 #700 #701 #702 #703 #704 #705 #706 #707 #708 #709 #710 #711 #712 #713 #714 #715 #716 #717 #718 #719
2023-02-03 11:07:14Z [ 25.624298][ T1] .... node #12, CPUs: #720
2023-02-03 11:07:14Z [ 0.000000][ T0] smpboot: CPU 720 Converting physical 0 to logical die 12
2023-02-03 11:07:17Z [ 25.741550][ T1] #721 #722 #723 #724 #725 #726 #727 #728 #729 #730 #731 #732 #733 #734 #735 #736 #737 #738 #739 #740 #741 #742 #743 #744 #745 #746 #747 #748 #749 #750 #751 #752 #753 #754 #755 #756 #757 #758 #759 #760 #761 #762 #763 #764 #765 #766 #767 #768 #769 #770 #771 #772 #773 #774 #775 #776 #777 #778 #779
2023-02-03 11:07:17Z [ 27.876299][ T1] .... node #13, CPUs: #780
2023-02-03 11:07:17Z [ 0.000000][ T0] smpboot: CPU 780 Converting physical 0 to logical die 13
2023-02-03 11:07:19Z [ 27.962255][ T1] #781 #782 #783 #784 #785 #786 #787 #788 #789 #790 #791 #792 #793 #794 #795 #796 #797 #798 #799 #800 #801 #802 #803 #804 #805 #806 #807 #808 #809 #810 #811 #812 #813 #814 #815 #816 #817 #818 #819 #820 #821 #822 #823 #824 #825 #826 #827 #828 #829 #830 #831 #832 #833 #834 #835 #836 #837 #838 #839
2023-02-03 11:07:19Z [ 30.096299][ T1] .... node #14, CPUs: #840
2023-02-03 11:07:19Z [ 0.000000][ T0] smpboot: CPU 840 Converting physical 0 to logical die 14
2023-02-03 11:07:21Z [ 30.182443][ T1] #841 #842 #843 #844 #845 #846 #847 #848 #849 #850 #851 #852 #853 #854 #855 #856 #857 #858 #859 #860 #861 #862 #863 #864 #865 #866 #867 #868 #869 #870 #871 #872 #873 #874 #875 #876 #877 #878 #879 #880 #881 #882 #883 #884 #885 #886 #887 #888 #889 #890 #891 #892 #893 #894 #895 #896 #897 #898 #899
2023-02-03 11:07:21Z [ 32.332299][ T1] .... node #15, CPUs: #900
2023-02-03 11:07:21Z [ 0.000000][ T0] smpboot: CPU 900 Converting physical 0 to logical die 15
2023-02-03 11:07:23Z [ 32.420700][ T1] #901 #902 #903 #904 #905 #906 #907 #908 #909 #910 #911 #912 #913 #914 #915 #916 #917 #918 #919 #920 #921 #922 #923 #924 #925 #926 #927 #928 #929 #930 #931 #932 #933 #934 #935 #936 #937 #938 #939 #940 #941 #942 #943 #944 #945 #946 #947 #948 #949 #950 #951 #952 #953 #954 #955 #956 #957 #958 #959
2023-02-03 11:07:26Z [ 34.584298][ T1] .... node #0, CPUs: #960 #961 #962 #963 #964 #965 #966 #967 #968 #969 #970 #971 #972 #973 #974 #975 #976 #977 #978 #979 #980 #981 #982 #983 #984 #985 #986 #987 #988 #989 #990 #991 #992 #993 #994 #995 #996 #997 #998 #999 #1000 #1001 #1002 #1003 #1004 #1005 #1006 #1007 #1008 #1009 #1010 #1011 #1012 #1013 #1014 #1015 #1016 #1017 #1018 #1019
2023-02-03 11:07:28Z [ 36.792301][ T1] .... node #1, CPUs: #1020 #1021 #1022 #1023 #1024 #1025 #1026 #1027 #1028 #1029 #1030 #1031 #1032 #1033 #1034 #1035 #1036 #1037 #1038 #1039 #1040 #1041 #1042 #1043 #1044 #1045 #1046 #1047 #1048 #1049 #1050 #1051 #1052 #1053 #1054 #1055 #1056 #1057 #1058 #1059 #1060 #1061 #1062 #1063 #1064 #1065 #1066 #1067 #1068 #1069 #1070 #1071 #1072 #1073 #1074 #1075 #1076 #1077 #1078 #1079
2023-02-03 11:07:30Z [ 38.980298][ T1] .... node #2, CPUs: #1080 #1081 #1082 #1083 #1084 #1085 #1086 #1087 #1088 #1089 #1090 #1091 #1092 #1093 #1094 #1095 #1096 #1097 #1098 #1099 #1100 #1101 #1102 #1103 #1104 #1105 #1106 #1107 #1108 #1109 #1110 #1111 #1112 #1113 #1114 #1115 #1116 #1117 #1118 #1119 #1120 #1121 #1122 #1123 #1124 #1125 #1126 #1127 #1128 #1129 #1130 #1131 #1132 #1133 #1134 #1135 #1136 #1137 #1138 #1139
2023-02-03 11:07:32Z [ 41.180298][ T1] .... node #3, CPUs: #1140 #1141 #1142 #1143 #1144 #1145 #1146 #1147 #1148 #1149 #1150 #1151 #1152 #1153 #1154 #1155 #1156 #1157 #1158 #1159 #1160 #1161 #1162 #1163 #1164 #1165 #1166 #1167 #1168 #1169 #1170 #1171 #1172 #1173 #1174 #1175 #1176 #1177 #1178 #1179 #1180 #1181 #1182 #1183 #1184 #1185 #1186 #1187 #1188 #1189 #1190 #1191 #1192 #1193 #1194 #1195 #1196 #1197 #1198 #1199
2023-02-03 11:07:35Z [ 43.408300][ T1] .... node #4, CPUs: #1200 #1201 #1202 #1203 #1204 #1205 #1206 #1207 #1208 #1209 #1210 #1211 #1212 #1213 #1214 #1215 #1216 #1217 #1218 #1219 #1220 #1221 #1222 #1223 #1224 #1225 #1226 #1227 #1228 #1229 #1230 #1231 #1232 #1233 #1234 #1235 #1236 #1237 #1238 #1239 #1240 #1241 #1242 #1243 #1244 #1245 #1246 #1247 #1248 #1249 #1250 #1251 #1252 #1253 #1254 #1255 #1256 #1257 #1258 #1259
2023-02-03 11:07:37Z [ 45.744298][ T1] .... node #5, CPUs: #1260 #1261 #1262 #1263 #1264 #1265 #1266 #1267 #1268 #1269 #1270 #1271 #1272 #1273 #1274 #1275 #1276 #1277 #1278 #1279 #1280 #1281 #1282 #1283 #1284 #1285 #1286 #1287 #1288 #1289 #1290 #1291 #1292 #1293 #1294 #1295 #1296 #1297 #1298 #1299 #1300 #1301 #1302 #1303 #1304 #1305 #1306 #1307 #1308 #1309 #1310 #1311 #1312 #1313 #1314 #1315 #1316 #1317 #1318 #1319
2023-02-03 11:07:39Z [ 48.056299][ T1] .... node #6, CPUs: #1320 #1321 #1322 #1323 #1324 #1325 #1326 #1327 #1328 #1329 #1330 #1331 #1332 #1333 #1334 #1335 #1336 #1337 #1338 #1339 #1340 #1341 #1342 #1343 #1344 #1345 #1346 #1347 #1348 #1349 #1350 #1351 #1352 #1353 #1354 #1355 #1356 #1357 #1358 #1359 #1360 #1361 #1362 #1363 #1364 #1365 #1366 #1367 #1368 #1369 #1370 #1371 #1372 #1373 #1374 #1375 #1376 #1377 #1378 #1379
2023-02-03 11:07:42Z [ 50.380298][ T1] .... node #7, CPUs: #1380 #1381 #1382 #1383 #1384 #1385 #1386 #1387 #1388 #1389 #1390 #1391 #1392 #1393 #1394 #1395 #1396 #1397 #1398 #1399 #1400 #1401 #1402 #1403 #1404 #1405 #1406 #1407 #1408 #1409 #1410 #1411 #1412 #1413 #1414 #1415 #1416 #1417 #1418 #1419 #1420 #1421 #1422 #1423 #1424 #1425 #1426 #1427 #1428 #1429 #1430 #1431 #1432 #1433 #1434 #1435 #1436 #1437 #1438 #1439
2023-02-03 11:07:44Z [ 52.736298][ T1] .... node #8, CPUs: #1440 #1441 #1442 #1443 #1444 #1445 #1446 #1447 #1448 #1449 #1450 #1451 #1452 #1453 #1454 #1455 #1456 #1457 #1458 #1459 #1460 #1461 #1462 #1463 #1464 #1465 #1466 #1467 #1468 #1469 #1470 #1471 #1472 #1473 #1474 #1475 #1476 #1477 #1478 #1479 #1480 #1481 #1482 #1483 #1484 #1485 #1486 #1487 #1488 #1489 #1490 #1491 #1492 #1493 #1494 #1495 #1496 #1497 #1498 #1499
2023-02-03 11:07:47Z [ 55.164301][ T1] .... node #9, CPUs: #1500 #1501 #1502 #1503 #1504 #1505 #1506 #1507 #1508 #1509 #1510 #1511 #1512 #1513 #1514 #1515 #1516 #1517 #1518 #1519 #1520 #1521 #1522 #1523 #1524 #1525 #1526 #1527 #1528 #1529 #1530 #1531 #1532 #1533 #1534 #1535 #1536 #1537 #1538 #1539 #1540 #1541 #1542 #1543 #1544 #1545 #1546 #1547 #1548 #1549 #1550 #1551 #1552 #1553 #1554 #1555 #1556 #1557 #1558 #1559
2023-02-03 11:07:49Z [ 57.592298][ T1] .... node #10, CPUs: #1560 #1561 #1562 #1563 #1564 #1565 #1566 #1567 #1568 #1569 #1570 #1571 #1572 #1573 #1574 #1575 #1576 #1577 #1578 #1579 #1580 #1581 #1582 #1583 #1584 #1585 #1586 #1587 #1588 #1589 #1590 #1591 #1592 #1593 #1594 #1595 #1596 #1597 #1598 #1599 #1600 #1601 #1602 #1603 #1604 #1605 #1606 #1607 #1608 #1609 #1610 #1611 #1612 #1613 #1614 #1615 #1616 #1617 #1618 #1619
2023-02-03 11:07:52Z [ 60.032299][ T1] .... node #11, CPUs: #1620 #1621 #1622 #1623 #1624 #1625 #1626 #1627 #1628 #1629 #1630 #1631 #1632 #1633 #1634 #1635 #1636 #1637 #1638 #1639 #1640 #1641 #1642 #1643 #1644 #1645 #1646 #1647 #1648 #1649 #1650 #1651 #1652 #1653 #1654 #1655 #1656 #1657 #1658 #1659 #1660 #1661 #1662 #1663 #1664 #1665 #1666 #1667 #1668 #1669 #1670 #1671 #1672 #1673 #1674 #1675 #1676 #1677 #1678 #1679
2023-02-03 11:07:54Z [ 62.484298][ T1] .... node #12, CPUs: #1680 #1681 #1682 #1683 #1684 #1685 #1686 #1687 #1688 #1689 #1690 #1691 #1692 #1693 #1694 #1695 #1696 #1697 #1698 #1699 #1700 #1701 #1702 #1703 #1704 #1705 #1706 #1707 #1708 #1709 #1710 #1711 #1712 #1713 #1714 #1715 #1716 #1717 #1718 #1719 #1720 #1721 #1722 #1723 #1724 #1725 #1726 #1727 #1728 #1729 #1730 #1731 #1732 #1733 #1734 #1735 #1736 #1737 #1738 #1739
2023-02-03 11:07:57Z [ 65.024298][ T1] .... node #13, CPUs: #1740 #1741 #1742 #1743 #1744 #1745 #1746 #1747 #1748 #1749 #1750 #1751 #1752 #1753 #1754 #1755 #1756 #1757 #1758 #1759 #1760 #1761 #1762 #1763 #1764 #1765 #1766 #1767 #1768 #1769 #1770 #1771 #1772 #1773 #1774 #1775 #1776 #1777 #1778 #1779 #1780 #1781 #1782 #1783 #1784 #1785 #1786 #1787 #1788 #1789 #1790 #1791 #1792 #1793 #1794 #1795 #1796 #1797 #1798 #1799
2023-02-03 11:07:59Z [ 67.544299][ T1] .... node #14, CPUs: #1800 #1801 #1802 #1803 #1804 #1805 #1806 #1807 #1808 #1809 #1810 #1811 #1812 #1813 #1814 #1815 #1816 #1817 #1818 #1819 #1820 #1821 #1822 #1823 #1824 #1825 #1826 #1827 #1828 #1829 #1830 #1831 #1832 #1833 #1834 #1835 #1836 #1837 #1838 #1839 #1840 #1841 #1842 #1843 #1844 #1845 #1846 #1847 #1848 #1849 #1850 #1851 #1852 #1853 #1854 #1855 #1856 #1857 #1858 #1859
2023-02-03 11:08:02Z [ 70.084299][ T1] .... node #15, CPUs: #1860 #1861 #1862 #1863 #1864 #1865 #1866 #1867 #1868 #1869 #1870 #1871 #1872 #1873 #1874 #1875 #1876 #1877 #1878 #1879 #1880 #1881 #1882 #1883 #1884 #1885 #1886 #1887 #1888 #1889 #1890 #1891 #1892 #1893 #1894 #1895 #1896 #1897 #1898 #1899 #1900 #1901 #1902 #1903 #1904 #1905 #1906 #1907 #1908 #1909 #1910 #1911 #1912 #1913 #1914 #1915 #1916 #1917 #1918 #1919
2023-02-03 11:08:02Z [ 72.634158][ T1] smp: Brought up 16 nodes, 1920 CPUs
2023-02-03 11:08:02Z [ 72.640299][ T1] smpboot: Max logical packages: 16
2023-02-03 11:08:02Z [ 72.644444][ T1] smpboot: Total of 1920 processors activated (7275848.16 BogoMIPS)
--------------------------------------------------------------------------

Thanks.
--
Russ Anderson, SuperDome Flex Linux Kernel Group Manager
HPE - Hewlett Packard Enterprise (formerly SGI) [email protected]

2023-02-05 22:13:50

by Usama Arif

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs



On 04/02/2023 22:31, David Woodhouse wrote:
>
>
> On 4 February 2023 18:18:55 GMT, Arjan van de Ven <[email protected]> wrote:
>>>
>>> However...
>>>
>>> Even though we *can* support non-X2APIC processors, we *might* want to
>>> play it safe and not go back that far; only enabling parallel bringup
>>> on machines with X2APIC which roughly correlates with "lots of CPUs"
>>> since that's where the benefit is.
>>
>> I think that this is the right approach, at least on the initial patch series.
>> KISS principle; do all the easy-but-important cases first, get it stable and working
>> and in later series/kernels the range can be expanded.... if it matters.
>
> Agreed. I'll split it to do it only with X2APIC for the initial series, and then hold the CPUID 0x1 part back for the next phase.

This was an interesting find! I tested the latest branch
parallel-6.2-rc6 and it works well. The numbers from Russ make the
patch series look so much better! :)

If we do it with x2apic only and don't support the non-x2apic CPUID 0x1 case,
maybe we apply the following diff to part 1?

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f53a060a899b..f6b89cf40076 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1564,7 +1564,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
* sufficient). Otherwise it's too hard. And not for SEV-ES
* guests because they can't use CPUID that early.
*/
- if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 ||
+ if (IS_ENABLED(CONFIG_X86_32) || !x2apic_mode ||
(x2apic_mode && boot_cpu_data.cpuid_level < 0xb) ||
cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
do_parallel_bringup = false;



For reusing timer calibration, calibrate_delay ends up being used in
start_kernel, smp_callin, debug_calc_bogomips and check_cx686_slop. I
think reusing timer calibration would be ok in the first 2 uses? But I'm not
really sure about the other 2. cx686 seems to be quite old so I'm not sure
if anyone will have it to test or will ever run a 6.2 kernel on it :). I
guess if unsure, better to leave it out initially and try to get part1 merged?

Thanks,
Usama

2023-02-06 08:07:26

by David Woodhouse

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On Sun, 2023-02-05 at 22:13 +0000, Usama Arif wrote:
>
>
> On 04/02/2023 22:31, David Woodhouse wrote:
> >
> >
> > On 4 February 2023 18:18:55 GMT, Arjan van de Ven <[email protected]> wrote:
> > > >
> > > > However...
> > > >
> > > > Even though we *can* support non-X2APIC processors, we *might* want to
> > > > play it safe and not go back that far; only enabling parallel bringup
> > > > on machines with X2APIC which roughly correlates with "lots of CPUs"
> > > > since that's where the benefit is.
> > >
> > > I think that this is the right approach, at least on the initial patch series.
> > > KISS principle; do all the easy-but-important cases first, get it stable and working
> > > and in later series/kernels the range can be expanded.... if it matters.
> >
> > Agreed. I'll split it to do it only with X2APIC for the initial series, and then hold the CPUID 0x1 part back for the next phase.
>
> This was an interesting find! I tested the latest branch
> parallel-6.2-rc6 and it works well. The numbers from Russ make the
> patch series look so much better! :)
>
> If we do it with x2apic only and don't support the non-x2apic CPUID 0x1 case,
> maybe we apply the following diff to part 1?

Using x2apic_mode would also disable parallel boot when the CPU *does*
have X2APIC support but the kernel just isn't using it at the moment. I
think boot_cpu_has(X86_FEATURE_X2APIC) is the better criterion?
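
Something like this, perhaps, purely as a sketch against the same hunk
(untested, and assuming the rest of the condition stays as posted):

-	if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 ||
-	   (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) ||
+	if (IS_ENABLED(CONFIG_X86_32) ||
+	    !boot_cpu_has(X86_FEATURE_X2APIC) ||
+	    boot_cpu_data.cpuid_level < 0xb ||
 	    cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
 		do_parallel_bringup = false;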

I was thinking I'd tweak the 'no_parallel_bringup' command line
argument into something that also allows us to *enable* it without
X2APIC being supported.

But I've also been pondering the fact that this is all only for 64-bit
anyway. It's not like we're doing it for the zoo of ancient i586 and
even i486 machines where the APICs were hooked up with blue wire and
duct tape.

Maybe "64-bit only" is good enough, with a command line opt-out. And
maybe a printk pointing out the existence of that command line option
before the bringup, just in case? 
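
Roughly this shape, maybe (just a sketch; the option name follows the
'no_parallel_bringup' discussion and isn't final):

static bool __initdata no_parallel_bringup;

static int __init no_parallel_bringup_setup(char *str)
{
	/* Opt out of parallel bringup; fall back to serial INIT/SIPI/SIPI */
	no_parallel_bringup = true;
	return 1;
}
__setup("no_parallel_bringup", no_parallel_bringup_setup);

...and then in native_smp_prepare_cpus(), before kicking the APs:

	if (do_parallel_bringup)
		pr_info("smpboot: parallel bringup enabled, pass 'no_parallel_bringup' to disable\n");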

> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index f53a060a899b..f6b89cf40076 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1564,7 +1564,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>           * sufficient). Otherwise it's too hard. And not for SEV-ES
>           * guests because they can't use CPUID that early.
>           */
> -       if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 ||
> +       if (IS_ENABLED(CONFIG_X86_32) || !x2apic_mode ||
>              (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) ||
>              cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
>                  do_parallel_bringup = false;
>
>
>
> For reusing timer calibration, calibrate_delay ends up being used in
> start_kernel, smp_callin, debug_calc_bogomips and check_cx686_slop. I
> think reusing timer calibration would be ok in the first 2 uses? 
>

It already explicitly allows the calibration to be reused from another
CPU in the same *core*, but no further. I think we'd need to be sure
about the correctness of extending that further, and which of the mess
of constant/invariant/we-really-mean-it-this-time TSC flags need to be
set for that to be OK.
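
For reference, the current logic is roughly this (paraphrased from
arch/x86/kernel/tsc.c from memory; the real function has a couple more
checks):

unsigned long calibrate_delay_is_known(void)
{
	int sibling, cpu = smp_processor_id();
	const struct cpumask *mask = topology_core_cpumask(cpu);

	/* Only reuse a sibling's calibration when the TSC is constant */
	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC) || !mask)
		return 0;

	sibling = cpumask_any_but(mask, cpu);
	if (sibling < nr_cpu_ids)
		return cpu_data(sibling).loops_per_jiffy;

	return 0;
}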

> But I'm not really sure about the other 2. cx686 seems to be quite old so I'm not sure
> if anyone will have it to test or will ever run a 6.2 kernel on it :).  I
> guess if unsure, better to leave it out initially and try to get part1 merged?

I doubt cx686 advertises constant TSC; the early ones didn't even *have* a TSC.
Does it even support SMP?



2023-02-06 08:29:17

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v6 00/11] Parallel CPU bringup for x86_64

On Sun, 2023-02-05 at 13:17 -0600, Russ Anderson wrote:
>
> Gave the v6 patchset a try on a system with 1920 logical cpus
> (sixteen 60 core Sapphire Rapids sockets with Hyperthreading
> enabled).
>
> Without the patchset it took 71 seconds to start all the cpus.
> With the v6 patchset it took 14 seconds to start all the cpus,
> a reduction of 57 seconds.  That is impressive.
>
> Full boot, to root login prompt, without patches takes 223 seconds.
> This patchset reduces the full boot time by 57 seconds, a 25%
> reduction.

Nice; thanks for testing.

Is that with just the "part1" patch series which has been posted, or
also with the 'parallel part 2' still taking shape in the tree at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc6

I believe Usama said the second phase of parallelism didn't really help
much in terms of overall timing? Confirming that *without* all the
debug prints would be interesting. And we can look for what still
*could* be made parallel.



2023-02-06 12:12:01

by Usama Arif

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs



On 06/02/2023 08:05, David Woodhouse wrote:
> On Sun, 2023-02-05 at 22:13 +0000, Usama Arif wrote:
>>
>>
>> On 04/02/2023 22:31, David Woodhouse wrote:
>>>
>>>
>>> On 4 February 2023 18:18:55 GMT, Arjan van de Ven <[email protected]> wrote:
>>>>>
>>>>> However...
>>>>>
>>>>> Even though we *can* support non-X2APIC processors, we *might* want to
>>>>> play it safe and not go back that far; only enabling parallel bringup
>>>>> on machines with X2APIC which roughly correlates with "lots of CPUs"
>>>>> since that's where the benefit is.
>>>>
>>>> I think that this is the right approach, at least on the initial patch series.
>>>> KISS principle; do all the easy-but-important cases first, get it stable and working
>>>> and in later series/kernels the range can be expanded.... if it matters.
>>>
>>> Agreed. I'll split it to do it only with X2APIC for the initial series, and then hold the CPUID 0x1 part back for the next phase.
>>
>> This was an interesting find! I tested the latest branch
>> parallel-6.2-rc6 and it works well. The numbers from Russ make the
>> patch series look so much better! :)
>>
>> If we do it with x2apic only and don't support the non-x2apic CPUID 0x1 case,
>> maybe we apply the following diff to part 1?
>
> Using x2apic_mode would also disable parallel boot when the CPU *does*
> have X2APIC support but the kernel just isn't using it at the moment. I
> think boot_cpu_has(X86_FEATURE_X2APIC) is the better criterion?
>

x2apic_mode is set to 0 only in the case when nox2apic is specified in
the kernel cmdline or if x2apic_setup fails. As 0xB leaf gets the
"x2apic id" and not the "apic id", I thought it would be better to not
use the x2apic id if the user doesn't want to use x2apic (cmdline), or
the kernel fails to set it up.

Another thing I noticed from the Intel Architecture Manual CPUID—CPU
Identification section:

"CPUID leaf 1FH is a preferred superset to leaf 0BH. Intel recommends
first checking for the existence of Leaf 1FH before using leaf 0BH."

So I think we should switch from 0BH to using the 1FH leaf EDX register.
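
i.e. something like this (sketch only; a real check should probably also
verify that subleaf 0 of 0x1F is non-empty before trusting it):

	unsigned int topo_leaf = 0xb;

	if (boot_cpu_data.cpuid_level >= 0x1f && cpuid_ebx(0x1f))
		topo_leaf = 0x1f;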

> I was thinking I'd tweak the 'no_parallel_bringup' command line
> argument into something that also allows us to *enable* it without
> X2APIC being supported.
>
> But I've also been pondering the fact that this is all only for 64-bit
> anyway. It's not like we're doing it for the zoo of ancient i586 and
> even i486 machines where the APICs were hooked up with blue wire and
> duct tape.
>
> Maybe "64-bit only" is good enough, with a command line opt-out. And
> maybe a printk pointing out the existence of that command line option
> before the bringup, just in case?
>

I think that makes sense; the patch only has a significant impact when
the core count is high, and x2apic was made to support higher CPU counts.


>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index f53a060a899b..f6b89cf40076 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -1564,7 +1564,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>>           * sufficient). Otherwise it's too hard. And not for SEV-ES
>>           * guests because they can't use CPUID that early.
>>           */
>> -       if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 ||
>> +       if (IS_ENABLED(CONFIG_X86_32) || !x2apic_mode ||
>>              (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) ||
>>              cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
>>                  do_parallel_bringup = false;
>>
>>
>>
>> For reusing timer calibration, calibrate_delay ends up being used in
>> start_kernel, smp_callin, debug_calc_bogomips and check_cx686_slop. I
>> think reusing timer calibration would be ok in the first 2 uses?
>>
>
> It already explicitly allows the calibration to be reused from another
> CPU in the same *core*, but no further. I think we'd need to be sure
> about the correctness of extending that further, and which of the mess
> of constant/invariant/we-really-mean-it-this-time TSC flags need to be
> set for that to be OK.
>
>> But I'm not really sure about the other 2. cx686 seems to be quite old so I'm not sure
>> if anyone will have it to test or will ever run a 6.2 kernel on it :).  I
>> guess if unsure, better to leave it out initially and try to get part1 merged?
>
> I doubt cx686 advertises constant TSC; the early ones didn't even *have* a TSC.
> Does it even support SMP?

Just checked the wiki page, and it says core count is 1 :)

2023-02-06 12:18:10

by Usama Arif

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v6 00/11] Parallel CPU bringup for x86_64



On 06/02/2023 08:28, David Woodhouse wrote:
> On Sun, 2023-02-05 at 13:17 -0600, Russ Anderson wrote:
>>
>> Gave the v6 patchset a try on a system with 1920 logical cpus
>> (sixteen 60 core Sapphire Rapids sockets with Hyperthreading
>> enabled).
>>
>> Without the patchset it took 71 seconds to start all the cpus.
>> With the v6 patchset it took 14 seconds to start all the cpus,
>> a reduction of 57 seconds.  That is impressive.
>>
>> Full boot, to root login prompt, without patches takes 223 seconds.
>> This patchset reduces the full boot time by 57 seconds, a 25%
>> reduction.
>
> Nice; thanks for testing.
>
> Is that with just the "part1" patch series which has been posted, or
> also with the 'parallel part 2' still taking shape in the tree at
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc6
>
> I believe Usama said the second phase of parallelism didn't really help
> much in terms of overall timing? Confirming that *without* all the
> debug prints would be interesting. And we can look for what still
> *could* be made parallel.

I think it would be interesting to get the numbers for such a big
machine for 3 cases: part1, part1+reuse timer calibration and part1+part2.

Russ mentioned testing v6, so I guess the above numbers are for
part1+reuse timer calibration.

For my machine the smpboot times were:

No patches: 700ms
part1: 100ms
part1+reuse timer calibration: 30ms
part1+part2: 30ms

Thanks,
Usama



2023-02-06 17:58:59

by Kim Phillips

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On 2/4/23 9:40 AM, David Woodhouse wrote:
> On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
>> If I:
>>
>>   - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
>>   - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c
>>
>> Then:
>>
>>   - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
>>   - Zen 2,3,4-based servers boot fine
>>   - a Zen1-based server doesn't boot.
>
> I've changed it to use CPUID 0xb only if we're actually in x2apic mode,
> which Boris tells me won't be the case on Zen1 because that doesn't
> support X2APIC.
>
> When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
> of APIC ID we find there are perfectly sufficient.
>
> New tree in the same place as before, commit ce7e2d1e046a for the
> parallel-6.2-rc6-part1 tag and 17bbd12ee03 for parallel-6.2-rc6.

Thanks, Zen 1 through 4 based servers all boot successfully with both
of those tree commits.

I'll try that Ryzen again later.

Kim

2023-02-06 18:07:47

by Sean Christopherson

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On Mon, Feb 06, 2023, Usama Arif wrote:
>
>
> On 06/02/2023 08:05, David Woodhouse wrote:
> > On Sun, 2023-02-05 at 22:13 +0000, Usama Arif wrote:
> > >
> > >
> > > On 04/02/2023 22:31, David Woodhouse wrote:
> > > >
> > > >
> > > > On 4 February 2023 18:18:55 GMT, Arjan van de Ven <[email protected]> wrote:
> > > > > >
> > > > > > However...
> > > > > >
> > > > > > Even though we *can* support non-X2APIC processors, we *might* want to
> > > > > > play it safe and not go back that far; only enabling parallel bringup
> > > > > > on machines with X2APIC which roughly correlates with "lots of CPUs"
> > > > > > since that's where the benefit is.
> > > > >
> > > > > I think that this is the right approach, at least on the initial patch series.
> > > > > KISS principle; do all the easy-but-important cases first, get it stable and working
> > > > > and in later series/kernels the range can be expanded.... if it matters.
> > > >
> > > > Agreed. I'll split it to do it only with X2APIC for the initial series,

And sanity check CPUID.0xB output even when x2APIC is supported, e.g. require
CPUID.0xB.EBX to be non-zero. Odds are very good that there are VMs in the wild
that support x2APIC but have an empty CPUID.0xB due to it being a topology leaf,
i.e. it may be suppressed when vCPUs aren't pinned. QEMU even has a knob to deliberately
disable CPUID.0xB, e.g. booting a VM with

cpu host,host-phys-bits,-cpuid-0xb,+x2apic

works just fine.
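
Something along these lines, perhaps (a sketch against
native_smp_prepare_cpus(); untested):

	/* Don't trust an empty topology leaf even when x2APIC is advertised */
	if (do_parallel_bringup && boot_cpu_data.cpuid_level >= 0xb) {
		unsigned int eax, ebx, ecx, edx;

		cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx);
		if (!ebx)
			do_parallel_bringup = false;
	}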

> > > > and then hold the CPUID 0x1 part back for the next phase.
> > > This was an interesting find! I tested the latest branch
> > > parallel-6.2-rc6 and it works well. The numbers from Russ make the
> > > patch series look so much better! :)
> > >
> > > If we do it with x2apic only and don't support the non-x2apic CPUID 0x1 case,
> > > maybe we apply the following diff to part 1?
> >
> > Using x2apic_mode would also disable parallel boot when the CPU *does*
> > have X2APIC support but the kernel just isn't using it at the moment. I
> > think boot_cpu_has(X86_FEATURE_X2APIC) is the better criterion?
> >
>
> x2apic_mode is set to 0 only in the case when nox2apic is specified in the
> kernel cmdline or if x2apic_setup fails. As 0xB leaf gets the "x2apic id"
> and not the "apic id", I thought it would be better to not use the x2apic id
> if the user doesn't want to use x2apic (cmdline), or the kernel fails to set
> it up.

I agree with David that checking boot_cpu_has(X86_FEATURE_X2APIC) is preferred,
x2APIC goes unused on a lot of platforms due to the kernel's interrupt remapping
requirement. I would rather have a single explicit "no_parallel_bringup" option
than try to infer the user's intentions based on tangentially related params.

> Another thing I noticed from the Intel Architecture Manual CPUID—CPU
> Identification section:
>
> "CPUID leaf 1FH is a preferred superset to leaf 0BH. Intel recommends first
> checking for the existence of Leaf 1FH before using leaf 0BH."
>
> So I think we should switch from 0BH to using the 1FH leaf EDX register.

I don't think using 0x1F will buy us anything except complexity. 0x1F provides more
details on the topology, but its x2APIC ID enumeration isn't any more trustworthy
than 0xB.

> > I was thinking I'd tweak the 'no_parallel_bringup' command line
> > argument into something that also allows us to *enable* it without
> > X2APIC being supported.
> >
> > But I've also been pondering the fact that this is all only for 64-bit
> > anyway. It's not like we're doing it for the zoo of ancient i586 and
> > even i486 machines where the APICs were hooked up with blue wire and
> > duct tape.

The only thing I can see benefiting from parallel bringup without x2APIC is large
VMs running on pre-x2AVIC hardware. E.g. IIRC, we (Google) hide x2APIC on AMD-based
VMs so that VMs would take advantage of AVIC acceleration if AVIC were ever to be
enabled.

But that's more of an argument for "try to use CPUID.0x1" than it is an argument
for trying to use CPUID.0xB without x2APIC.

> > Maybe "64-bit only" is good enough, with a command line opt-out. And
> > maybe a printk pointing out the existence of that command line option
> > before the bringup, just in case?

2023-02-06 23:20:51

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask

Usama!

On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> For each CPU being brought up, the alloc_clustermask() function
> allocates a new struct cluster_mask just in case it's needed. Then the
> target CPU actually runs, and in init_x2apic_ldr() it either uses a
> cluster_mask from a previous CPU in the same cluster, or consumes the
> "spare" one and sets the global pointer to NULL.
>
> That isn't going to parallelise stunningly well.
>
> Ditch the global variable, let alloc_clustermask() install the struct
> *directly* in the per_cpu data for the CPU being brought up. As an
> optimisation, actually make it do so for *all* present CPUs in the same
> cluster, which means only one iteration over for_each_present_cpu()
> instead of doing so repeatedly, once for each CPU.
>
> Now, in fact, there's no point in the 'node' or 'clusterid' members of
> the struct cluster_mask, so just kill it and use struct cpumask instead.
>
> This was a harmless "bug" while CPU bringup wasn't actually happening in
> parallel. It's about to become less harmless...

Just to be clear. There is no bug in todays code and therefore this:

> Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management")

tag is unjustified. It'll just cause the stable robots to backport it
for no reason.

> Signed-off-by: David Woodhouse <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>

How is this SOB chain correct? It's unclear to me how Paul got involved
here, but let's assume he handed the patch over to you, then this still
lacks a SOB from you.

> +/*
> + * As an optimisation during boot, set the cluster_mask for *all*
> + * present CPUs at once, to prevent *each* of them having to iterate
> + * over the others to find the existing cluster_mask.
> + */
> +static void prefill_clustermask(struct cpumask *cmsk, u32 cluster)
> +{
> + int cpu;
> +
> + for_each_present_cpu(cpu) {
> + u32 apicid = apic->cpu_present_to_apicid(cpu);

Lacks a newline between declaration and code.

> + if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
> + struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu);
> +
> + BUG_ON(*cpu_cmsk && *cpu_cmsk != cmsk);

While I agree that changing an in use mask pointer would be fatal, I
really have to ask why this code would be invoked on a partially
initialized cluster at all and why that would be correct.

if (WARN_ON_ONCE(*cpu_cmsk == cmsk))
return;
BUG_ON(*cpu_cmsk);

if at all. But of course that falls over with the way how this code is
invoked below.

> + *cpu_cmsk = cmsk;
> + }
>
> -static int alloc_clustermask(unsigned int cpu, int node)
> +static int alloc_clustermask(unsigned int cpu, u32 cluster, int node)
> {
> + struct cpumask *cmsk = NULL;
> + unsigned int cpu_i;
> + u32 apicid;
> +
> if (per_cpu(cluster_masks, cpu))
> return 0;
> - /*
> - * If a hotplug spare mask exists, check whether it's on the right
> - * node. If not, free it and allocate a new one.
> - */
> - if (cluster_hotplug_mask) {
> - if (cluster_hotplug_mask->node == node)
> - return 0;
> - kfree(cluster_hotplug_mask);
> +
> + /* For the hotplug case, don't always allocate a new one */

-ENOPARSE

> + if (system_state >= SYSTEM_RUNNING) {
> + for_each_present_cpu(cpu_i) {
> + apicid = apic->cpu_present_to_apicid(cpu_i);
> + if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
> + cmsk = per_cpu(cluster_masks, cpu_i);
> + if (cmsk)
> + break;
> + }
> + }
> + }
> + if (!cmsk) {
> + cmsk = kzalloc_node(sizeof(*cmsk), GFP_KERNEL, node);
> + if (!cmsk)
> + return -ENOMEM;
> }

...

> + per_cpu(cluster_masks, cpu) = cmsk;
> +
> + if (system_state < SYSTEM_RUNNING)
> + prefill_clustermask(cmsk, cluster);

TBH. The logic of this code is anything but obvious. Something like the
uncompiled below perhaps?

Thanks,

tglx
---

@@ -116,44 +109,90 @@
+
+/*
+ * As an optimisation during boot, set the cluster_mask for all present
+ * CPUs at once, to prevent each of them having to iterate over the others
+ * to find the existing cluster_mask.
+ */
+static void prefill_clustermask(struct cpumask *cmsk, unsigned int cpu, u32 cluster)
+{
+ int cpu_i;

- cluster = apicid >> 16;
- for_each_online_cpu(cpu) {
- cmsk = per_cpu(cluster_masks, cpu);
- /* Matching cluster found. Link and update it. */
- if (cmsk && cmsk->clusterid == cluster)
- goto update;
+ for_each_present_cpu(cpu_i) {
+ struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu_i);
+ u32 apicid = apic->cpu_present_to_apicid(cpu_i);
+
+ if (apicid == BAD_APICID || cpu_i == cpu || apic_cluster(apicid) != cluster)
+ continue;
+
+ if (WARN_ON_ONCE(*cpu_cmsk == cmsk))
+ continue;
+
+ BUG_ON(*cpu_cmsk);
+ *cpu_cmsk = cmsk;
}
- cmsk = cluster_hotplug_mask;
- cmsk->clusterid = cluster;
- cluster_hotplug_mask = NULL;
-update:
- this_cpu_write(cluster_masks, cmsk);
- cpumask_set_cpu(smp_processor_id(), &cmsk->mask);
}

-static int alloc_clustermask(unsigned int cpu, int node)
+static int alloc_clustermask(unsigned int cpu, u32 cluster, int node)
{
+ struct cpumask *cmsk;
+ unsigned int cpu_i;
+
if (per_cpu(cluster_masks, cpu))
return 0;
+
/*
- * If a hotplug spare mask exists, check whether it's on the right
- * node. If not, free it and allocate a new one.
+ * At boot time CPU present mask is stable. If the cluster is not
+ * yet initialized, allocate the mask and propagate it to all
+ * siblings in this cluster.
*/
- if (cluster_hotplug_mask) {
- if (cluster_hotplug_mask->node == node)
- return 0;
- kfree(cluster_hotplug_mask);
- }
+ if (system_state < SYSTEM_RUNNING)
+ goto alloc;
+
+ /*
+ * On post boot hotplug iterate over the present CPUs to handle the
+ * case of partial clusters as they might be presented by
+ * virtualization.
+ */
+ for_each_present_cpu(cpu_i) {
+ u32 apicid = apic->cpu_present_to_apicid(cpu_i);
+
+ if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
+ cmsk = per_cpu(cluster_masks, cpu_i);

- cluster_hotplug_mask = kzalloc_node(sizeof(*cluster_hotplug_mask),
- GFP_KERNEL, node);
- if (!cluster_hotplug_mask)
+ /*
+ * If the cluster is already initialized, just
+ * store the mask and return. No point in trying to
+ * propagate.
+ */
+ if (cmsk) {
+ per_cpu(cluster_masks, cpu) = cmsk;
+ return 0;
+ }
+ }
+ }
+ /*
+ * The cluster is not initialized yet. Fall through to the boot
+ * time code which might initialize the whole cluster if it is
+ * in the CPU present mask.
+ */
+alloc:
+ cmsk = kzalloc_node(sizeof(*cmsk), GFP_KERNEL, node);
+ if (!cmsk)
return -ENOMEM;
- cluster_hotplug_mask->node = node;
+ per_cpu(cluster_masks, cpu) = cmsk;
+ prefill_clustermask(cmsk, cpu, cluster);
+
return 0;
}



2023-02-06 23:33:10

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 02/11] cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h>

On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> If the architecture code is to register its new pre-bringup states with
> the cpuhp core, having a special-case wrapper to pass extra arguments is
> non-trivial and it's easier just to let the arch register its function
> pointer to be invoked with the standard API.
>
> Signed-off-by: David Woodhouse <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>

Same issue vs. the SOB chain.

Other than that:

Reviewed-by: Thomas Gleixner <[email protected]>

2023-02-06 23:43:56

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 03/11] cpu/hotplug: Add dynamic parallel bringup states before CPUHP_BRINGUP_CPU

On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> So any combination of prepare/start calls which depend on A-B ordering
> for each CPU in turn, such as the X2APIC code which used to allocate a
> cluster mask 'just in case' and store it in a global variable in the
> prep stage, then potentially consume that preallocated structure from
> the AP and set the global pointer to NULL to be reallocated in
> CPUHP_X2APIC_PREPARE for the next CPU... would explode horribly.
>
> We believe that X2APIC was the only such case, for x86. But this is why
> it remains an architecture opt-in. For now.

"We believe" is not really a convincing technical argument.

On x86 the X2APIC case was established to be the only one which relied
on the full CPU hotplug serialization. It's unclear whether this
affects any other architecture, so this optimization is opt-in.

Hmm?

> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 6c0a92ca6bb5..5a8f1a93b57c 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1505,6 +1505,24 @@ int bringup_hibernate_cpu(unsigned int sleep_cpu)
> void bringup_nonboot_cpus(unsigned int setup_max_cpus)
> {
> unsigned int cpu;
> + int n = setup_max_cpus - num_online_cpus();

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations

Aside of that I detest the mixture between unsigned and signed here.

> +
> + /* ∀ parallel pre-bringup state, bring N CPUs to it */
> + if (n > 0) {
> + enum cpuhp_state st = CPUHP_BP_PARALLEL_DYN;
> +
> + while (st <= CPUHP_BP_PARALLEL_DYN_END &&
> + cpuhp_hp_states[st].name) {

while (st <= CPUHP_BP_PARALLEL_DYN_END && cpuhp_hp_states[st].name) {

80 character limit has been updated quite some time ago

Thanks,

tglx

2023-02-06 23:48:55

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 04/11] x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()

On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> From: David Woodhouse <[email protected]>
>
> If we want to do parallel CPU bringup, we're going to need to set this up
> and leave it until all CPUs are done. Might as well use the RTC spinlock
> to protect the refcount, as we need to take it anyway.

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog

Aside of the 'We' this does not explain anything at all.

> [Usama Arif: fixed rebase conflict]
> Signed-off-by: David Woodhouse <[email protected]>
> Signed-off-by: Usama Arif <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>

This SOB chain is even more broken...

> ---
> arch/x86/kernel/smpboot.c | 21 ++++++++++++++-------
> 1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 55cad72715d9..a19eddcdccc2 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -121,17 +121,22 @@ int arch_update_cpu_topology(void)
> return retval;
> }
>
> +
> +static unsigned int smpboot_warm_reset_vector_count;
> +
> static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
> {
> unsigned long flags;
>
> spin_lock_irqsave(&rtc_lock, flags);
> - CMOS_WRITE(0xa, 0xf);
> + if (!smpboot_warm_reset_vector_count++) {
> + CMOS_WRITE(0xa, 0xf);
> + *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) =
> + start_eip >> 4;
> + *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) =
> + start_eip & 0xf;

Again: 80 characters are history. Please fix that all over the series.

Thanks,

tglx

2023-02-06 23:59:15

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 05/11] x86/smpboot: Split up native_cpu_up into separate phases and document them

On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> From: David Woodhouse <[email protected]>
>
> There are four logical parts to what native_cpu_up() does on the BSP (or
> on the controlling CPU for a later hotplug).
>
> First it actually wakes the AP by sending the INIT/SIPI/SIPI sequence.
>
> Second, it waits for the AP to make it as far as wait_for_master_cpu()
> which sets that CPU's bit in cpu_initialized_mask, then sets the bit in
> cpu_callout_mask to let the AP proceed through cpu_init().
>
> Then, it waits for the AP to finish cpu_init() and get as far as the
> smp_callin() call, which sets that CPU's bit in cpu_callin_mask.
>
> Finally, it does the TSC synchronization and waits for the AP to actually
> mark itself online in cpu_online_mask.

Sigh. Four parts and why can't we have the obvious formatting here which
matches the _four_ parts?

1) It wakes the AP by sending ...

2) It waits for the AP

3) ....

4) ...

> This commit should have no behavioural change, but merely splits those

"This commit" is equally useless as "This patch"....

git grep "This patch" Documentation/process

> phases out into separate functions so that future commits can make them
> happen in parallel for all APs. And adds some comments around them on
> both the BSP and AP code paths.

Prepare for a partial parallelization of these phases by splitting
the code into separate function so that ......

Add comments at both the BSP and the AP code paths to clarify the
functionality.

No functional change intended.

Hmm?

> [Usama Arif: fixed rebase conflict]
> Signed-off-by: David Woodhouse <[email protected]>
> Signed-off-by: Usama Arif <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>

Sigh....

> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index a19eddcdccc2..fdcf7c08945f 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -206,6 +206,10 @@ static void smp_callin(void)
>
> wmb();
>
> + /*
> + * This runs the AP through all the cpuhp states to its target
> + * state (CPUHP_ONLINE in the case of serial bringup).
> + */
> notify_cpu_starting(cpuid);
>
> /*
> @@ -233,17 +237,33 @@ static void notrace start_secondary(void *unused)
> load_cr3(swapper_pg_dir);
> __flush_tlb_all();
> #endif
> + /*
> + * Sync point with do_wait_cpu_initialized(). On boot, all secondary
> + * CPUs reach this stage after receiving INIT/SIPI from do_cpu_up()
> + * in the x86/cpu:kick cpuhp stage. At the start of cpu_init() they
> + * will wait for do_wait_cpu_initialized() to set their bit in
> + * smp_callout_mask to release them.
> + */
> cpu_init_secondary();
> rcu_cpu_starting(raw_smp_processor_id());
> x86_cpuinit.early_percpu_clock_init();
> +
> + /*
> + * Sync point with do_wait_cpu_callin(). The AP doesn't wait here
> + * but just sets the bit to let the controlling CPU (BSP) know that
> + * it's got this far.
> + */
> smp_callin();
>
> enable_start_cpu0 = 0;
>
> /* otherwise gcc will move up smp_processor_id before the cpu_init */
> barrier();
> +
> /*
> - * Check TSC synchronization with the boot CPU:
> + * Check TSC synchronization with the boot CPU (or whichever CPU
> + * is controlling the bringup). It will do its part of this from
> + * do_wait_cpu_online(), making it an implicit sync point.
> */
> check_tsc_sync_target();
>
> @@ -256,6 +276,7 @@ static void notrace start_secondary(void *unused)
> * half valid vector space.
> */
> lock_vector_lock();
> + /* Sync point with do_wait_cpu_online() */
> set_cpu_online(smp_processor_id(), true);
> lapic_online();
> unlock_vector_lock();
> @@ -1085,7 +1106,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
> unsigned long start_ip = real_mode_header->trampoline_start;
>
> unsigned long boot_error = 0;
> - unsigned long timeout;
>
> #ifdef CONFIG_X86_64
> /* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */
> @@ -1146,55 +1166,94 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
> boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
> cpu0_nmi_registered);
>
> - if (!boot_error) {
> - /*
> - * Wait 10s total for first sign of life from AP
> - */
> - boot_error = -1;
> - timeout = jiffies + 10*HZ;
> - while (time_before(jiffies, timeout)) {
> - if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
> - /*
> - * Tell AP to proceed with initialization
> - */
> - cpumask_set_cpu(cpu, cpu_callout_mask);
> - boot_error = 0;
> - break;
> - }
> - schedule();
> - }
> - }
> + return boot_error;
> +}
>
> - if (!boot_error) {
> - /*
> - * Wait till AP completes initial initialization
> - */
> - while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
> - /*
> - * Allow other tasks to run while we wait for the
> - * AP to come online. This also gives a chance
> - * for the MTRR work(triggered by the AP coming online)
> - * to be completed in the stop machine context.
> - */
> - schedule();
> - }
> +static int do_wait_cpu_cpumask(unsigned int cpu, const struct cpumask *mask)
> +{
> + unsigned long timeout;
> +
> + /*
> + * Wait up to 10s for the CPU to report in.
> + */
> + timeout = jiffies + 10*HZ;
> + while (time_before(jiffies, timeout)) {
> + if (cpumask_test_cpu(cpu, mask))
> + return 0;
> +
> + schedule();
> }
> + return -1;
> +}
>
> - if (x86_platform.legacy.warm_reset) {
> - /*
> - * Cleanup possible dangling ends...
> - */
> - smpboot_restore_warm_reset_vector();
> +/*
> + * Bringup step two: Wait for the target AP to reach cpu_init_secondary()
> + * and thus wait_for_master_cpu(), then set cpu_callout_mask to allow it
> + * to proceed. The AP will then proceed past setting its 'callin' bit
> + * and end up waiting in check_tsc_sync_target() until we reach
> + * do_wait_cpu_online() to tend to it.
> + */
> +static int do_wait_cpu_initialized(unsigned int cpu)
> +{
> + /*
> + * Wait for first sign of life from AP.
> + */
> + if (do_wait_cpu_cpumask(cpu, cpu_initialized_mask))
> + return -1;
> +
> + cpumask_set_cpu(cpu, cpu_callout_mask);
> + return 0;
> +}
> +
> +/*
> + * Bringup step three: Wait for the target AP to reach smp_callin().
> + * The AP is not waiting for us here so we don't need to parallelise
> + * this step. Not entirely clear why we care about this, since we just
> + * proceed directly to TSC synchronization which is the next sync
> + * point with the AP anyway.
> + */
> +static int do_wait_cpu_callin(unsigned int cpu)
> +{
> + /*
> + * Wait till AP completes initial initialization.
> + */
> + return do_wait_cpu_cpumask(cpu, cpu_callin_mask);
> +}
> +
> +/*
> + * Bringup step four: Synchronize the TSC and wait for the target AP
> + * to reach set_cpu_online() in start_secondary().
> + */
> +static int do_wait_cpu_online(unsigned int cpu)
> +{
> + unsigned long flags;
> +
> + /*
> + * Check TSC synchronization with the AP (keep irqs disabled
> + * while doing so):
> + */
> + local_irq_save(flags);
> + check_tsc_sync_source(cpu);
> + local_irq_restore(flags);
> +
> + /*
> + * Wait for the AP to mark itself online. Not entirely
> + * clear why we care, since the generic cpuhp code will
> + * wait for it to reach CPUHP_AP_ONLINE_IDLE before going
> + * ahead with the rest of the bringup anyway.
> + */
> + while (!cpu_online(cpu)) {
> + cpu_relax();
> + touch_nmi_watchdog();
> }
>
> - return boot_error;
> + return 0;
> }
>
> -int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
> +static int do_cpu_up(unsigned int cpu, struct task_struct *tidle)
> {
> int apicid = apic->cpu_present_to_apicid(cpu);
> int cpu0_nmi_registered = 0;
> - unsigned long flags;
> int err, ret = 0;
>
> lockdep_assert_irqs_enabled();
> @@ -1241,19 +1300,6 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
> goto unreg_nmi;
> }
>
> - /*
> - * Check TSC synchronization with the AP (keep irqs disabled
> - * while doing so):
> - */
> - local_irq_save(flags);
> - check_tsc_sync_source(cpu);
> - local_irq_restore(flags);
> -
> - while (!cpu_online(cpu)) {
> - cpu_relax();
> - touch_nmi_watchdog();
> - }
> -
> unreg_nmi:
> /*
> * Clean up the nmi handler. Do this after the callin and callout sync
> @@ -1265,6 +1311,34 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
> return ret;
> }
>
> +int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
> +{
> + int ret;
> +
> + ret = do_cpu_up(cpu, tidle);
> + if (ret)
> + return ret;
> +
> + ret = do_wait_cpu_initialized(cpu);
> + if (ret)
> + return ret;
> +
> + ret = do_wait_cpu_callin(cpu);
> + if (ret)
> + return ret;
> +
> + ret = do_wait_cpu_online(cpu);
> +
> + if (x86_platform.legacy.warm_reset) {
> + /*
> + * Cleanup possible dangling ends...
> + */
> + smpboot_restore_warm_reset_vector();
> + }
> +
> + return ret;
> +}

I'm fine with the approach, but can you please structure this into the
obvious steps of splitting out things step by step and not forcing the
reviewer to digest all of this at once?

Thanks,

tglx

2023-02-07 00:07:52

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 06/11] x86/smpboot: Support parallel startup of secondary CPUs

On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> From: Thomas Gleixner <[email protected]>
>
> To allow for parallel AP bringup, we need to avoid the use of global
> variables for passing information to the APs, as well as preventing them
> from all trying to use the same real-mode stack simultaneously.
>
> So, introduce a 'lock' field in struct trampoline_header to use as a
> simple bit-spinlock for the real-mode stack. That lock also protects
> the global variables initial_gs, initial_stack and early_gdt_descr,
> which can now be calculated...
>
> So how do we calculate those addresses? Well, they can all be found
> from the per_cpu data for this CPU. Simples! Except... how does it know
> what its CPU# is? OK, we export the cpuid_to_apicid[] array and it can
> search it to find its APIC ID in there.
>
> But now you whine at me that it doesn't even know its APIC ID? Well, if
> it's a relatively modern CPU then the APIC ID is in CPUID leaf 0x0B so
> we can use that. Otherwise... erm... OK, otherwise it can't have parallel
> CPU bringup for now. We'll still use a global variable for those CPUs and
> bring them up one at a time.
>
> So add a global 'smpboot_control' field which either contains the APIC
> ID, or a flag indicating that it can be found in CPUID.
>
> This adds the 'do_parallel_bringup' flag in preparation but doesn't
> actually enable parallel bringup yet.
>
> [ dwmw2: Minor tweaks, write a commit message ]
> [ seanc: Fix stray override of initial_gs in common_cpu_up() ]
> Not-signed-off-by: Thomas Gleixner <[email protected]>

I'm happy to add my SOB if someone actually writes a coherent changelog
which complies with the documented rules :)

2023-02-07 00:09:16

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:

> From: David Woodhouse <[email protected]>

-ENOCONTENT

> Signed-off-by: David Woodhouse <[email protected]>

2023-02-07 00:23:56

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On Sat, Feb 04 2023 at 15:40, David Woodhouse wrote:
> On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
>> Then:
>>
>>   - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
>>   - Zen 2,3,4-based servers boot fine
>>   - a Zen1-based server doesn't boot.
>
> I've changed it to use CPUID 0xb only if we're actually in x2apic mode,
> which Boris tells me won't be the case on Zen1 because that doesn't
> support X2APIC.

Correct.

> When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
> of APIC ID we find there are perfectly sufficient.

Is that worth the trouble?

> Even though we *can* support non-X2APIC processors, we *might* want to
> play it safe and not go back that far; only enabling parallel bringup
> on machines with X2APIC which roughly correlates with "lots of CPUs"
> since that's where the benefit is.

The parallel bringup code is complex enough already, so please don't
optimize for the non-interesting case in the first place. When this has
stabilized then the CPUID 0x1 mechanism can be added if anyone thinks
it's interesting. KISS is still the best engineering principle.

Thanks,

tglx

2023-02-07 00:28:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 08/11] x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel

On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:

> From: David Woodhouse <[email protected]>
>
> When the APs can find their own APIC ID without assistance, we can do

We can do nothing ... Again, read Documentation/process ...

> the AP bringup in parallel.
> /**
> * arch_disable_smp_support() - disables SMP support for x86 at runtime
> */
> @@ -1550,6 +1560,10 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
> do_parallel_bringup = false;
> }
>
> + if (do_parallel_bringup)
> + cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick",
> + native_cpu_kick, NULL);

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#bracket-rules

Thanks,

tglx

2023-02-07 00:31:51

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 11/11] x86/smpboot: reuse timer calibration

On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> From: Arjan van de Ven <[email protected]>
>
> No point recalibrating for known-constant tsc.
> When tested on a 128 core, 2 socket server, this reduces
> the parallel smpboot time from 100ms to 30ms.

This is completely independent of the parallel bringup, so this should
have a proper SOB and be at the beginning of this series for a long
time.

> Not-signed-off-by: Arjan van de Ven <[email protected]>

Thanks,

tglx

2023-02-07 01:25:18

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v6 02/11] cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h>

On Tue, Feb 07, 2023 at 12:33:03AM +0100, Thomas Gleixner wrote:
> On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> > If the architecture code is to register its new pre-bringup states with
> > the cpuhp core, having a special-case wrapper to pass extra arguments is
> > non-trivial and it's easier just to let the arch register its function
> > pointer to be invoked with the standard API.
> >
> > Signed-off-by: David Woodhouse <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> Same issue vs. the SOB chain.
>
> Other than that:
>
> Reviewed-by: Thomas Gleixner <[email protected]>

I pulled David's earlier version of this series into -rcu strictly
for testing purposes, so perhaps Usama pulled the series from my repo.
I don't have any record of doing anything more than test that series,
so dropping my SoB entirely makes the most sense here.

Thanx, Paul

2023-02-07 10:05:44

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On Tue, 2023-02-07 at 01:23 +0100, Thomas Gleixner wrote:
> On Sat, Feb 04 2023 at 15:40, David Woodhouse wrote:
> > On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
> > > Then:
> > >
> > >   - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
> > >   - Zen 2,3,4-based servers boot fine
> > >   - a Zen1-based server doesn't boot.
> >
> > I've changed it to use CPUID 0xb only if we're actually in x2apic mode,
> > which Boris tells me won't be the case on Zen1 because that doesn't
> > support X2APIC.
>
> Correct.
>
> > When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
> > of APIC ID we find there are perfectly sufficient.
>
> Is that worth the trouble?

Well, that's what was being debated. I think the conclusion that was
being reached was that it *is* worth the trouble, because there will be
a number of physical and especially virtual machines which have a high
CPU count but which don't actually use X2APIC mode. And which might not
even *support* CPUID 0xb.

So using CPUID 0x1 when there is no APIC ID greater than 254 does seem
to make sense.


> > Even though we *can* support non-X2APIC processors, we *might* want to
> > play it safe and not go back that far; only enabling parallel bringup
> > on machines with X2APIC which roughly correlates with "lots of CPUs"
> > since that's where the benefit is.
>
> The parallel bringup code is complex enough already, so please don't
> optimize for the non-interesting case in the first place. When this has
> stabilized then the CPUID 0x1 mechanism can be added if anyone thinks
> it's interesting. KISS is still the best engineering principle.

Actually it ends up being trivial. It probably makes sense to keep it
in there even if it can only be exercised by a deliberate opt-in on
older CPUs. I reworked the register usage from your original anyway,
which helps a little.

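/* %edx holds smpboot_control: flag bits in the top nibble, or a direct APIC ID */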
testl $STARTUP_APICID_CPUID_0B, %edx
jnz .Luse_cpuid_0b
testl $STARTUP_APICID_CPUID_01, %edx
jnz .Luse_cpuid_01
andl $0x0FFFFFFF, %edx
jmp .Lsetup_AP

.Luse_cpuid_01:
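/* Leaf 0x1: the initial APIC ID is in bits 31:24 of EBX */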
mov $0x01, %eax
cpuid
mov %ebx, %edx
shr $24, %edx
jmp .Lsetup_AP

.Luse_cpuid_0b:
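/* Leaf 0xB: EDX returns the full 32-bit x2APIC ID */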
mov $0x0B, %eax
xorl %ecx, %ecx
cpuid

.Lsetup_AP:
/* EDX contains the APICID of the current CPU */



2023-02-07 10:57:59

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask

On Tue, 2023-02-07 at 00:20 +0100, Thomas Gleixner wrote:
>
>
> TBH. The logic of this code is anything but obvious. Something like the
> uncompiled below perhaps?

Looks sane to me. I'll tweak the comments a bit and give it a spin;
thanks.

...

> +        * At boot time CPU present mask is stable. If the cluster is not
> +        * yet initialized, allocate the mask and propagate it to all
> +        * siblings in this cluster.
>          */
> -       if (cluster_hotplug_mask) {
> -               if (cluster_hotplug_mask->node == node)
> -                       return 0;
> -               kfree(cluster_hotplug_mask);
> -       }
> +       if (system_state < SYSTEM_RUNNING)
> +               goto alloc;
> +
> +       /*
> +        * On post boot hotplug iterate over the present CPUs to handle the
> +        * case of partial clusters as they might be presented by
> +        * virtualization.
> +        */
> +       for_each_present_cpu(cpu_i) {


So... if this CPU was *present* at boot time (and if any other CPU in
this cluster was present), it will already have a cluster_mask.

Which means we get here in two cases:

• This CPU wasn't actually present (was just 'possible') at boot time.
(Is that actually a thing that happens?)

• This CPU was present but no other CPU in this cluster was actually
brought up at boot time so the cluster_mask wasn't allocated.

The code looks right, I don't grok the comment about partial clusters
and virtualization, and would have worded it something along the above
lines?



2023-02-07 11:27:57

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask

On Tue, 2023-02-07 at 10:57 +0000, David Woodhouse wrote:
>
> > +       /*
> > +        * On post boot hotplug iterate over the present CPUs to handle the
> > +        * case of partial clusters as they might be presented by
> > +        * virtualization.
> > +        */
> > +       for_each_present_cpu(cpu_i) {
>
>
> So... if this CPU was *present* at boot time (and if any other CPU in
> this cluster was present), it will already have a cluster_mask.
>
> Which means we get here in two cases:
>
>  • This CPU wasn't actually present (was just 'possible') at boot time.
>    (Is that actually a thing that happens?)
>
>  • This CPU was present but no other CPU in this cluster was actually
>    brought up at boot time so the cluster_mask wasn't allocated.
>
> The code looks right, I don't grok the comment about partial clusters
> and virtualization, and would have worded it something along the above
> lines?

As I get my head around that, I think the code needs to change too.
What if we *unplug* the only CPU in a cluster (present→possible), then
add a new one in the same cluster? The new one would get a new
cluster_mask. Which is kind of OK for now but then if we re-add the
original CPU it'd continue to use its old cluster_mask.

Now, that's kind of weird if it's physical CPUs because that cluster is
within a given chip, isn't it? But with virtualization maybe that's
something that could happen, and it doesn't hurt to be completely safe
by using for_each_possible_cpu() instead?

Now looks like this:


/*
* On post boot hotplug for a CPU which was not present at boot time,
* iterate over all possible CPUs (even those which are not present
* any more) to find any existing cluster mask.
*/
for_each_possible_cpu(cpu_i) {



2023-02-07 14:23:10

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask

David!

On Tue, Feb 07 2023 at 10:57, David Woodhouse wrote:
> On Tue, 2023-02-07 at 00:20 +0100, Thomas Gleixner wrote:
>> +       /*
>> +        * On post boot hotplug iterate over the present CPUs to handle the
>> +        * case of partial clusters as they might be presented by
>> +        * virtualization.
>> +        */
>> +       for_each_present_cpu(cpu_i) {
>
>
> So... if this CPU was *present* at boot time (and if any other CPU in
> this cluster was present), it will already have a cluster_mask.
>
> Which means we get here in two cases:
>
> • This CPU wasn't actually present (was just 'possible') at boot time.
> (Is that actually a thing that happens?)

It happens on systems which support physical hotplug and AFAIK also
virtualization has support for "physical" hotplug.

The same is true the other way round on physical unplug. Then the CPU
is removed from present but is still set in possible.

> • This CPU was present but no other CPU in this cluster was actually
> brought up at boot time so the cluster_mask wasn't allocated.

Correct.

> The code looks right, but I don't grok the comment about partial clusters
> and virtualization, and would have worded it something along the above
> lines.

My worry was that virtualization might be able to physically hotplug
partial clusters. Whether that's possible I don't know, but in the
context of virtualization I always assume the worst case.

Thanks,

tglx

2023-02-07 14:24:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask

On Tue, Feb 07 2023 at 11:27, David Woodhouse wrote:
> On Tue, 2023-02-07 at 10:57 +0000, David Woodhouse wrote:
>>  • This CPU was present but no other CPU in this cluster was actually
>>    brought up at boot time so the cluster_mask wasn't allocated.
>>
>> The code looks right, but I don't grok the comment about partial clusters
>> and virtualization, and would have worded it something along the above
>> lines.
>
> As I get my head around that, I think the code needs to change too.
> What if we *unplug* the only CPU in a cluster (present→possible), then
> add a new one in the same cluster? The new one would get a new
> cluster_mask. Which is kind of OK for now but then if we re-add the
> original CPU it'd continue to use its old cluster_mask.

Indeed.

> Now, that's kind of weird if it's physical CPUs because that cluster is
> within a given chip, isn't it? But with virtualization maybe that's
> something that could happen, and it doesn't hurt to be completely safe
> by using for_each_possible_cpu() instead?

Yes. Virtualization does awful things....

> Now looks like this:
>         /*
>          * On post boot hotplug for a CPU which was not present at boot time,
>          * iterate over all possible CPUs (even those which are not present
>          * any more) to find any existing cluster mask.
>          */
>         for_each_possible_cpu(cpu_i) {

Looks good!

tglx

2023-02-07 14:41:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 04/11] x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()

David!

On Tue, Feb 07 2023 at 09:49, David Woodhouse wrote:

Can you please fix your mail client to _NOT_ send multipart/mixed mails?
Despite the CC list being insanely large, your replies are dropped by
vger and missing in the archives.

> On Tue, 2023-02-07 at 00:48 +0100, Thomas Gleixner wrote:
>> On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
>> > From: David Woodhouse <[email protected]>
>> >
>> > If we want to do parallel CPU bringup, we're going to need to set this up
>> > and leave it until all CPUs are done. Might as well use the RTC spinlock
>> > to protect the refcount, as we need to take it anyway.
>>
>> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
>>
>> Aside of the 'We' this does not explain anything at all.
>
> Er, doesn't it?
>
> We refcount the warm reset vector because when we do parallel bringup,
> we'll want to set it up once and then put it back to normal (for cold
> reset) once all the CPUs are up.
>
> I can rework the phrasing; I'm aware that the whole nonsense about
> pronouns and the imperative mood has grown legs in the last couple of
> years since I originally wrote it — but is there anything actually
> missing?

We can settle the imperative mood debate over a beer at the next
conference, but stuff which goes through tip is required to follow those
rules. No exception for you :)

Vs. the content: This changelog lacks context. Changelogs have to be
self-contained and self-explanatory. Assuming that they are
understandable due to the context of the patch series is just wrong. I
fundamentally hate it when I have to dig out the context when I stare at
the changelog of a commit.

So something like this:

   The warm reset vector on X86 is set up through the RTC (CMOS) clock
   for each CPU bringup operation and cleared after the CPU has come
   online.

   Parallel bringup of multiple CPUs requires that the warm reset vector
   remains valid until all CPUs have come online.

   To prepare for that, add refcounting for the reset vector and protect
   it with the rtc_lock, which has to be taken for the setup operation
   anyway.

gives the full context and is simply factual, no?
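
The code side is then just the obvious refcount pattern, along the lines
of the sketch below. The counter name here is made up and the trampoline
pokes are abbreviated from the existing helpers, so treat it as a shape,
not the actual patch:

static unsigned int num_warm_reset_users;       /* protected by rtc_lock */

static void smpboot_setup_warm_reset_vector(unsigned long start_eip)
{
        unsigned long flags;

        spin_lock_irqsave(&rtc_lock, flags);

        /* Only the first user programs the CMOS warm reset code... */
        if (!num_warm_reset_users++)
                CMOS_WRITE(0xa, 0xf);

        /* ...but the trampoline address is written on every call. */
        *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) =
                start_eip >> 4;
        *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) =
                start_eip & 0xf;

        spin_unlock_irqrestore(&rtc_lock, flags);
}

static void smpboot_restore_warm_reset_vector(void)
{
        unsigned long flags;

        spin_lock_irqsave(&rtc_lock, flags);

        /* Keep the vector valid while any CPU bringup still needs it. */
        if (!--num_warm_reset_users) {
                CMOS_WRITE(0, 0xf);
                *((volatile u32 *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = 0;
        }

        spin_unlock_irqrestore(&rtc_lock, flags);
}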

Thanks,

tglx



2023-02-07 14:44:47

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

David!

On Tue, Feb 07 2023 at 10:04, David Woodhouse wrote:
> On Tue, 2023-02-07 at 01:23 +0100, Thomas Gleixner wrote:
>> > When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
>> > of APIC ID we find there are perfectly sufficient.
>>
>> Is that worth the trouble?
>
> Well, that's what was being debated. I think the conclusion that was
> being reached was that it *is* worth the trouble, because there will be
> a number of physical and especially virtual machines which have a high
> CPU count but which don't actually use X2APIC mode. And which might not
> even *support* CPUID 0xb.
>
> So using CPUID 0x1 when there is no APIC ID greater than 254 does seem
> to make sense.

Fair enough.

>> > Even though we *can* support non-X2APIC processors, we *might* want to
>> > play it safe and not go back that far; only enabling parallel bringup
>> > on machines with X2APIC which roughly correlates with "lots of CPUs"
>> > since that's where the benefit is.
>>
>> The parallel bringup code is complex enough already, so please don't
>> optimize for the non-interesting case in the first place. When this has
>> stabilized then the CPUID 0x1 mechanism can be added if anyone thinks
>> it's interesting. KISS is still the best engineering principle.
>
> Actually it ends up being trivial. It probably makes sense to keep it
> in there even if it can only be exercised by a deliberate opt-in on
> older CPUs. I reworked the register usage from your original anyway,
> which helps a little.
>
>         testl   $STARTUP_APICID_CPUID_0B, %edx
>         jnz     .Luse_cpuid_0b
>         testl   $STARTUP_APICID_CPUID_01, %edx
>         jnz     .Luse_cpuid_01
>         andl    $0x0FFFFFFF, %edx
>         jmp     .Lsetup_AP
>
> .Luse_cpuid_01:
>         mov     $0x01, %eax
>         cpuid
>         mov     %ebx, %edx
>         shr     $24, %edx
>         jmp     .Lsetup_AP
>
> .Luse_cpuid_0b:
>         mov     $0x0B, %eax
>         xorl    %ecx, %ecx
>         cpuid
>
> .Lsetup_AP:
>         /* EDX contains the APICID of the current CPU */

That looks trivial enough. So no objections from my side. Not sure
whether this needs a special opt-in though. We probably want an opt-out
for the parallel bringup mode for diagnosis purposes anyway and that
should be good enough for a start.
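
Such an opt-out could be a trivial command line switch along the lines of
the sketch below; the parameter name is purely illustrative here, not
something the posted series defines. The bringup code would then simply
check the flag before setting smpboot_control:

static bool __initdata no_parallel_bringup;

/* Illustrative command line switch to force the serial bringup path. */
static int __init no_parallel_bringup_setup(char *str)
{
        no_parallel_bringup = true;
        return 1;
}
__setup("no_parallel_bringup", no_parallel_bringup_setup);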

Thanks,

tglx

2023-02-07 14:46:30

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On Tue, Feb 07 2023 at 09:52, David Woodhouse wrote:

> On Tue, 2023-02-07 at 01:09 +0100, Thomas Gleixner wrote:
>> On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
>>
>> -ENOCONTENT
>
>
> Yeah, I'm aware of the nonsense here too but when I'm actually writing
> a commit message, years of habit kick in and it doesn't occur to me to
> pointlessly repeat the words that are already there, one line up where
> it says "Disable parallel boot for AMD CPUs".
>
> I'm old and stupid, but eventually I'll start to remember that people
> nowadays like to gratuitously refuse to read those words, and they want
> them repeated in a different place.

I'm not asking you to repeat information from the subject line, but to
actually add a rationale. The WHY is the most important information in a
changelog, no?

> Bear with me...

I do, but that doesn't mean I'm giving you special treatment :)

Thanks,

tglx

2023-02-07 16:27:24

by Kim Phillips

[permalink] [raw]
Subject: Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

On 2/6/23 11:58 AM, Kim Phillips wrote:
> On 2/4/23 9:40 AM, David Woodhouse wrote:
>> On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
>>> If I:
>>>
>>>    - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
>>>    - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c
>>>
>>> Then:
>>>
>>>    - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
>>>    - Zen 2,3,4-based servers boot fine
>>>    - a Zen1-based server doesn't boot.
>>
>> I've changed it to use CPUID 0xb only if we're actually in x2apic mode,
>> which Boris tells me won't be the case on Zen1 because that doesn't
>> support X2APIC.
>>
>> When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
>> of APIC ID we find there are perfectly sufficient.
>>
>> New tree in the same place as before, commit ce7e2d1e046a for the
>> parallel-6.2-rc6-part1 tag and 17bbd12ee03 for parallel-6.2-rc6.
>
> Thanks, Zen 1 through 4 based servers all boot both of those tree
> commits successfully.
>
> I'll try that Ryzen again later.

The Ryzen 3000 also successfully boots those two branches now.

Thanks,

Kim

2023-02-07 16:50:47

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 04/11] x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()

On Tue, Feb 07, 2023, Thomas Gleixner wrote:
> On Tue, Feb 07 2023 at 09:49, David Woodhouse wrote:
> > On Tue, 2023-02-07 at 00:48 +0100, Thomas Gleixner wrote:
> >> Aside of the 'We' this does not explain anything at all.
> >
> > Er, doesn't it?
> >
> > We refcount the warm reset vector because when we do parallel bringup,
> > we'll want to set it up once and then put it back to normal (for cold
> > reset) once all the CPUs are up.
> >
> > I can rework the phrasing; I'm aware that the whole nonsense about
> > pronouns and the imperative mood has grown legs in the last couple of
> > years since I originally wrote it — but is there anything actually
> > missing?
>
> We can settle the imperative mood debate over a beer at the next
> conference, but stuff which goes through tip is required to follow those
> rules. No exception for you :)

While we're reforming David, same goes for KVM x86. :-)

2023-02-07 19:49:10

by David Woodhouse

[permalink] [raw]
Subject: Re: [EXTERNAL][PATCH v6 04/11] x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()

On Tue, 2023-02-07 at 15:39 +0100, Thomas Gleixner wrote:
> David!
>
> On Tue, Feb 07 2023 at 09:49, David Woodhouse wrote:
>
> Can you please fix your mail client to _NOT_ send multipart/mixed mails?
> Despite the CC list being insanely large, your replies are dropped by
> vger and missing in the archives.
>

That's not the client; that's the stupid mail system doing it in
transit. Sorry, I'd already filed a ticket about that idiocy last week
when I noticed they'd started adding HTML parts to a previously sane
mail. But obviously they haven't managed to fix it yet.

The correct thing to do in the meantime is use a non-broken mail
account, and I just forgot this morning until halfway through the
thread, when you'll note the coffee kicked in and I switched.


> > On Tue, 2023-02-07 at 00:48 +0100, Thomas Gleixner wrote:
> > > On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> > > > From: David Woodhouse <[email protected]>
> > > >
> > > > If we want to do parallel CPU bringup, we're going to need to set this up
> > > > and leave it until all CPUs are done. Might as well use the RTC spinlock
> > > > to protect the refcount, as we need to take it anyway.
> > >
> > > https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
> > >
> > > Aside of the 'We' this does not explain anything at all.
> >
> > Er, doesn't it?
> >
> > We refcount the warm reset vector because when we do parallel bringup,
> > we'll want to set it up once and then put it back to normal (for cold
> > reset) once all the CPUs are up.
> >
> > I can rework the phrasing; I'm aware that the whole nonsense about
> > pronouns and the imperative mood has grown legs in the last couple of
> > years since I originally wrote it — but is there anything actually
> > missing?
>
> We can settle the imperative mood debate over a beer at the next
> conference, but stuff which goes through tip is required to follow those
> rules. No exception for you :)
>
> Vs. the content: This changelog lacks context. Changelogs have to be
> self-contained and self-explanatory. Assuming that they are
> understandable due to the context of the patch series is just wrong. I
> fundamentally hate it when I have to dig out the context when I stare at
> the changelog of a commit.
>
> So something like this:
>
>    The warm reset vector on X86 is set up through the RTC (CMOS) clock
>    for each CPU bringup operation and cleared after the CPU has come
>    online.
>
>    Parallel bringup of multiple CPUs requires that the warm reset vector
>    remains valid until all CPUs have come online.
>
>    To prepare for that, add refcounting for the reset vector and protect
>    it with the rtc_lock, which has to be taken for the setup operation
>    anyway.
>
> gives the full context and is simply factual, no?

Ack.



2023-02-07 19:54:08

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask

On Tue, 2023-02-07 at 15:24 +0100, Thomas Gleixner wrote:
> On Tue, Feb 07 2023 at 11:27, David Woodhouse wrote:
> > On Tue, 2023-02-07 at 10:57 +0000, David Woodhouse wrote:
> > >  • This CPU was present but no other CPU in this cluster was actually
> > >    brought up at boot time so the cluster_mask wasn't allocated.
> > >
> > > The code looks right, but I don't grok the comment about partial clusters
> > > and virtualization, and would have worded it something along the above
> > > lines.
> >
> > As I get my head around that, I think the code needs to change too.
> > What if we *unplug* the only CPU in a cluster (present→possible), then
> > add a new one in the same cluster? The new one would get a new
> > cluster_mask. Which is kind of OK for now but then if we re-add the
> > original CPU it'd continue to use its old cluster_mask.
>
> Indeed.
>
> > Now, that's kind of weird if it's physical CPUs because that cluster is
> > within a given chip, isn't it? But with virtualization maybe that's
> > something that could happen, and it doesn't hurt to be completely safe
> > by using for_each_possible_cpu() instead?
>
> Yes. Virtualization does awful things....
>
> > Now looks like this:
> >         /*
> >          * On post boot hotplug for a CPU which was not present at boot time,
> >          * iterate over all possible CPUs (even those which are not present
> >          * any more) to find any existing cluster mask.
> >          */
> >         for_each_possible_cpu(cpu_i) {
>
> Looks good!


Thanks. I've reworked and I think I've caught everything. Didn't want
to elide the credit where Usama had done some of the forward-porting
work, so I've left those notes and the SoB intact on those patches, on
the assumption that they will be reposting the series after proper
testing on hardware again anyway (I'm only spawning it in qemu right
now).

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc7

The only real code change other than what we've discussed here is to
implement what we talked about for CPUID 0xb vs. 0x1 etc:


        /*
         * We can do 64-bit AP bringup in parallel if the CPU reports
         * its APIC ID in CPUID (either leaf 0x0B if we need the full
         * APIC ID in X2APIC mode, or leaf 0x01 if 8 bits are
         * sufficient). Otherwise it's too hard. And not for SEV-ES
         * guests because they can't use CPUID that early.
         */
        if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 ||
            (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) ||
            cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
                do_parallel_bringup = false;

        if (do_parallel_bringup && x2apic_mode) {
                unsigned int eax, ebx, ecx, edx;

                /*
                 * To support parallel bringup in x2apic mode, the AP will need
                 * to obtain its APIC ID from CPUID 0x0B, since CPUID 0x01 has
                 * only 8 bits. Check that it is present and seems correct.
                 */
                cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx);

                /*
                 * AMD says that if executed with an unimplemented level in
                 * ECX, then it will return all zeroes in EAX. Intel says it
                 * will return zeroes in both EAX and EBX. Checking only EAX
                 * should be sufficient.
                 */
                if (eax) {
                        smpboot_control = STARTUP_SECONDARY | STARTUP_APICID_CPUID_0B;
                } else {
                        pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n");
                        do_parallel_bringup = false;
                }
        } else if (do_parallel_bringup) {
                /* Without X2APIC, what's in CPUID 0x01 should suffice. */
                smpboot_control = STARTUP_SECONDARY | STARTUP_APICID_CPUID_01;
        }
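
For completeness, the flag bits sit above the data bits in
smpboot_control, roughly like the sketch below. The exact values are my
reading of the 'andl $0x0FFFFFFF, %edx' mask in the assembly earlier in
the thread, not a quote of the header, so treat them as illustrative:

/*
 * Illustrative layout (assumed, not quoted from the series): flags in the
 * top nibble of smpboot_control, data (APIC ID or CPU number) in the low
 * 28 bits, of which up to 24 carry an X2APIC ID.
 */
#define STARTUP_APICID_CPUID_0B 0x80000000
#define STARTUP_APICID_CPUID_01 0x40000000
#define STARTUP_SECONDARY       0x20000000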




2023-02-07 20:59:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask

On Tue, Feb 07 2023 at 19:53, David Woodhouse wrote:
> On Tue, 2023-02-07 at 15:24 +0100, Thomas Gleixner wrote:
> Thanks. I've reworked and I think I've caught everything. Didn't want
> to elide the credit where Usama had done some of the forward-porting
> work, so I've left those notes and the SoB intact on those patches, on
> the assumption that they will be reposting the series after proper
> testing on hardware again anyway (I'm only spawning it in qemu right
> now).
>
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc7
>
> The only real code change other than what we've discussed here is to
> implement what we talked about for CPUID 0xb vs. 0x1 etc:
>
>         /*
>          * We can do 64-bit AP bringup in parallel if the CPU reports
>          * its APIC ID in CPUID (either leaf 0x0B if we need the full
>          * APIC ID in X2APIC mode, or leaf 0x01 if 8 bits are
>          * sufficient). Otherwise it's too hard. And not for SEV-ES
>          * guests because they can't use CPUID that early.
>          */
>         if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 ||
>             (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) ||
>             cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
>                 do_parallel_bringup = false;
>
>         if (do_parallel_bringup && x2apic_mode) {
>                 unsigned int eax, ebx, ecx, edx;
>
>                 /*
>                  * To support parallel bringup in x2apic mode, the AP will need
>                  * to obtain its APIC ID from CPUID 0x0B, since CPUID 0x01 has
>                  * only 8 bits. Check that it is present and seems correct.
>                  */
>                 cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx);
>
>                 /*
>                  * AMD says that if executed with an unimplemented level in
>                  * ECX, then it will return all zeroes in EAX. Intel says it
>                  * will return zeroes in both EAX and EBX. Checking only EAX
>                  * should be sufficient.
>                  */
>                 if (eax) {
>                         smpboot_control = STARTUP_SECONDARY | STARTUP_APICID_CPUID_0B;
>                 } else {
>                         pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n");
>                         do_parallel_bringup = false;
>                 }
>         } else if (do_parallel_bringup) {
>                 /* Without X2APIC, what's in CPUID 0x01 should suffice. */
>                 smpboot_control = STARTUP_SECONDARY | STARTUP_APICID_CPUID_01;
>         }

Looks good to me!

Thanks,

tglx

2023-02-07 23:16:49

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH v6 11/11] x86/smpboot: reuse timer calibration

On 2/2/2023 1:56 PM, Usama Arif wrote:
> From: Arjan van de Ven <[email protected]>
>
> No point recalibrating for known-constant tsc.
> When tested on a 128-core, 2-socket server, this reduces
> the parallel smpboot time from 100ms to 30ms.
>
> Not-signed-off-by: Arjan van de Ven <[email protected]>

you can turn this into a Signed-off-by:
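
For reference, the patch is essentially a short-circuit in
calibrate_delay_is_known() in arch/x86/kernel/tsc.c, along the lines of
the sketch below; the guard conditions in the posted version may differ
slightly:

unsigned long calibrate_delay_is_known(void)
{
        int sibling, cpu = smp_processor_id();
        int constant_tsc = cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC);
        const struct cpumask *mask = topology_core_cpumask(cpu);

        if (!constant_tsc || !mask)
                return 0;

        /*
         * With a constant TSC, the calibration done on the boot CPU is
         * valid for every CPU: reuse it instead of recalibrating.
         */
        if (cpu != 0)
                return cpu_data(0).loops_per_jiffy;

        sibling = cpumask_any_but(mask, cpu);
        if (sibling < nr_cpu_ids)
                return cpu_data(sibling).loops_per_jiffy;

        return 0;
}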



2023-02-07 23:55:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v6 11/11] x86/smpboot: reuse timer calibration

Arjan!

On Tue, Feb 07 2023 at 15:16, Arjan van de Ven wrote:
> On 2/2/2023 1:56 PM, Usama Arif wrote:
>> From: Arjan van de Ven <[email protected]>
>>
>> No point recalibrating for known-constant tsc.
>> When tested on a 128-core, 2-socket server, this reduces
>> the parallel smpboot time from 100ms to 30ms.
>>
>> Not-signed-off-by: Arjan van de Ven <[email protected]>
>
> you can turn this into a Signed-off-by:

Please post your patch separately as it is a completely orthogonal issue
and can be reviewed and discussed independently of the parallel bringup
effort.

Thanks,

tglx