LinuxLists.cc - [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

2012-06-04 18:29:58

Subject: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

From: Fenghua Yu <[email protected]>

Since offline CPU is in wmait or hlt if mwait feature is not available, it can
be waken up by writing to monitored memory range or via nmi.

Compared to current INIT, INIT, STARTUP wake up sequence, waking up offline CPU
is faster via wmait or nmi. This is especially useful when offline CPU for
power saving and shorter waking up time is desired. On one tested desktop
machine, waking up time via mwait or nmi is reduced to 23% of waking up time
via INIT. Waking up time is measured from the beginning of store_online() to
the beginning of cpu_idle() after the CPU is waken up.

Waking up offline CPU via mwait or nmi is also useful to support BSP offline/
online because offline BSP can not be waken up by the INIT's sequence. The BSP
offline/online patchset will be sent out seperately.

Fenghua Yu (6):
x86/Documentation/kernel-parameters.txt: Add wakeup_cpu_via_init
kernel parameter help
x86/head_32.S/head_64.S: Kernel entry code after waking up offline
CPU via mwait or nmi
x86/smpboot.c: Wake up offline CPU via mwait or nmi
x86/apic_flat_64.c: Wakeup function in apic calls mwait or nmi method
x86/x2apic_cluster.c: Wakeup function in x2apic_cluster calls mwait
or nmi method
x86/x2apic_phys.c: Wakeup function in x2apic_phys calls mwait or nmi
method

Documentation/kernel-parameters.txt | 3 +
arch/x86/include/asm/apic.h | 5 +-
arch/x86/include/asm/cpu.h | 1 +
arch/x86/kernel/apic/apic_flat_64.c | 2 +
arch/x86/kernel/apic/x2apic_cluster.c | 1 +
arch/x86/kernel/apic/x2apic_phys.c | 1 +
arch/x86/kernel/head_32.S | 12 ++
arch/x86/kernel/head_64.S | 14 +++
arch/x86/kernel/smpboot.c | 187 ++++++++++++++++++++++++++++-----
9 files changed, 198 insertions(+), 28 deletions(-)

2012-06-04 18:30:00

by Fenghua Yu

[permalink] [raw]

Subject: [PATCH 1/6] x86/Documentation/kernel-parameters.txt: Add wakeup_cpu_via_init kernel parameter help

From: Fenghua Yu <[email protected]>

The new kernel parameter wakeup_cpu_via_init overrides mwait or nmi method and
forces to wake up offline CPU via INIT, INIT, STARTUP sequence. This is useful
in RAS when a temporary CPU error could be fixed by INIT.

Signed-off-by: Fenghua Yu <[email protected]>
---
Documentation/kernel-parameters.txt | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 2b17e82..8efae55 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3099,6 +3099,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Format:
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]

+ wakeup_cpu_via_init [X86]
+ Wake up offline CPU via INIT, INIT, STARTUP sequence.
+
______________________________________________________________________

TODO:
--
1.6.0.3

2012-06-04 18:30:45

by Fenghua Yu

[permalink] [raw]

Subject: [PATCH 4/6] x86/apic_flat_64.c: Wakeup function in apic calls mwait or nmi method

From: Fenghua Yu <[email protected]>

The wakeup function wakeup_secondary_cpu in apic_flat and apic_physflat call
wakeup_secondary_cpu_via_soft().

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/apic/apic_flat_64.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 0e881c4..dd79ab7 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -219,6 +219,7 @@ static struct apic apic_flat = {
.send_IPI_all = flat_send_IPI_all,
.send_IPI_self = apic_send_IPI_self,

+ .wakeup_secondary_cpu = wakeup_secondary_cpu_via_soft,
.trampoline_phys_low = DEFAULT_TRAMPOLINE_PHYS_LOW,
.trampoline_phys_high = DEFAULT_TRAMPOLINE_PHYS_HIGH,
.wait_for_init_deassert = NULL,
@@ -379,6 +380,7 @@ static struct apic apic_physflat = {
.send_IPI_all = physflat_send_IPI_all,
.send_IPI_self = apic_send_IPI_self,

+ .wakeup_secondary_cpu = wakeup_secondary_cpu_via_soft,
.trampoline_phys_low = DEFAULT_TRAMPOLINE_PHYS_LOW,
.trampoline_phys_high = DEFAULT_TRAMPOLINE_PHYS_HIGH,
.wait_for_init_deassert = NULL,
--
1.6.0.3

2012-06-04 18:30:40

by Fenghua Yu

[permalink] [raw]

Subject: [PATCH 6/6] x86/x2apic_phys.c: Wakeup function in x2apic_phys calls mwait or nmi method

wmb();
+ printk("%s:%d\n",__func__,__LINE__);
cpu_idle();
}

@@ -478,6 +479,8 @@ _wakeup_secondary_cpu_via_nmi(int apicid, int dest_mode)
unsigned long send_status, accept_status = 0;
int maxlvt;

+ printk("%s:%d\n",__func__,__LINE__);
+
/* Target chip */
/* Boot on the stack */
/* Kick the second */
@@ -530,6 +533,7 @@ DEFINE_PER_CPU(int, cpu_dead) = { 0 };

static int wakeup_secondary_cpu_via_mwait(int cpu)
{
+ printk("%s:%d\n",__func__,__LINE__);
per_cpu(cpu_dead, cpu) |= CPU_DEAD_TRIGGER;
return 0;
}
@@ -553,6 +557,7 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
unsigned long send_status, accept_status = 0;
int maxlvt, num_starts, j;

+ printk("%s:%d\n",__func__,__LINE__);
maxlvt = lapic_get_maxlvt();

/*
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 6345294..b218149 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -49,6 +49,7 @@ static ssize_t __ref store_online(struct device *dev,
kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
break;
case '1':
+ printk("%s:%d\n",__func__,__LINE__);
ret = cpu_up(cpu->dev.id);
if (!ret)
kobject_uevent(&dev->kobj, KOBJ_ONLINE);

2012-06-04 18:30:37

by Fenghua Yu

[permalink] [raw]

Subject: [PATCH 6/6] x86/x2apic_phys.c: Wakeup function in x2apic_phys calls mwait or nmi method

From: Fenghua Yu <[email protected]>

The wakeup_secondary_cpu function in apic_x2apic_phys calls
wakeup_secondary_cpu_via_soft().

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/apic/x2apic_phys.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index c17e982..64d9804 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -164,6 +164,7 @@ static struct apic apic_x2apic_phys = {
.send_IPI_all = x2apic_send_IPI_all,
.send_IPI_self = x2apic_send_IPI_self,

+ .wakeup_secondary_cpu = wakeup_secondary_cpu_via_soft,
.trampoline_phys_low = DEFAULT_TRAMPOLINE_PHYS_LOW,
.trampoline_phys_high = DEFAULT_TRAMPOLINE_PHYS_HIGH,
.wait_for_init_deassert = NULL,
--
1.6.0.3

2012-06-04 18:31:25

by Fenghua Yu

[permalink] [raw]

Subject: [PATCH 3/6] x86/smpboot.c: Wake up offline CPU via mwait or nmi

From: Fenghua Yu <[email protected]>

wakeup_secondary_cpu_via_soft() is defined to wake up offline CPU via mwait if
the CPU is in mwait or via nmi if the CPU is in hlt.

A CPU boots up by INIT, INIT, STARTUP sequence when it boots up for the first
time during boot time or hot plug.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/apic.h | 5 +-
arch/x86/kernel/smpboot.c | 187 ++++++++++++++++++++++++++++++++++++------
2 files changed, 164 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index eaff479..cad00b1 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -425,7 +425,10 @@ extern struct apic *__apicdrivers[], *__apicdrivers_end[];
#ifdef CONFIG_SMP
extern atomic_t init_deasserted;
extern int wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip);
-#endif
+extern int wakeup_secondary_cpu_via_soft(int apicid, unsigned long start_eip);
+#else /* CONFIG_SMP */
+#define wakeup_secondary_cpu_via_soft NULL
+#endif /* CONFIG_SMP */

#ifdef CONFIG_X86_LOCAL_APIC

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fd019d7..109df30 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -472,13 +472,8 @@ void __inquire_remote_apic(int apicid)
}
}

-/*
- * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal
- * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
- * won't ... remember to clear down the APIC, etc later.
- */
-int __cpuinit
-wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
+static int __cpuinit
+_wakeup_secondary_cpu_via_nmi(int apicid, int dest_mode)
{
unsigned long send_status, accept_status = 0;
int maxlvt;
@@ -486,7 +481,7 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
/* Target chip */
/* Boot on the stack */
/* Kick the second */
- apic_icr_write(APIC_DM_NMI | apic->dest_logical, logical_apicid);
+ apic_icr_write(APIC_DM_NMI | dest_mode, apicid);

pr_debug("Waiting for send to finish...\n");
send_status = safe_apic_wait_icr_idle();
@@ -511,6 +506,47 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
return (send_status | accept_status);
}

+/*
+ * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal
+ * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
+ * won't ... remember to clear down the APIC, etc later.
+ */
+int __cpuinit
+wakeup_secondary_cpu_via_nmi_phys(int phys_apicid, unsigned long start_eip)
+{
+ return _wakeup_secondary_cpu_via_nmi(phys_apicid, APIC_DEST_PHYSICAL);
+}
+
+int __cpuinit
+wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
+{
+ return _wakeup_secondary_cpu_via_nmi(logical_apicid, APIC_DEST_LOGICAL);
+}
+
+DEFINE_PER_CPU(int, cpu_dead) = { 0 };
+#define CPU_DEAD_TRIGGER 1
+#define CPU_DEAD_MWAIT 2
+#define CPU_DEAD_HLT 4
+
+static int wakeup_secondary_cpu_via_mwait(int cpu)
+{
+ per_cpu(cpu_dead, cpu) |= CPU_DEAD_TRIGGER;
+ return 0;
+}
+
+static int wakeup_cpu_nmi(unsigned int cmd, struct pt_regs *regs)
+{
+ int cpu = smp_processor_id();
+ int *cpu_dead_ptr;
+
+ cpu_dead_ptr = &per_cpu(cpu_dead, cpu);
+ if (!cpu_online(cpu) && (*cpu_dead_ptr & CPU_DEAD_HLT) &&
+ (*cpu_dead_ptr & CPU_DEAD_TRIGGER))
+ return NMI_HANDLED;
+
+ return NMI_DONE;
+}
+
static int __cpuinit
wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
{
@@ -626,6 +662,52 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
return (send_status | accept_status);
}

+/*
+ * Kick a cpu.
+ *
+ * If the CPU is in mwait, wake it up by mwait method. Otherwise, if the CPU is
+ * in halt, wake it up by NMI. If none of above exists, wake it up by INIT boot
+ * APIC message.
+ *
+ * When the CPU first time boots up, i.e. cpu_dead is 0, it's waken up by INIT
+ * boot APIC message.
+ *
+ * At this point, the CPU should be in a fixed dead state. So we don't consider
+ * racy condition here.
+ */
+int __cpuinit
+wakeup_secondary_cpu_via_soft(int apicid, unsigned long start_eip)
+{
+ int cpu;
+ int boot_error = 0;
+ /* start_ip had better be page-aligned! */
+ unsigned long start_ip = real_mode_header->trampoline_start;
+
+ for (cpu = 0; cpu < nr_cpu_ids; cpu++)
+ if (apicid == apic->cpu_present_to_apicid(cpu))
+ break;
+
+ if (cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ if (per_cpu(cpu_dead, cpu) & CPU_DEAD_MWAIT) {
+ boot_error = wakeup_secondary_cpu_via_mwait(cpu);
+ } else if (per_cpu(cpu_dead, cpu) & CPU_DEAD_HLT) {
+ int *cpu_dead_ptr;
+
+ cpu_dead_ptr = &per_cpu(cpu_dead, cpu);
+ *cpu_dead_ptr |= CPU_DEAD_TRIGGER;
+
+ boot_error = wakeup_secondary_cpu_via_nmi_phys(apicid,
+ start_ip);
+ if (boot_error)
+ *cpu_dead_ptr &= ~CPU_DEAD_TRIGGER;
+ } else
+ boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
+
+ return boot_error;
+}
+
/* reduce the number of lines printed when booting a large cpu count system */
static void __cpuinit announce_cpu(int cpu, int apicid)
{
@@ -778,6 +860,7 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
*/
smpboot_restore_warm_reset_vector();
}
+
return boot_error;
}

@@ -977,6 +1060,20 @@ static void __init smp_cpu_index_default(void)
}
}

+static bool mwait_supported(void)
+{
+ struct cpuinfo_x86 *c = __this_cpu_ptr(&cpu_info);
+
+ if (!(this_cpu_has(X86_FEATURE_MWAIT) && mwait_usable(c)))
+ return false;
+ if (!this_cpu_has(X86_FEATURE_CLFLSH))
+ return false;
+ if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
+ return false;
+
+ return true;
+}
+
/*
* Prepare for SMP bootup. The MP table or ACPI has been read
* earlier. Just do some sanity checking here and enable APIC mode.
@@ -1051,6 +1148,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
uv_system_init();

set_mtrr_aps_delayed_init();
+
+#ifdef CONFIG_HOTPLUG_CPU
+ if (!mwait_supported())
+ register_nmi_handler(NMI_LOCAL, wakeup_cpu_nmi, 0, "wake_cpu");
+#endif
out:
preempt_enable();
}
@@ -1111,6 +1213,12 @@ static int __init _setup_possible_cpus(char *str)
}
early_param("possible_cpus", _setup_possible_cpus);

+static int __init setup_wakeup_cpu_via_init(char *str)
+{
+ apic->wakeup_secondary_cpu = NULL;
+ return 0;
+}
+__setup("wakeup_cpu_via_init", setup_wakeup_cpu_via_init);

/*
* cpu_possible_mask should be static, it cannot change as cpu's
@@ -1286,6 +1394,28 @@ void play_dead_common(void)
local_irq_disable();
}

+static bool wakeup_cpu(int *trigger)
+{
+ unsigned int timeout;
+
+ /*
+ * Wait up to 1 seconds to check if CPU wakeup trigger is set in
+ * cpu_dead by either memory write or NMI.
+ * If there is no CPU wakeup trigger, go back to sleep.
+ */
+ for (timeout = 0; timeout < 1000000; timeout++) {
+ /*
+ * Check if CPU0 wakeup NMI is issued and handled.
+ */
+ if (*trigger & CPU_DEAD_TRIGGER)
+ return true;
+
+ udelay(1);
+ }
+
+ return false;
+}
+
/*
* We need to flush the caches before going to sleep, lest we have
* dirty data in our caches when we come back up.
@@ -1296,14 +1426,9 @@ static inline void mwait_play_dead(void)
unsigned int highest_cstate = 0;
unsigned int highest_subcstate = 0;
int i;
- void *mwait_ptr;
- struct cpuinfo_x86 *c = __this_cpu_ptr(&cpu_info);
+ int *cpu_dead_ptr;

- if (!(this_cpu_has(X86_FEATURE_MWAIT) && mwait_usable(c)))
- return;
- if (!this_cpu_has(X86_FEATURE_CLFLSH))
- return;
- if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
+ if (!mwait_supported())
return;

eax = CPUID_MWAIT_LEAF;
@@ -1328,16 +1453,10 @@ static inline void mwait_play_dead(void)
(highest_subcstate - 1);
}

- /*
- * This should be a memory location in a cache line which is
- * unlikely to be touched by other processors. The actual
- * content is immaterial as it is not actually modified in any way.
- */
- mwait_ptr = &current_thread_info()->flags;
-
- wbinvd();
-
+ cpu_dead_ptr = &per_cpu(cpu_dead, smp_processor_id());
+ *cpu_dead_ptr = CPU_DEAD_MWAIT;
while (1) {
+ *cpu_dead_ptr &= ~CPU_DEAD_TRIGGER;
/*
* The CLFLUSH is a workaround for erratum AAI65 for
* the Xeon 7400 series. It's not clear it is actually
@@ -1345,20 +1464,34 @@ static inline void mwait_play_dead(void)
* The WBINVD is insufficient due to the spurious-wakeup
* case where we return around the loop.
*/
- clflush(mwait_ptr);
- __monitor(mwait_ptr, 0, 0);
+ wbinvd();
+ clflush(cpu_dead_ptr);
+ __monitor(cpu_dead_ptr, 0, 0);
mb();
- __mwait(eax, 0);
+ if ((*cpu_dead_ptr & CPU_DEAD_TRIGGER) == 0)
+ __mwait(eax, 0);
+
+ /* Waken up by another CPU. */
+ if (wakeup_cpu(cpu_dead_ptr))
+ start_cpu();
}
}

static inline void hlt_play_dead(void)
{
+ int *cpu_dead_ptr;
+
if (__this_cpu_read(cpu_info.x86) >= 4)
wbinvd();

+ cpu_dead_ptr = &per_cpu(cpu_dead, smp_processor_id());
+ *cpu_dead_ptr = CPU_DEAD_HLT;
while (1) {
+ *cpu_dead_ptr &= ~CPU_DEAD_TRIGGER;
native_halt();
+ /* If NMI wants to wake up me, I'll start. */
+ if (wakeup_cpu(cpu_dead_ptr))
+ start_cpu();
}
}

--
1.6.0.3

2012-06-04 18:31:23

by Fenghua Yu

[permalink] [raw]

Subject: [PATCH 5/6] x86/x2apic_cluster.c: Wakeup function in x2apic_cluster calls mwait or nmi method

From: Fenghua Yu <[email protected]>

The wakeup_secondary_cpu function in apic_x2apic_cluster calls
wakeup_secondary_cpu_via_soft().

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/apic/x2apic_cluster.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index ff35cff..c3b8fcf 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -252,6 +252,7 @@ static struct apic apic_x2apic_cluster = {
.send_IPI_all = x2apic_send_IPI_all,
.send_IPI_self = x2apic_send_IPI_self,

+ .wakeup_secondary_cpu = wakeup_secondary_cpu_via_soft,
.trampoline_phys_low = DEFAULT_TRAMPOLINE_PHYS_LOW,
.trampoline_phys_high = DEFAULT_TRAMPOLINE_PHYS_HIGH,
.wait_for_init_deassert = NULL,
--
1.6.0.3

2012-06-04 18:31:51

by Fenghua Yu

[permalink] [raw]

Subject: [PATCH 2/6] x86/head_32.S/head_64.S: Kernel entry code after waking up offline CPU via mwait or nmi

From: Fenghua Yu <[email protected]>

start_cpu() is CPU entry point after it's waken up via mwait or nmi. It's called
from play_dead(). Everything has been set up already except stack. We just set
up stack here. Then call start_secondary().

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpu.h | 1 +
arch/x86/kernel/head_32.S | 12 ++++++++++++
arch/x86/kernel/head_64.S | 14 ++++++++++++++
3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 4564c8e..c2ad71c 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -28,6 +28,7 @@ struct x86_cpu {
#ifdef CONFIG_HOTPLUG_CPU
extern int arch_register_cpu(int num);
extern void arch_unregister_cpu(int);
+extern void __cpuinit start_cpu(void);
#endif

DECLARE_PER_CPU(int, cpu_state);
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index d42ab17..55981a7 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -266,6 +266,18 @@ num_subarch_entries = (. - subarch_entries) / 4
jmp default_entry
#endif /* CONFIG_PARAVIRT */

+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * Boot CPU entry point. It's called from play_dead(). Everything has been set
+ * up already except stack. We just set up stack here. Then call
+ * start_secondary().
+ */
+ENTRY(start_cpu)
+ movl stack_start, %ecx
+ movl %ecx, %esp
+ jmp *(initial_code)
+#endif
+
/*
* Non-boot CPU entry point; entered from trampoline.S
* We can't lgdt here, because lgdt itself uses a data segment, but
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..84a8e1d 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -251,6 +251,20 @@ ENTRY(secondary_startup_64)
pushq $__KERNEL_CS # set correct cs
pushq %rax # target address in negative space
lretq
+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * Boot CPU entry point. It's called from play_dead(). Everything has been set
+ * up already except stack. We just set up stack here. Then call
+ * start_secondary().
+ */
+ENTRY(start_cpu)
+ movq stack_start(%rip),%rsp
+ movq initial_code(%rip),%rax
+ pushq $0 # fake return address to stop unwinder
+ pushq $__KERNEL_CS # set correct cs
+ pushq %rax # target address in negative space
+ lretq
+#endif

/* SMP bootup changes these two */
__REFDATA
--
1.6.0.3

2012-06-04 18:59:33

by Suresh Siddha

[permalink] [raw]

Subject: Re: [PATCH 3/6] x86/smpboot.c: Wake up offline CPU via mwait or nmi

On Mon, 2012-06-04 at 11:17 -0700, Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>
>
> wakeup_secondary_cpu_via_soft() is defined to wake up offline CPU via mwait if
> the CPU is in mwait or via nmi if the CPU is in hlt.
>
> A CPU boots up by INIT, INIT, STARTUP sequence when it boots up for the first
> time during boot time or hot plug.

I think this breaks suspend/resume as the cpu state gets lost.

Have you tried suspend/resume?

>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> arch/x86/include/asm/apic.h | 5 +-
> arch/x86/kernel/smpboot.c | 187 ++++++++++++++++++++++++++++++++++++------
> 2 files changed, 164 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index eaff479..cad00b1 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -425,7 +425,10 @@ extern struct apic *__apicdrivers[], *__apicdrivers_end[];
> #ifdef CONFIG_SMP
> extern atomic_t init_deasserted;
> extern int wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip);
> -#endif
> +extern int wakeup_secondary_cpu_via_soft(int apicid, unsigned long start_eip);
> +#else /* CONFIG_SMP */
> +#define wakeup_secondary_cpu_via_soft NULL
> +#endif /* CONFIG_SMP */
>
> #ifdef CONFIG_X86_LOCAL_APIC
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index fd019d7..109df30 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -472,13 +472,8 @@ void __inquire_remote_apic(int apicid)
> }
> }
>
> -/*
> - * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal
> - * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
> - * won't ... remember to clear down the APIC, etc later.
> - */
> -int __cpuinit
> -wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
> +static int __cpuinit
> +_wakeup_secondary_cpu_via_nmi(int apicid, int dest_mode)
> {
> unsigned long send_status, accept_status = 0;
> int maxlvt;
> @@ -486,7 +481,7 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
> /* Target chip */
> /* Boot on the stack */
> /* Kick the second */
> - apic_icr_write(APIC_DM_NMI | apic->dest_logical, logical_apicid);
> + apic_icr_write(APIC_DM_NMI | dest_mode, apicid);
>
> pr_debug("Waiting for send to finish...\n");
> send_status = safe_apic_wait_icr_idle();
> @@ -511,6 +506,47 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
> return (send_status | accept_status);
> }
>
> +/*
> + * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal
> + * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
> + * won't ... remember to clear down the APIC, etc later.
> + */
> +int __cpuinit
> +wakeup_secondary_cpu_via_nmi_phys(int phys_apicid, unsigned long start_eip)
> +{
> + return _wakeup_secondary_cpu_via_nmi(phys_apicid, APIC_DEST_PHYSICAL);
> +}
> +
> +int __cpuinit
> +wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
> +{
> + return _wakeup_secondary_cpu_via_nmi(logical_apicid, APIC_DEST_LOGICAL);
> +}
> +
> +DEFINE_PER_CPU(int, cpu_dead) = { 0 };
> +#define CPU_DEAD_TRIGGER 1
> +#define CPU_DEAD_MWAIT 2
> +#define CPU_DEAD_HLT 4
> +
> +static int wakeup_secondary_cpu_via_mwait(int cpu)
> +{
> + per_cpu(cpu_dead, cpu) |= CPU_DEAD_TRIGGER;
> + return 0;
> +}
> +
> +static int wakeup_cpu_nmi(unsigned int cmd, struct pt_regs *regs)
> +{
> + int cpu = smp_processor_id();
> + int *cpu_dead_ptr;
> +
> + cpu_dead_ptr = &per_cpu(cpu_dead, cpu);
> + if (!cpu_online(cpu) && (*cpu_dead_ptr & CPU_DEAD_HLT) &&
> + (*cpu_dead_ptr & CPU_DEAD_TRIGGER))
> + return NMI_HANDLED;
> +
> + return NMI_DONE;
> +}
> +
> static int __cpuinit
> wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
> {
> @@ -626,6 +662,52 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
> return (send_status | accept_status);
> }
>
> +/*
> + * Kick a cpu.
> + *
> + * If the CPU is in mwait, wake it up by mwait method. Otherwise, if the CPU is
> + * in halt, wake it up by NMI. If none of above exists, wake it up by INIT boot
> + * APIC message.
> + *
> + * When the CPU first time boots up, i.e. cpu_dead is 0, it's waken up by INIT
> + * boot APIC message.
> + *
> + * At this point, the CPU should be in a fixed dead state. So we don't consider
> + * racy condition here.
> + */
> +int __cpuinit
> +wakeup_secondary_cpu_via_soft(int apicid, unsigned long start_eip)
> +{
> + int cpu;
> + int boot_error = 0;
> + /* start_ip had better be page-aligned! */
> + unsigned long start_ip = real_mode_header->trampoline_start;
> +
> + for (cpu = 0; cpu < nr_cpu_ids; cpu++)
> + if (apicid == apic->cpu_present_to_apicid(cpu))
> + break;
> +
> + if (cpu >= nr_cpu_ids)
> + return -EINVAL;
> +
> + if (per_cpu(cpu_dead, cpu) & CPU_DEAD_MWAIT) {
> + boot_error = wakeup_secondary_cpu_via_mwait(cpu);
> + } else if (per_cpu(cpu_dead, cpu) & CPU_DEAD_HLT) {
> + int *cpu_dead_ptr;
> +
> + cpu_dead_ptr = &per_cpu(cpu_dead, cpu);
> + *cpu_dead_ptr |= CPU_DEAD_TRIGGER;
> +
> + boot_error = wakeup_secondary_cpu_via_nmi_phys(apicid,
> + start_ip);
> + if (boot_error)
> + *cpu_dead_ptr &= ~CPU_DEAD_TRIGGER;
> + } else
> + boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
> +
> + return boot_error;
> +}
> +
> /* reduce the number of lines printed when booting a large cpu count system */
> static void __cpuinit announce_cpu(int cpu, int apicid)
> {
> @@ -778,6 +860,7 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
> */
> smpboot_restore_warm_reset_vector();
> }
> +
> return boot_error;
> }
>
> @@ -977,6 +1060,20 @@ static void __init smp_cpu_index_default(void)
> }
> }
>
> +static bool mwait_supported(void)
> +{
> + struct cpuinfo_x86 *c = __this_cpu_ptr(&cpu_info);
> +
> + if (!(this_cpu_has(X86_FEATURE_MWAIT) && mwait_usable(c)))
> + return false;
> + if (!this_cpu_has(X86_FEATURE_CLFLSH))
> + return false;
> + if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
> + return false;
> +
> + return true;
> +}
> +
> /*
> * Prepare for SMP bootup. The MP table or ACPI has been read
> * earlier. Just do some sanity checking here and enable APIC mode.
> @@ -1051,6 +1148,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
> uv_system_init();
>
> set_mtrr_aps_delayed_init();
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> + if (!mwait_supported())
> + register_nmi_handler(NMI_LOCAL, wakeup_cpu_nmi, 0, "wake_cpu");
> +#endif
> out:
> preempt_enable();
> }
> @@ -1111,6 +1213,12 @@ static int __init _setup_possible_cpus(char *str)
> }
> early_param("possible_cpus", _setup_possible_cpus);
>
> +static int __init setup_wakeup_cpu_via_init(char *str)
> +{
> + apic->wakeup_secondary_cpu = NULL;
> + return 0;
> +}
> +__setup("wakeup_cpu_via_init", setup_wakeup_cpu_via_init);
>
> /*
> * cpu_possible_mask should be static, it cannot change as cpu's
> @@ -1286,6 +1394,28 @@ void play_dead_common(void)
> local_irq_disable();
> }
>
> +static bool wakeup_cpu(int *trigger)
> +{
> + unsigned int timeout;
> +
> + /*
> + * Wait up to 1 seconds to check if CPU wakeup trigger is set in
> + * cpu_dead by either memory write or NMI.
> + * If there is no CPU wakeup trigger, go back to sleep.
> + */
> + for (timeout = 0; timeout < 1000000; timeout++) {
> + /*
> + * Check if CPU0 wakeup NMI is issued and handled.
> + */
> + if (*trigger & CPU_DEAD_TRIGGER)
> + return true;
> +
> + udelay(1);
> + }
> +
> + return false;
> +}
> +
> /*
> * We need to flush the caches before going to sleep, lest we have
> * dirty data in our caches when we come back up.
> @@ -1296,14 +1426,9 @@ static inline void mwait_play_dead(void)
> unsigned int highest_cstate = 0;
> unsigned int highest_subcstate = 0;
> int i;
> - void *mwait_ptr;
> - struct cpuinfo_x86 *c = __this_cpu_ptr(&cpu_info);
> + int *cpu_dead_ptr;
>
> - if (!(this_cpu_has(X86_FEATURE_MWAIT) && mwait_usable(c)))
> - return;
> - if (!this_cpu_has(X86_FEATURE_CLFLSH))
> - return;
> - if (__this_cpu_read(cpu_info.cpuid_level) < CPUID_MWAIT_LEAF)
> + if (!mwait_supported())
> return;
>
> eax = CPUID_MWAIT_LEAF;
> @@ -1328,16 +1453,10 @@ static inline void mwait_play_dead(void)
> (highest_subcstate - 1);
> }
>
> - /*
> - * This should be a memory location in a cache line which is
> - * unlikely to be touched by other processors. The actual
> - * content is immaterial as it is not actually modified in any way.
> - */
> - mwait_ptr = &current_thread_info()->flags;
> -
> - wbinvd();
> -
> + cpu_dead_ptr = &per_cpu(cpu_dead, smp_processor_id());
> + *cpu_dead_ptr = CPU_DEAD_MWAIT;
> while (1) {
> + *cpu_dead_ptr &= ~CPU_DEAD_TRIGGER;
> /*
> * The CLFLUSH is a workaround for erratum AAI65 for
> * the Xeon 7400 series. It's not clear it is actually
> @@ -1345,20 +1464,34 @@ static inline void mwait_play_dead(void)
> * The WBINVD is insufficient due to the spurious-wakeup
> * case where we return around the loop.
> */
> - clflush(mwait_ptr);
> - __monitor(mwait_ptr, 0, 0);
> + wbinvd();
> + clflush(cpu_dead_ptr);
> + __monitor(cpu_dead_ptr, 0, 0);
> mb();
> - __mwait(eax, 0);
> + if ((*cpu_dead_ptr & CPU_DEAD_TRIGGER) == 0)
> + __mwait(eax, 0);
> +
> + /* Waken up by another CPU. */
> + if (wakeup_cpu(cpu_dead_ptr))
> + start_cpu();
> }
> }
>
> static inline void hlt_play_dead(void)
> {
> + int *cpu_dead_ptr;
> +
> if (__this_cpu_read(cpu_info.x86) >= 4)
> wbinvd();
>
> + cpu_dead_ptr = &per_cpu(cpu_dead, smp_processor_id());
> + *cpu_dead_ptr = CPU_DEAD_HLT;
> while (1) {
> + *cpu_dead_ptr &= ~CPU_DEAD_TRIGGER;
> native_halt();
> + /* If NMI wants to wake up me, I'll start. */
> + if (wakeup_cpu(cpu_dead_ptr))
> + start_cpu();
> }
> }
>

2012-06-04 20:11:49

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

On Mon, 4 Jun 2012, Fenghua Yu wrote:

> From: Fenghua Yu <[email protected]>
>
> Since offline CPU is in wmait or hlt if mwait feature is not available, it can
> be waken up by writing to monitored memory range or via nmi.
>
> Compared to current INIT, INIT, STARTUP wake up sequence, waking up offline CPU
> is faster via wmait or nmi. This is especially useful when offline CPU for
> power saving and shorter waking up time is desired. On one tested desktop
> machine, waking up time via mwait or nmi is reduced to 23% of waking up time
> via INIT. Waking up time is measured from the beginning of store_online() to
> the beginning of cpu_idle() after the CPU is waken up.
>
> Waking up offline CPU via mwait or nmi is also useful to support BSP offline/
> online because offline BSP can not be waken up by the INIT's sequence. The BSP
> offline/online patchset will be sent out seperately.

I understand what you are trying to do, though I completely disagree
with the solution.

The main problem of the current hotplug code is that it is an all or
nothing approach. You have to tear down the whole thing completely
instead of just taking it out of the usable set of cpus.

I'm working on a proper state machine driven online/offline sequence,
where you can put the cpu into an intermediate state which avoids
bringing it down completely. This is enough to get the full
powersaving benefits w/o having to go through all the synchronization
states of a full online/offline. That will shorten the onlining time
of an previously offlined cpu to almost nothing.

I really want to avoid adding more bandaids to the hotplug code before
we have sorted out the existing horror.

Thanks,

tglx

2012-06-04 20:33:42

Subject: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: [PATCH 1/6] x86/Documentation/kernel-parameters.txt: Add wakeup_cpu_via_init kernel parameter help

Subject: [PATCH 4/6] x86/apic_flat_64.c: Wakeup function in apic calls mwait or nmi method

Subject: [PATCH 6/6] x86/x2apic_phys.c: Wakeup function in x2apic_phys calls mwait or nmi method

Subject: [PATCH 6/6] x86/x2apic_phys.c: Wakeup function in x2apic_phys calls mwait or nmi method

Subject: [PATCH 3/6] x86/smpboot.c: Wake up offline CPU via mwait or nmi

Subject: [PATCH 5/6] x86/x2apic_cluster.c: Wakeup function in x2apic_cluster calls mwait or nmi method

Subject: [PATCH 2/6] x86/head_32.S/head_64.S: Kernel entry code after waking up offline CPU via mwait or nmi

Subject: Re: [PATCH 3/6] x86/smpboot.c: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 3/6] x86/smpboot.c: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: [PATCH] kthread: Implement park/unpark facility

Subject: Re: [PATCH] kthread: Implement park/unpark facility

Subject: Re: [PATCH] kthread: Implement park/unpark facility

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: RE: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH] kthread: Implement park/unpark facility

Subject: Re: [PATCH 0/6] x86/cpu hotplug: Wake up offline CPU via mwait or nmi

Subject: Re: [PATCH] kthread: Implement park/unpark facility

Subject: Re: [PATCH] kthread: Implement park/unpark facility

Subject: Re: [PATCH] kthread: Implement park/unpark facility