2012-10-12 16:52:38

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 0/12] x86: Arbitrary CPU hot(un)plug support

From: Fenghua Yu <[email protected]>

CPU0 or BSP (Bootstrap Processor) has been the last processor that can not be
hot removed on x86. This patchset implements CPU0 or BSP online and offline
and removes this obstacle to CPU hotplug.

RAS needs the feature. If socket0 needs to be hotplugged for any reason (any
thread on socket0 is bad, shared cache issue, uncore issue, etc), CPU0 is
required to be offline or hot replaced to keep the system run. For example,
starting with Core Duo, if you have a system that is reporting cache problems
via the "yellow" status in the MCi_STATUS msr, then there is benefit in simply
soft off-lining the cores that share that cache - assuming that leaves you at
least one online core. A single socket system with L3 cache troubles is not
helped - but problems in L1/L2 cache, or on multi-socket systems can be avoided.
They are already being avoided for the cases where CPU0 is not involved.
This patchset can help L1/L2 cache problem in CPU0 or L3 cache problem on the
socket with CPU0 in a multi-socket system.

v9: Add Intel vendor check to support the feature on Intel platforms only.

v8: Add smp_store_boot_cpu_info() and change smp_store_cpu_info() back to avoid
a compilation error. Fix a few messy subject issues.

v7: Change smp_store_cpu_info() to allow store cpuinfo for CPU0 when online.
Change PIC irq detection in native_cpu_disable().

v6: If CPU0 is offlined during boot time in CPU0 hotplug debug mode, put CPU0
online again before resuming from hibernation and disable x2apic and xapic.Don't
set __CPUINIT for start_cpu0() in head_32.S. Clean up CPU0 wake up nmi handler
after callin and callout sync. In a period (3 seconds), check if CPU0 wake up
NMI is handled after offlined CPU0 exits from mwait.

v5: Add CONFIG_BOOTPARAM_HOTPLUG_CPU0 and CONFIG_DEBUG_HOTPLUG_CPU0. Simplify
duplicate xstate_size init check. Wakeup CPU0 via nmi instead INITs. Add
mcheck_cpu_init when CPU0 online. Change variable bsp_hotpluggable to
cpu0_hotpluggable with __initdata qualifier.

v4: Add __read_mostly for internal bsp_hotpluggable variable. Add my email
address in cpu-hotplug.txt document. A wording change in comment.

v3: Register a pm notifier to check if CPU0 is online before hibernate/suspend.
Small wording changes in document and print info.

v2: Add locking changes between cpu hotplug and hibernate/suspend. Change PIC
irq bound to CPU0 detection.

Fenghua Yu (12):
doc: Add x86 CPU0 online/offline feature
x86, Kconfig: Add config switch for CPU0 hotplug
x86, topology: Don't offline CPU0 if any PIC irq can not be migrated
out of it
x86, hotplug: Support functions for CPU0 online/offline
x86, hotplug, suspend: Online CPU0 for suspend or hibernate
x86-64, hotplug: Add start_cpu0() entry point to head_64.S
x86-32, hotplug: Add start_cpu0() entry point to head_32.S
x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
x86, hotplug: The first online processor saves the MTRR state
x86/i387.c: Initialize thread xstate only on CPU0 only once
x86, topology: Debug CPU00 hotplug

Documentation/cpu-hotplug.txt | 24 ++++++
Documentation/kernel-parameters.txt | 14 +++
arch/x86/Kconfig | 44 ++++++++++
arch/x86/include/asm/cpu.h | 4 +
arch/x86/include/asm/smp.h | 1 +
arch/x86/kernel/cpu/common.c | 5 +-
arch/x86/kernel/cpu/mtrr/main.c | 9 ++-
arch/x86/kernel/head_32.S | 12 +++
arch/x86/kernel/head_64.S | 15 ++++
arch/x86/kernel/i387.c | 6 +-
arch/x86/kernel/smpboot.c | 150 +++++++++++++++++++++++++++++------
arch/x86/kernel/topology.c | 101 ++++++++++++++++++++++--
arch/x86/power/cpu.c | 82 +++++++++++++++++++
13 files changed, 428 insertions(+), 39 deletions(-)

--
1.7.2


2012-10-12 16:49:57

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 06/12] x86-64, hotplug: Add start_cpu0() entry point to head_64.S

From: Fenghua Yu <[email protected]>

start_cpu0() is defined in head_64.S for 64-bit. The function sets up stack and
jumps to start_secondary() for CPU0 wake up.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/head_64.S | 15 +++++++++++++++
1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..3faac8a 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -252,6 +252,21 @@ ENTRY(secondary_startup_64)
pushq %rax # target address in negative space
lretq

+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
+ * up already except stack. We just set up stack here. Then call
+ * start_secondary().
+ */
+ENTRY(start_cpu0)
+ movq stack_start(%rip),%rsp
+ movq initial_code(%rip),%rax
+ pushq $0 # fake return address to stop unwinder
+ pushq $__KERNEL_CS # set correct cs
+ pushq %rax # target address in negative space
+ lretq
+#endif
+
/* SMP bootup changes these two */
__REFDATA
.align 8
--
1.7.2

2012-10-12 16:49:55

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 01/12] doc: Add x86 CPU0 online/offline feature

From: Fenghua Yu <[email protected]>

If CONFIG_BOOTPARAM_HOTPLUG_CPU0 is turned on, CPU0 is hotpluggable. Otherwise,
by default CPU0 is not hotpluggable and kernel parameter cpu0_hotplug enables
CPU0 online/offline feature.

The documentations point out two known CPU0 dependencies. First, resume from
hibernate or suspend always starts from CPU0. So hibernate and suspend are
prevented if CPU0 is offline. Another dependency is PIC interrupts always go
to CPU0.

It's said that some machines may depend on CPU0 to poweroff/reboot. But I
haven't seen such dependency on a few tested machines.

Please let me know if you see any CPU0 dependencies on your machine.

Signed-off-by: Fenghua Yu <[email protected]>
---
Documentation/cpu-hotplug.txt | 24 ++++++++++++++++++++++++
Documentation/kernel-parameters.txt | 14 ++++++++++++++
2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 66ef8f3..9f40135 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -207,6 +207,30 @@ by making it not-removable.

In such cases you will also notice that the online file is missing under cpu0.

+Q: Is CPU0 removable on X86?
+A: Yes. If kernel is compiled with CONFIG_BOOTPARAM_HOTPLUG_CPU0=y, CPU0 is
+removable by default. Otherwise, CPU0 is also removable by kernel option
+cpu0_hotplug.
+
+But some features depend on CPU0. Two known dependencies are:
+
+1. Resume from hibernate/suspend depends on CPU0. Hibernate/suspend will fail if
+CPU0 is offline and you need to online CPU0 before hibernate/suspend can
+continue.
+2. PIC interrupts also depend on CPU0. CPU0 can't be removed if a PIC interrupt
+is detected.
+
+It's said poweroff/reboot may depend on CPU0 on some machines although I haven't
+seen any poweroff/reboot failure so far after CPU0 is offline on a few tested
+machines.
+
+Please let me know if you know or see any other dependencies of CPU0.
+
+If the dependencies are under your control, you can turn on CPU0 hotplug feature
+either by CONFIG_BOOTPARAM_HOTPLUG_CPU0 or by kernel parameter cpu0_hotplug.
+
+--Fenghua Yu <[email protected]>
+
Q: How do i find out if a particular CPU is not removable?
A: Depending on the implementation, some architectures may show this by the
absence of the "online" file. This is done if it can be determined ahead of
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index df43807..20d7956 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1962,6 +1962,20 @@ bytes respectively. Such letter suffixes can also be entirely omitted.

nox2apic [X86-64,APIC] Do not enable x2APIC mode.

+ cpu0_hotplug [X86] Turn on CPU0 hotplug feature when
+ CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
+ Some features depend on CPU0. Known dependencies are:
+ 1. Resume from suspend/hibernate depends on CPU0.
+ Suspend/hibernate will fail if CPU0 is offline and you
+ need to online CPU0 before suspend/hibernate.
+ 2. PIC interrupts also depend on CPU0. CPU0 can't be
+ removed if a PIC interrupt is detected.
+ It's said poweroff/reboot may depend on CPU0 on some
+ machines although I haven't seen such issues so far
+ after CPU0 is offline on a few tested machines.
+ If the dependencies are under your control, you can
+ turn on cpu0_hotplug.
+
nptcg= [IA-64] Override max number of concurrent global TLB
purges which is reported from either PAL_VM_SUMMARY or
SAL PALO.
--
1.7.2

2012-10-12 16:49:54

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 03/12] x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it

From: Fenghua Yu <[email protected]>

If CONFIG_BOOTPARAM_HOTPLUG_CPU is turned on, CPU0 hotplug feature is enabled
by default.

If CONFIG_BOOTPARAM_HOTPLUG_CPU is not turned on, CPU0 hotplug feature is not
enabled by default. The kernel parameter cpu0_hotplug can enable CPU0 hotplug
feature at boot.

Currently the feature is supported on Intel platforms only.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/topology.c | 50 +++++++++++++++++++++++++++++++++++++------
1 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c
index 76ee977..0e7b4a7 100644
--- a/arch/x86/kernel/topology.c
+++ b/arch/x86/kernel/topology.c
@@ -30,23 +30,59 @@
#include <linux/mmzone.h>
#include <linux/init.h>
#include <linux/smp.h>
+#include <linux/irq.h>
#include <asm/cpu.h>

static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);

#ifdef CONFIG_HOTPLUG_CPU
+
+#ifdef CONFIG_BOOTPARAM_HOTPLUG_CPU0
+static int cpu0_hotpluggable = 1;
+#else
+static int cpu0_hotpluggable;
+static int __init enable_cpu0_hotplug(char *str)
+{
+ cpu0_hotpluggable = 1;
+ return 1;
+}
+
+__setup("cpu0_hotplug", enable_cpu0_hotplug);
+#endif
+
int __ref arch_register_cpu(int num)
{
+ struct cpuinfo_x86 *c = &cpu_data(num);
+
+ /*
+ * Currently CPU0 is only hotpluggable on Intel platforms. Other
+ * vendors can add hotplug support later.
+ */
+ if (c->x86_vendor != X86_VENDOR_INTEL)
+ cpu0_hotpluggable = 0;
+
/*
- * CPU0 cannot be offlined due to several
- * restrictions and assumptions in kernel. This basically
- * doesn't add a control file, one cannot attempt to offline
- * BSP.
+ * Two known BSP/CPU0 dependencies: Resume from suspend/hibernate
+ * depends on BSP. PIC interrupts depend on BSP.
*
- * Also certain PCI quirks require not to enable hotplug control
- * for all CPU's.
+ * If the BSP depencies are under control, one can tell kernel to
+ * enable BSP hotplug. This basically adds a control file and
+ * one can attempt to offline BSP.
*/
- if (num)
+ if (num == 0 && cpu0_hotpluggable) {
+ unsigned int irq;
+ /*
+ * We won't take down the boot processor on i386 if some
+ * interrupts only are able to be serviced by the BSP in PIC.
+ */
+ for_each_active_irq(irq) {
+ if (!IO_APIC_IRQ(irq) && irq_has_action(irq)) {
+ cpu0_hotpluggable = 0;
+ break;
+ }
+ }
+ }
+ if (num || cpu0_hotpluggable)
per_cpu(cpu_devices, num).cpu.hotpluggable = 1;

return register_cpu(&per_cpu(cpu_devices, num).cpu, num);
--
1.7.2

2012-10-12 16:50:40

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 11/12] x86/i387.c: Initialize thread xstate only on CPU0 only once

From: Fenghua Yu <[email protected]>

init_thread_xstate() is only called once to avoid overriding xstate_size during
boot time or during CPU hotplug.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/i387.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 675a050..245a71d 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -175,7 +175,11 @@ void __cpuinit fpu_init(void)
cr0 |= X86_CR0_EM;
write_cr0(cr0);

- if (!smp_processor_id())
+ /*
+ * init_thread_xstate is only called once to avoid overriding
+ * xstate_size during boot time or during CPU hotplug.
+ */
+ if (xstate_size == 0)
init_thread_xstate();

mxcsr_feature_mask_init();
--
1.7.2

2012-10-12 16:50:57

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 10/12] x86, hotplug: The first online processor saves the MTRR state

From: Fenghua Yu <[email protected]>

Ask the first online CPU to save mtrr instead of asking BSP. BSP could be
offline when mtrr_save_state() is called.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/mtrr/main.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 6b96110..e4c1a41 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -695,11 +695,16 @@ void mtrr_ap_init(void)
}

/**
- * Save current fixed-range MTRR state of the BSP
+ * Save current fixed-range MTRR state of the first cpu in cpu_online_mask.
*/
void mtrr_save_state(void)
{
- smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1);
+ int first_cpu;
+
+ get_online_cpus();
+ first_cpu = cpumask_first(cpu_online_mask);
+ smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
+ put_online_cpus();
}

void set_mtrr_aps_delayed_init(void)
--
1.7.2

2012-10-12 16:50:55

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 08/12] x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI

From: Fenghua Yu <[email protected]>

Instead of waiting for STARTUP after INITs, BSP will execute the BIOS boot-strap
code which is not a desired behavior for waking up BSP. To avoid the boot-strap
code, wake up CPU0 by NMI instead.

This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined (i.e.
physically hot removed and then hot added), NMI won't wake it up. We'll change
this code in the future to wake up hard offlined CPU0 if real platform and
request are available.

AP is still waken up as before by INIT, SIPI, SIPI sequence.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpu.h | 1 +
arch/x86/kernel/smpboot.c | 112 +++++++++++++++++++++++++++++++++++++++++--
2 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 4564c8e..a119572 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -28,6 +28,7 @@ struct x86_cpu {
#ifdef CONFIG_HOTPLUG_CPU
extern int arch_register_cpu(int num);
extern void arch_unregister_cpu(int);
+extern void __cpuinit start_cpu0(void);
#endif

DECLARE_PER_CPU(int, cpu_state);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c297907..9878fb2 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -138,15 +138,17 @@ static void __cpuinit smp_callin(void)
* we may get here before an INIT-deassert IPI reaches
* our local APIC. We have to wait for the IPI or we'll
* lock up on an APIC access.
+ *
+ * Since CPU0 is not wakened up by INIT, it doesn't wait for the IPI.
*/
- if (apic->wait_for_init_deassert)
+ cpuid = smp_processor_id();
+ if (apic->wait_for_init_deassert && cpuid != 0)
apic->wait_for_init_deassert(&init_deasserted);

/*
* (This works even if the APIC is not enabled.)
*/
phys_id = read_apic_id();
- cpuid = smp_processor_id();
if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
phys_id, cpuid);
@@ -228,6 +230,8 @@ static void __cpuinit smp_callin(void)
cpumask_set_cpu(cpuid, cpu_callin_mask);
}

+static int cpu0_logical_apicid;
+static int enable_start_cpu0;
/*
* Activate a secondary processor.
*/
@@ -243,6 +247,8 @@ notrace static void __cpuinit start_secondary(void *unused)
preempt_disable();
smp_callin();

+ enable_start_cpu0 = 0;
+
#ifdef CONFIG_X86_32
/* switch away from the initial page table */
load_cr3(swapper_pg_dir);
@@ -492,7 +498,7 @@ void __inquire_remote_apic(int apicid)
* won't ... remember to clear down the APIC, etc later.
*/
int __cpuinit
-wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
+wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
{
unsigned long send_status, accept_status = 0;
int maxlvt;
@@ -500,7 +506,7 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
/* Target chip */
/* Boot on the stack */
/* Kick the second */
- apic_icr_write(APIC_DM_NMI | apic->dest_logical, logical_apicid);
+ apic_icr_write(APIC_DM_NMI | apic->dest_logical, apicid);

pr_debug("Waiting for send to finish...\n");
send_status = safe_apic_wait_icr_idle();
@@ -660,6 +666,17 @@ static void __cpuinit announce_cpu(int cpu, int apicid)
node, cpu, apicid);
}

+static int wakeup_cpu0_nmi(unsigned int cmd, struct pt_regs *regs)
+{
+ int cpu;
+
+ cpu = smp_processor_id();
+ if (cpu == 0 && !cpu_online(cpu) && enable_start_cpu0)
+ return NMI_HANDLED;
+
+ return NMI_DONE;
+}
+
/*
* NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
* (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -675,6 +692,7 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)

unsigned long boot_error = 0;
int timeout;
+ int cpu0_nmi_registered = 0;

/* Just in case we booted with a single CPU. */
alternatives_enable_smp();
@@ -727,8 +745,47 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
*/
if (apic->wakeup_secondary_cpu)
boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
- else
- boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
+ else {
+ if (cpu)
+ /*
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ */
+ boot_error = wakeup_secondary_cpu_via_init(apicid,
+ start_ip);
+ else {
+ /*
+ * Instead of waiting for STARTUP after INITs, BSP will
+ * execute the BIOS boot-strap code which is not a
+ * desired behavior for waking up BSP. To avoid the
+ * boot-strap code, wake up CPU0 by NMI instead.
+ *
+ * This works to wake up soft offlined CPU0 only. If
+ * CPU0 is hard offlined (i.e. physically hot removed
+ * and then hot added), NMI won't wake it up. We'll
+ * change this code in the future to wake up hard
+ * offlined CPU0 if real platform and request are
+ * available.
+ */
+ int id;
+
+ enable_start_cpu0 = 1;
+ /*
+ * Register a NMI handler to help wake up CPU0.
+ */
+ boot_error = register_nmi_handler(NMI_LOCAL,
+ wakeup_cpu0_nmi, 0, "wake_cpu0");
+
+ if (!boot_error) {
+ cpu0_nmi_registered = 1;
+ if (apic->dest_logical == APIC_DEST_LOGICAL)
+ id = cpu0_logical_apicid;
+ else
+ id = apicid;
+ boot_error = wakeup_secondary_cpu_via_nmi(id,
+ start_ip);
+ }
+ }
+ }

if (!boot_error) {
/*
@@ -793,6 +850,13 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
*/
smpboot_restore_warm_reset_vector();
}
+ /*
+ * Clean up the nmi handler. Do this after the callin and callout sync
+ * to avoid impact of possible long unregister time.
+ */
+ if (cpu0_nmi_registered)
+ unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
+
return boot_error;
}

@@ -1037,6 +1101,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
*/
setup_local_APIC();

+ cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
+
/*
* Enable IO APIC before setting up error vector
*/
@@ -1264,6 +1330,30 @@ void play_dead_common(void)
local_irq_disable();
}

+static bool wakeup_cpu0(void)
+{
+ unsigned int timeout;
+
+ if (smp_processor_id())
+ return false;
+
+ /*
+ * Wait up to 3 seconds to check if CPU0 wakeup NMI is handled.
+ * If there is no CPU0 wakeup NMI, go back to sleep.
+ */
+ for (timeout = 0; timeout < 30000; timeout++) {
+ /*
+ * Check if CPU0 wakeup NMI is issued and handled.
+ */
+ if (enable_start_cpu0)
+ return true;
+
+ udelay(100);
+ }
+
+ return false;
+}
+
/*
* We need to flush the caches before going to sleep, lest we have
* dirty data in our caches when we come back up.
@@ -1327,6 +1417,11 @@ static inline void mwait_play_dead(void)
__monitor(mwait_ptr, 0, 0);
mb();
__mwait(eax, 0);
+ /*
+ * If NMI wants to wake up CPU0, start CPU0.
+ */
+ if (wakeup_cpu0())
+ start_cpu0();
}
}

@@ -1337,6 +1432,11 @@ static inline void hlt_play_dead(void)

while (1) {
native_halt();
+ /*
+ * If NMI wants to wake up CPU0, start CPU0.
+ */
+ if (wakeup_cpu0())
+ start_cpu0();
}
}

--
1.7.2

2012-10-12 16:51:34

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 12/12] x86, topology: Debug CPU00 hotplug

From: Fenghua Yu <[email protected]>

CONFIG_DEBUG_HOTPLUG_CPU0 is for debugging the CPU0 hotplug feature. The switch
offlines CPU0 as soon as possible and boots userspace up with CPU0 offlined.
User can online CPU0 back after boot time. The default value of the switch is
off.

To debug CPU0 hotplug, you need to enable CPU0 offline/online feature by either
turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during compilation or giving
cpu0_hotplug kernel parameter at boot.

It's safe and early place to take down CPU0 after all hotplug notifiers
are installed and SMP is booted.

Please note that some applications or drivers, e.g. some versions of udevd,
during boot time may put CPU0 online again in this CPU0 hotplug debug mode.

In this debug mode, setup_local_APIC() may report a warning on max_loops<=0
when CPU0 is onlined back after boot time. This is because pending interrupt in
IRR can not move to ISR. The warning is not CPU0 specfic and it can happen on
other CPUs as well. It is harmless except the first CPU0 online takes a bit
longer time. And so this debug mode is useful to expose this issue. I'll send
a seperate patch to fix this generic warning issue.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/Kconfig | 15 +++++++++++++
arch/x86/include/asm/cpu.h | 3 ++
arch/x86/kernel/topology.c | 51 ++++++++++++++++++++++++++++++++++++++++++++
arch/x86/power/cpu.c | 38 ++++++++++++++++++++++++++++++++
4 files changed, 107 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cf53f0f..1abcae4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1755,6 +1755,21 @@ config BOOTPARAM_HOTPLUG_CPU0
You still can enable the CPU0 hotplug feature at boot by kernel
parameter cpu0_hotplug.

+config DEBUG_HOTPLUG_CPU0
+ def_bool n
+ prompt "Debug CPU0 hotplug"
+ depends on HOTPLUG_CPU && EXPERIMENTAL
+ ---help---
+ Enabling this option offlines CPU0 (if CPU0 can be offlined) as
+ soon as possible and boots up userspace with CPU0 offlined. User
+ can online CPU0 back after boot time.
+
+ To debug CPU0 hotplug, you need to enable CPU0 offline/online
+ feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during
+ compilation or giving cpu0_hotplug kernel parameter at boot.
+
+ If unuser, say N.
+
config COMPAT_VDSO
def_bool y
prompt "Compat VDSO support"
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index a119572..5f9a124 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -29,6 +29,9 @@ struct x86_cpu {
extern int arch_register_cpu(int num);
extern void arch_unregister_cpu(int);
extern void __cpuinit start_cpu0(void);
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+extern int _debug_hotplug_cpu(int cpu, int action);
+#endif
#endif

DECLARE_PER_CPU(int, cpu_state);
diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c
index 0e7b4a7..6e60b5f 100644
--- a/arch/x86/kernel/topology.c
+++ b/arch/x86/kernel/topology.c
@@ -50,6 +50,57 @@ static int __init enable_cpu0_hotplug(char *str)
__setup("cpu0_hotplug", enable_cpu0_hotplug);
#endif

+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+/*
+ * This function offlines a CPU as early as possible and allows userspace to
+ * boot up without the CPU. The CPU can be onlined back by user after boot.
+ *
+ * This is only called for debugging CPU offline/online feature.
+ */
+int __ref _debug_hotplug_cpu(int cpu, int action)
+{
+ struct device *dev = get_cpu_device(cpu);
+ int ret;
+
+ if (!cpu_is_hotpluggable(cpu))
+ return -EINVAL;
+
+ cpu_hotplug_driver_lock();
+
+ switch (action) {
+ case 0:
+ ret = cpu_down(cpu);
+ if (!ret) {
+ pr_info("CPU %u is now offline\n", cpu);
+ kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
+ } else
+ pr_debug("Can't offline CPU%d.\n", cpu);
+ break;
+ case 1:
+ ret = cpu_up(cpu);
+ if (!ret)
+ kobject_uevent(&dev->kobj, KOBJ_ONLINE);
+ else
+ pr_debug("Can't online CPU%d.\n", cpu);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ cpu_hotplug_driver_unlock();
+
+ return ret;
+}
+
+static int __init debug_hotplug_cpu(void)
+{
+ _debug_hotplug_cpu(0, 0);
+ return 0;
+}
+
+late_initcall_sync(debug_hotplug_cpu);
+#endif /* CONFIG_DEBUG_HOTPLUG_CPU0 */
+
int __ref arch_register_cpu(int num)
{
struct cpuinfo_x86 *c = &cpu_data(num);
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index adde775..120cee1 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -21,6 +21,7 @@
#include <asm/suspend.h>
#include <asm/debugreg.h>
#include <asm/fpu-internal.h> /* pcntxt_mask */
+#include <asm/cpu.h>

#ifdef CONFIG_X86_32
static struct saved_context saved_context;
@@ -263,6 +264,43 @@ static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
case PM_HIBERNATION_PREPARE:
ret = bsp_check();
break;
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+ case PM_RESTORE_PREPARE:
+ /*
+ * When system resumes from hibernation, online CPU0 because
+ * 1. it's required for resume and
+ * 2. the CPU was online before hibernation
+ */
+ if (!cpu_online(0))
+ _debug_hotplug_cpu(0, 1);
+ break;
+ case PM_POST_RESTORE:
+ /*
+ * When a resume really happens, this code won't be called.
+ *
+ * This code is called only when user space hibernation software
+ * prepares for snapshot device during boot time. So we just
+ * call _debug_hotplug_cpu() to restore to CPU0's state prior to
+ * preparing the snapshot device.
+ *
+ * This works for normal boot case in our CPU0 hotplug debug
+ * mode, i.e. CPU0 is offline and user mode hibernation
+ * software initializes during boot time.
+ *
+ * If CPU0 is online and user application accesses snapshot
+ * device after boot time, this will offline CPU0 and user may
+ * see different CPU0 state before and after accessing
+ * the snapshot device. But hopefully this is not a case when
+ * user debugging CPU0 hotplug. Even if users hit this case,
+ * they can easily online CPU0 back.
+ *
+ * To simplify this debug code, we only consider normal boot
+ * case. Otherwise we need to remember CPU0's state and restore
+ * to that state and resolve racy conditions etc.
+ */
+ _debug_hotplug_cpu(0, 0);
+ break;
+#endif
default:
break;
}
--
1.7.2

2012-10-12 16:51:32

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 07/12] x86-32, hotplug: Add start_cpu0() entry point to head_32.S

From: Fenghua Yu <[email protected]>

start_cpu0() is defined in head_32.S for 32-bit. The function sets up stack and
jumps to start_secondary() for CPU0 wake up.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/head_32.S | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 957a47a..207abd4 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -266,6 +266,18 @@ num_subarch_entries = (. - subarch_entries) / 4
jmp default_entry
#endif /* CONFIG_PARAVIRT */

+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
+ * up already except stack. We just set up stack here. Then call
+ * start_secondary().
+ */
+ENTRY(start_cpu0)
+ movl stack_start, %ecx
+ movl %ecx, %esp
+ jmp *(initial_code)
+#endif
+
/*
* Non-boot CPU entry point; entered from trampoline.S
* We can't lgdt here, because lgdt itself uses a data segment, but
--
1.7.2

2012-10-12 16:51:31

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 09/12] x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.

From: Fenghua Yu <[email protected]>

Previously these functions were not run on the BSP (CPU 0, the boot processor)
since the boot processor init would only be executed before this functionality
was initialized.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/common.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 7505f7b..ca165ac 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1237,7 +1237,7 @@ void __cpuinit cpu_init(void)
oist = &per_cpu(orig_ist, cpu);

#ifdef CONFIG_NUMA
- if (cpu != 0 && this_cpu_read(numa_node) == 0 &&
+ if (this_cpu_read(numa_node) == 0 &&
early_cpu_to_node(cpu) != NUMA_NO_NODE)
set_numa_node(early_cpu_to_node(cpu));
#endif
@@ -1269,8 +1269,7 @@ void __cpuinit cpu_init(void)
barrier();

x86_configure_nx();
- if (cpu != 0)
- enable_x2apic();
+ enable_x2apic();

/*
* set up and load the per-CPU TSS
--
1.7.2

2012-10-12 16:52:22

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

From: Fenghua Yu <[email protected]>

Because x86 BIOS requires CPU0 to resume from sleep, suspend or hibernate can't
be executed if CPU0 is detected offline. To make suspend or hibernate and
further resume succeed, CPU0 must be online.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/power/cpu.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 218cdb1..adde775 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -237,3 +237,47 @@ void restore_processor_state(void)
#ifdef CONFIG_X86_32
EXPORT_SYMBOL(restore_processor_state);
#endif
+
+/*
+ * When bsp_check() is called in hibernate and suspend, cpu hotplug
+ * is disabled already. So it's unnessary to handle race condition between
+ * cpumask query and cpu hotplug.
+ */
+static int bsp_check(void)
+{
+ if (cpumask_first(cpu_online_mask) != 0) {
+ pr_warn("CPU0 is offline.\n");
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
+static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
+ void *ptr)
+{
+ int ret = 0;
+
+ switch (action) {
+ case PM_SUSPEND_PREPARE:
+ case PM_HIBERNATION_PREPARE:
+ ret = bsp_check();
+ break;
+ default:
+ break;
+ }
+ return notifier_from_errno(ret);
+}
+
+static int __init bsp_pm_check_init(void)
+{
+ /*
+ * Set this bsp_pm_callback as lower priority than
+ * cpu_hotplug_pm_callback. So cpu_hotplug_pm_callback will be called
+ * earlier to disable cpu hotplug before bsp online check.
+ */
+ pm_notifier(bsp_pm_callback, -INT_MAX);
+ return 0;
+}
+
+core_initcall(bsp_pm_check_init);
--
1.7.2

2012-10-12 16:52:37

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 02/12] x86, Kconfig: Add config switch for CPU0 hotplug

From: Fenghua Yu <[email protected]>

New config switch CONFIG_BOOTPARAM_HOTPLUG_CPU0 sets default state of whether
the CPU0 hotplug is on or off.

If the switch is off, CPU0 is not hotpluggable by default. But the CPU0 hotplug
feature can still be turned on by kernel parameter cpu0_hotplug at boot.

If the switch is on, CPU0 is always hotpluggable.

The default value of the switch is off.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/Kconfig | 29 +++++++++++++++++++++++++++++
1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 56e7a25..cf53f0f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1726,6 +1726,35 @@ config HOTPLUG_CPU
automatically on SMP systems. )
Say N if you want to disable CPU hotplug.

+config BOOTPARAM_HOTPLUG_CPU0
+ bool "Set default setting of cpu0_hotpluggable"
+ default n
+ depends on HOTPLUG_CPU && EXPERIMENTAL
+ ---help---
+ Set whether default state of cpu0_hotpluggable is on or off.
+
+ Say Y here to enable CPU0 hotplug by default. If this switch
+ is turned on, there is no need to give cpu0_hotplug kernel
+ parameter and the CPU0 hotplug feature is enabled by default.
+
+ Please note: there are two known CPU0 dependencies if you want
+ to enable the CPU0 hotplug feature either by this switch or by
+ cpu0_hotplug kernel parameter.
+
+ First, resume from hibernate or suspend always starts from CPU0.
+ So hibernate and suspend are prevented if CPU0 is offline.
+
+ Second dependency is PIC interrupts always go to CPU0. CPU0 can not
+ offline if any interrupt can not migrate out of CPU0. There may
+ be other CPU0 dependencies.
+
+ Please make sure the dependencies are under your control before
+ you enable this feature.
+
+ Say N if you don't want to enable CPU0 hotplug feature by default.
+ You still can enable the CPU0 hotplug feature at boot by kernel
+ parameter cpu0_hotplug.
+
config COMPAT_VDSO
def_bool y
prompt "Compat VDSO support"
--
1.7.2

2012-10-12 16:53:06

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v9 04/12] x86, hotplug: Support functions for CPU0 online/offline

From: Fenghua Yu <[email protected]>

Add smp_store_boot_cpu_info() to store cpu info for BSP during boot time.

Now smp_store_cpu_info() stores cpu info for bringing up BSP or AP after
it's offline.

Continue to online CPU0 in native_cpu_up().

Continue to offline CPU0 in native_cpu_disable().

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/smp.h | 1 +
arch/x86/kernel/smpboot.c | 38 ++++++++++++++++++--------------------
2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 4f19a15..b073aae 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -166,6 +166,7 @@ void native_send_call_func_ipi(const struct cpumask *mask);
void native_send_call_func_single_ipi(int cpu);
void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);

+void smp_store_boot_cpu_info(void);
void smp_store_cpu_info(int id);
#define cpu_physical_id(cpu) per_cpu(x86_cpu_to_apicid, cpu)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..c297907 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -125,8 +125,8 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
atomic_t init_deasserted;

/*
- * Report back to the Boot Processor.
- * Running on AP.
+ * Report back to the Boot Processor during boot time or to the caller processor
+ * during CPU online.
*/
static void __cpuinit smp_callin(void)
{
@@ -279,19 +279,30 @@ notrace static void __cpuinit start_secondary(void *unused)
cpu_idle();
}

+void __init smp_store_boot_cpu_info(void)
+{
+ int id = 0; /* CPU 0 */
+ struct cpuinfo_x86 *c = &cpu_data(id);
+
+ *c = boot_cpu_data;
+ c->cpu_index = id;
+}
+
/*
* The bootstrap kernel entry code has set these up. Save them for
* a given CPU
*/
-
void __cpuinit smp_store_cpu_info(int id)
{
struct cpuinfo_x86 *c = &cpu_data(id);

*c = boot_cpu_data;
c->cpu_index = id;
- if (id != 0)
- identify_secondary_cpu(c);
+ /*
+ * During boot time, CPU0 has this setup already. Save the info when
+ * bringing up AP or offlined CPU0.
+ */
+ identify_secondary_cpu(c);
}

static bool __cpuinit
@@ -795,7 +806,7 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)

pr_debug("++++++++++++++++++++=_---CPU UP %u\n", cpu);

- if (apicid == BAD_APICID || apicid == boot_cpu_physical_apicid ||
+ if (apicid == BAD_APICID ||
!physid_isset(apicid, phys_cpu_present_map) ||
!apic->apic_id_valid(apicid)) {
pr_err("%s: bad cpu %d\n", __func__, cpu);
@@ -990,7 +1001,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
/*
* Setup boot CPU information
*/
- smp_store_cpu_info(0); /* Final full version of the data */
+ smp_store_boot_cpu_info(); /* Final full version of the data */
cpumask_copy(cpu_callin_mask, cpumask_of(0));
mb();

@@ -1214,19 +1225,6 @@ void cpu_disable_common(void)

int native_cpu_disable(void)
{
- int cpu = smp_processor_id();
-
- /*
- * Perhaps use cpufreq to drop frequency, but that could go
- * into generic code.
- *
- * We won't take down the boot processor on i386 due to some
- * interrupts only being able to be serviced by the BSP.
- * Especially so if we're not using an IOAPIC -zwane
- */
- if (cpu == 0)
- return -EBUSY;
-
clear_local_APIC();

cpu_disable_common();
--
1.7.2

2012-10-15 20:46:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

On Friday 12 of October 2012 09:09:42 Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>
>
> Because x86 BIOS requires CPU0 to resume from sleep, suspend or hibernate can't
> be executed if CPU0 is detected offline. To make suspend or hibernate and
> further resume succeed, CPU0 must be online.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> arch/x86/power/cpu.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 44 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index 218cdb1..adde775 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -237,3 +237,47 @@ void restore_processor_state(void)
> #ifdef CONFIG_X86_32
> EXPORT_SYMBOL(restore_processor_state);
> #endif
> +
> +/*
> + * When bsp_check() is called in hibernate and suspend, cpu hotplug
> + * is disabled already. So it's unnessary to handle race condition between
> + * cpumask query and cpu hotplug.
> + */
> +static int bsp_check(void)
> +{
> + if (cpumask_first(cpu_online_mask) != 0) {
> + pr_warn("CPU0 is offline.\n");
> + return -ENODEV;
> + }
> +
> + return 0;
> +}
> +
> +static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
> + void *ptr)
> +{
> + int ret = 0;
> +
> + switch (action) {
> + case PM_SUSPEND_PREPARE:
> + case PM_HIBERNATION_PREPARE:
> + ret = bsp_check();
> + break;
> + default:
> + break;
> + }
> + return notifier_from_errno(ret);
> +}
> +

I wonder if there's anything preventing CPU0 from becoming offline after you've
done this check and before user space is frozen?

Rafael


> +static int __init bsp_pm_check_init(void)
> +{
> + /*
> + * Set this bsp_pm_callback as lower priority than
> + * cpu_hotplug_pm_callback. So cpu_hotplug_pm_callback will be called
> + * earlier to disable cpu hotplug before bsp online check.
> + */
> + pm_notifier(bsp_pm_callback, -INT_MAX);
> + return 0;
> +}
> +
> +core_initcall(bsp_pm_check_init);
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-10-16 04:43:32

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH v9 08/12] x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI

From: Fenghua Yu <[email protected]>
Subject: [PATCH v9 08/12] x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
Date: Fri, 12 Oct 2012 09:09:45 -0700

<cut>

> @@ -1037,6 +1101,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
> */
> setup_local_APIC();
>
> + cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
> +

In x2apic mode, logical apicid occupies a whole 32-bit length of LDR,
but GET_APIC_LOGICAL_ID returns high 31-24 bits only, and this is only
for xapic mode.

Thanks.
HATAYAMA, Daisuke

2012-10-16 05:36:43

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

On 10/16/2012 02:20 AM, Rafael J. Wysocki wrote:
> On Friday 12 of October 2012 09:09:42 Fenghua Yu wrote:
>> From: Fenghua Yu <[email protected]>
>>
>> Because x86 BIOS requires CPU0 to resume from sleep, suspend or hibernate can't
>> be executed if CPU0 is detected offline. To make suspend or hibernate and
>> further resume succeed, CPU0 must be online.
>>
>> Signed-off-by: Fenghua Yu <[email protected]>
>> ---
>> arch/x86/power/cpu.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 44 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
>> index 218cdb1..adde775 100644
>> --- a/arch/x86/power/cpu.c
>> +++ b/arch/x86/power/cpu.c
>> @@ -237,3 +237,47 @@ void restore_processor_state(void)
>> #ifdef CONFIG_X86_32
>> EXPORT_SYMBOL(restore_processor_state);
>> #endif
>> +
>> +/*
>> + * When bsp_check() is called in hibernate and suspend, cpu hotplug
>> + * is disabled already. So it's unnessary to handle race condition between
>> + * cpumask query and cpu hotplug.
>> + */
>> +static int bsp_check(void)
>> +{
>> + if (cpumask_first(cpu_online_mask) != 0) {
>> + pr_warn("CPU0 is offline.\n");
>> + return -ENODEV;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
>> + void *ptr)
>> +{
>> + int ret = 0;
>> +
>> + switch (action) {
>> + case PM_SUSPEND_PREPARE:
>> + case PM_HIBERNATION_PREPARE:
>> + ret = bsp_check();
>> + break;
>> + default:
>> + break;
>> + }
>> + return notifier_from_errno(ret);
>> +}
>> +
>
> I wonder if there's anything preventing CPU0 from becoming offline after you've
> done this check and before user space is frozen?
>

Hi Rafael,

bsp_pm_callback runs as a low priority notifier callback, specifically with lower
priority than the cpu_hotplug_pm_callback (as mentioned in the comment below).
And cpu_hotplug_pm_callback disables regular CPU hotplug (till the suspend/resume
sequence is complete).. So there is no chance for CPU0 to become offline after that.

Or, are you thinking of some other scenario where CPU0 can go offline?

Regards,
Srivatsa S. Bhat

>
>
>> +static int __init bsp_pm_check_init(void)
>> +{
>> + /*
>> + * Set this bsp_pm_callback as lower priority than
>> + * cpu_hotplug_pm_callback. So cpu_hotplug_pm_callback will be called
>> + * earlier to disable cpu hotplug before bsp online check.
>> + */
>> + pm_notifier(bsp_pm_callback, -INT_MAX);
>> + return 0;
>> +}
>> +
>> +core_initcall(bsp_pm_check_init);
>>

2012-10-16 08:48:46

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH v9 12/12] x86, topology: Debug CPU00 hotplug

On 10/12/2012 09:39 PM, Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>

> +config DEBUG_HOTPLUG_CPU0
> + def_bool n
> + prompt "Debug CPU0 hotplug"
> + depends on HOTPLUG_CPU && EXPERIMENTAL
> + ---help---
> + Enabling this option offlines CPU0 (if CPU0 can be offlined) as
> + soon as possible and boots up userspace with CPU0 offlined. User
> + can online CPU0 back after boot time.
> +
> + To debug CPU0 hotplug, you need to enable CPU0 offline/online
> + feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during
> + compilation or giving cpu0_hotplug kernel parameter at boot.
> +
> + If unuser, say N.

s/unuser/unsure

> +#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
> +/*
> + * This function offlines a CPU as early as possible and allows userspace to
> + * boot up without the CPU. The CPU can be onlined back by user after boot.
> + *
> + * This is only called for debugging CPU offline/online feature.
> + */
> +int __ref _debug_hotplug_cpu(int cpu, int action)
> +{
> + struct device *dev = get_cpu_device(cpu);
> + int ret;
> +
> + if (!cpu_is_hotpluggable(cpu))
> + return -EINVAL;
> +
> + cpu_hotplug_driver_lock();
> +
> + switch (action) {
> + case 0:
> + ret = cpu_down(cpu);
> + if (!ret) {
> + pr_info("CPU %u is now offline\n", cpu);
> + kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
> + } else
> + pr_debug("Can't offline CPU%d.\n", cpu);
> + break;
> + case 1:
> + ret = cpu_up(cpu);
> + if (!ret)
> + kobject_uevent(&dev->kobj, KOBJ_ONLINE);
> + else
> + pr_debug("Can't online CPU%d.\n", cpu);
> + break;
> + default:
> + ret = -EINVAL;
> + }
> +
> + cpu_hotplug_driver_unlock();
> +
> + return ret;
> +}
> +
> +static int __init debug_hotplug_cpu(void)
> +{
> + _debug_hotplug_cpu(0, 0);
> + return 0;
> +}
> +
> +late_initcall_sync(debug_hotplug_cpu);
> +#endif /* CONFIG_DEBUG_HOTPLUG_CPU0 */
> +
> int __ref arch_register_cpu(int num)
> {
> struct cpuinfo_x86 *c = &cpu_data(num);
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index adde775..120cee1 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -21,6 +21,7 @@
> #include <asm/suspend.h>
> #include <asm/debugreg.h>
> #include <asm/fpu-internal.h> /* pcntxt_mask */
> +#include <asm/cpu.h>
>
> #ifdef CONFIG_X86_32
> static struct saved_context saved_context;
> @@ -263,6 +264,43 @@ static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
> case PM_HIBERNATION_PREPARE:
> ret = bsp_check();
> break;
> +#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
> + case PM_RESTORE_PREPARE:
> + /*
> + * When system resumes from hibernation, online CPU0 because
> + * 1. it's required for resume and
> + * 2. the CPU was online before hibernation
> + */
> + if (!cpu_online(0))
> + _debug_hotplug_cpu(0, 1);

This won't work. CPU hotplug is disabled by cpu_hotplug_pm_callback() during
PM_HIBERNATION_PREPARE and is enabled back only during PM_POST_HIBERNATION.

So calls to cpu_up() or cpu_down() will fail in the meantime.

Regards,
Srivatsa S. Bhat

> + break;
> + case PM_POST_RESTORE:
> + /*
> + * When a resume really happens, this code won't be called.
> + *
> + * This code is called only when user space hibernation software
> + * prepares for snapshot device during boot time. So we just
> + * call _debug_hotplug_cpu() to restore to CPU0's state prior to
> + * preparing the snapshot device.
> + *
> + * This works for normal boot case in our CPU0 hotplug debug
> + * mode, i.e. CPU0 is offline and user mode hibernation
> + * software initializes during boot time.
> + *
> + * If CPU0 is online and user application accesses snapshot
> + * device after boot time, this will offline CPU0 and user may
> + * see different CPU0 state before and after accessing
> + * the snapshot device. But hopefully this is not a case when
> + * user debugging CPU0 hotplug. Even if users hit this case,
> + * they can easily online CPU0 back.
> + *
> + * To simplify this debug code, we only consider normal boot
> + * case. Otherwise we need to remember CPU0's state and restore
> + * to that state and resolve racy conditions etc.
> + */
> + _debug_hotplug_cpu(0, 0);
> + break;
> +#endif
> default:
> break;
> }
>

2012-10-16 16:14:18

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

On Tuesday 16 of October 2012 11:05:18 Srivatsa S. Bhat wrote:
> On 10/16/2012 02:20 AM, Rafael J. Wysocki wrote:
> > On Friday 12 of October 2012 09:09:42 Fenghua Yu wrote:
> >> From: Fenghua Yu <[email protected]>
> >>
> >> Because x86 BIOS requires CPU0 to resume from sleep, suspend or hibernate can't
> >> be executed if CPU0 is detected offline. To make suspend or hibernate and
> >> further resume succeed, CPU0 must be online.
> >>
> >> Signed-off-by: Fenghua Yu <[email protected]>
> >> ---
> >> arch/x86/power/cpu.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >> 1 files changed, 44 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> >> index 218cdb1..adde775 100644
> >> --- a/arch/x86/power/cpu.c
> >> +++ b/arch/x86/power/cpu.c
> >> @@ -237,3 +237,47 @@ void restore_processor_state(void)
> >> #ifdef CONFIG_X86_32
> >> EXPORT_SYMBOL(restore_processor_state);
> >> #endif
> >> +
> >> +/*
> >> + * When bsp_check() is called in hibernate and suspend, cpu hotplug
> >> + * is disabled already. So it's unnessary to handle race condition between
> >> + * cpumask query and cpu hotplug.
> >> + */
> >> +static int bsp_check(void)
> >> +{
> >> + if (cpumask_first(cpu_online_mask) != 0) {
> >> + pr_warn("CPU0 is offline.\n");
> >> + return -ENODEV;
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
> >> + void *ptr)
> >> +{
> >> + int ret = 0;
> >> +
> >> + switch (action) {
> >> + case PM_SUSPEND_PREPARE:
> >> + case PM_HIBERNATION_PREPARE:
> >> + ret = bsp_check();
> >> + break;
> >> + default:
> >> + break;
> >> + }
> >> + return notifier_from_errno(ret);
> >> +}
> >> +
> >
> > I wonder if there's anything preventing CPU0 from becoming offline after you've
> > done this check and before user space is frozen?
> >
>
> Hi Rafael,
>
> bsp_pm_callback runs as a low priority notifier callback, specifically with lower
> priority than the cpu_hotplug_pm_callback (as mentioned in the comment below).
> And cpu_hotplug_pm_callback disables regular CPU hotplug (till the suspend/resume
> sequence is complete).. So there is no chance for CPU0 to become offline after that.
>
> Or, are you thinking of some other scenario where CPU0 can go offline?

No, that should be fine technically, but designs relying on notifier priority
for correctness are kind of fragile.

Would it be possible to make cpu_hotplug_pm_callback() do the BSP online check?

> >> +static int __init bsp_pm_check_init(void)
> >> +{
> >> + /*
> >> + * Set this bsp_pm_callback as lower priority than
> >> + * cpu_hotplug_pm_callback. So cpu_hotplug_pm_callback will be called
> >> + * earlier to disable cpu hotplug before bsp online check.
> >> + */
> >> + pm_notifier(bsp_pm_callback, -INT_MAX);
> >> + return 0;
> >> +}
> >> +
> >> +core_initcall(bsp_pm_check_init);

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-10-16 16:31:53

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

On 10/16/2012 09:47 PM, Rafael J. Wysocki wrote:
> On Tuesday 16 of October 2012 11:05:18 Srivatsa S. Bhat wrote:
>> On 10/16/2012 02:20 AM, Rafael J. Wysocki wrote:
>>> On Friday 12 of October 2012 09:09:42 Fenghua Yu wrote:
>>>> From: Fenghua Yu <[email protected]>
>>>>
>>>> +
>>>> +/*
>>>> + * When bsp_check() is called in hibernate and suspend, cpu hotplug
>>>> + * is disabled already. So it's unnessary to handle race condition between
>>>> + * cpumask query and cpu hotplug.
>>>> + */
>>>> +static int bsp_check(void)
>>>> +{
>>>> + if (cpumask_first(cpu_online_mask) != 0) {
>>>> + pr_warn("CPU0 is offline.\n");
>>>> + return -ENODEV;
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
>>>> + void *ptr)
>>>> +{
>>>> + int ret = 0;
>>>> +
>>>> + switch (action) {
>>>> + case PM_SUSPEND_PREPARE:
>>>> + case PM_HIBERNATION_PREPARE:
>>>> + ret = bsp_check();
>>>> + break;
>>>> + default:
>>>> + break;
>>>> + }
>>>> + return notifier_from_errno(ret);
>>>> +}
>>>> +
>>>
>>> I wonder if there's anything preventing CPU0 from becoming offline after you've
>>> done this check and before user space is frozen?
>>>
>>
>> Hi Rafael,
>>
>> bsp_pm_callback runs as a low priority notifier callback, specifically with lower
>> priority than the cpu_hotplug_pm_callback (as mentioned in the comment below).
>> And cpu_hotplug_pm_callback disables regular CPU hotplug (till the suspend/resume
>> sequence is complete).. So there is no chance for CPU0 to become offline after that.
>>
>> Or, are you thinking of some other scenario where CPU0 can go offline?
>
> No, that should be fine technically, but designs relying on notifier priority
> for correctness are kind of fragile.
>

Hmm.. I agree.

> Would it be possible to make cpu_hotplug_pm_callback() do the BSP online check?
>

Good idea! I think that could be done quite easily. And while doing that, it would
be good to rethink what to do in patch 12/12 (Debug CPU0 hotplug) to fix the bug I
pointed out in my other mail.

Regards,
Srivatsa S. Bhat

>>>> +static int __init bsp_pm_check_init(void)
>>>> +{
>>>> + /*
>>>> + * Set this bsp_pm_callback as lower priority than
>>>> + * cpu_hotplug_pm_callback. So cpu_hotplug_pm_callback will be called
>>>> + * earlier to disable cpu hotplug before bsp online check.
>>>> + */
>>>> + pm_notifier(bsp_pm_callback, -INT_MAX);
>>>> + return 0;
>>>> +}
>>>> +
>>>> +core_initcall(bsp_pm_check_init);
>

2012-10-16 16:50:16

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v9 06/12] x86-64, hotplug: Add start_cpu0() entry point to head_64.S

On Fri, Oct 12, 2012 at 09:09:43AM -0700, Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>
>
> start_cpu0() is defined in head_64.S for 64-bit. The function sets up stack and
> jumps to start_secondary() for CPU0 wake up.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> arch/x86/kernel/head_64.S | 15 +++++++++++++++
> 1 files changed, 15 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 94bf9cc..3faac8a 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -252,6 +252,21 @@ ENTRY(secondary_startup_64)
> pushq %rax # target address in negative space
> lretq
>
> +#ifdef CONFIG_HOTPLUG_CPU
> +/*
> + * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
> + * up already except stack. We just set up stack here. Then call
> + * start_secondary().
> + */
> +ENTRY(start_cpu0)
> + movq stack_start(%rip),%rsp
> + movq initial_code(%rip),%rax
> + pushq $0 # fake return address to stop unwinder
> + pushq $__KERNEL_CS # set correct cs
> + pushq %rax # target address in negative space
> + lretq

ENDPROC(start_cpu0) maybe?

Ditto for head_32.S in the next patch.

Thanks.

--
Regards/Gruss,
Boris.

2012-10-16 16:57:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v9 08/12] x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI

On Fri, Oct 12, 2012 at 09:09:45AM -0700, Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>
>
> Instead of waiting for STARTUP after INITs, BSP will execute the BIOS boot-strap
> code which is not a desired behavior for waking up BSP. To avoid the boot-strap
> code, wake up CPU0 by NMI instead.
>
> This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined (i.e.
> physically hot removed and then hot added), NMI won't wake it up. We'll change
> this code in the future to wake up hard offlined CPU0 if real platform and
> request are available.
>
> AP is still waken up as before by INIT, SIPI, SIPI sequence.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> arch/x86/include/asm/cpu.h | 1 +
> arch/x86/kernel/smpboot.c | 112 +++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 107 insertions(+), 6 deletions(-)

[ … ]

> @@ -727,8 +745,47 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
> */
> if (apic->wakeup_secondary_cpu)
> boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
> - else
> - boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
> + else {
> + if (cpu)
> + /*
> + * Wake up AP by INIT, INIT, STARTUP sequence.
> + */
> + boot_error = wakeup_secondary_cpu_via_init(apicid,
> + start_ip);
> + else {
> + /*
> + * Instead of waiting for STARTUP after INITs, BSP will
> + * execute the BIOS boot-strap code which is not a
> + * desired behavior for waking up BSP. To avoid the
> + * boot-strap code, wake up CPU0 by NMI instead.
> + *
> + * This works to wake up soft offlined CPU0 only. If
> + * CPU0 is hard offlined (i.e. physically hot removed
> + * and then hot added), NMI won't wake it up. We'll
> + * change this code in the future to wake up hard
> + * offlined CPU0 if real platform and request are
> + * available.
> + */
> + int id;
> +
> + enable_start_cpu0 = 1;
> + /*
> + * Register a NMI handler to help wake up CPU0.
> + */
> + boot_error = register_nmi_handler(NMI_LOCAL,
> + wakeup_cpu0_nmi, 0, "wake_cpu0");
> +
> + if (!boot_error) {
> + cpu0_nmi_registered = 1;
> + if (apic->dest_logical == APIC_DEST_LOGICAL)
> + id = cpu0_logical_apicid;
> + else
> + id = apicid;
> + boot_error = wakeup_secondary_cpu_via_nmi(id,
> + start_ip);
> + }

This whole else-part for cpu 0 could be carved out into a separate
function so that you can save yourself the awkward code indentation and
make this a lot more readable.

Thanks.

--
Regards/Gruss,
Boris.

2012-10-16 17:03:16

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v9 03/12] x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it

On Fri, Oct 12, 2012 at 09:09:40AM -0700, Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>
>
> If CONFIG_BOOTPARAM_HOTPLUG_CPU is turned on, CPU0 hotplug feature is enabled
> by default.
>
> If CONFIG_BOOTPARAM_HOTPLUG_CPU is not turned on, CPU0 hotplug feature is not
> enabled by default. The kernel parameter cpu0_hotplug can enable CPU0 hotplug
> feature at boot.
>
> Currently the feature is supported on Intel platforms only.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> arch/x86/kernel/topology.c | 50 +++++++++++++++++++++++++++++++++++++------

It is just me or is this whole change better suited for

arch/x86/kernel/cpu/topology.c

where cpu-specific stuff lives anyway?

Thanks.

--
Regards/Gruss,
Boris.

2012-10-16 18:43:40

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH v9 03/12] x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it

> On Fri, Oct 12, 2012 at 09:09:40AM -0700, Fenghua Yu wrote:
> > From: Fenghua Yu <[email protected]>
> >
> > If CONFIG_BOOTPARAM_HOTPLUG_CPU is turned on, CPU0 hotplug feature is
> enabled
> > by default.
> >
> > If CONFIG_BOOTPARAM_HOTPLUG_CPU is not turned on, CPU0 hotplug
> feature is not
> > enabled by default. The kernel parameter cpu0_hotplug can enable CPU0
> hotplug
> > feature at boot.
> >
> > Currently the feature is supported on Intel platforms only.
> >
> > Signed-off-by: Fenghua Yu <[email protected]>
> > ---
> > arch/x86/kernel/topology.c | 50
> +++++++++++++++++++++++++++++++++++++------
>
> It is just me or is this whole change better suited for
>
> arch/x86/kernel/cpu/topology.c
>
> where cpu-specific stuff lives anyway?

arch/x86/kernel/cpu/topology.c doesn't contain CPU hotplug hooks. To place
the BSP patchset in this file, other existing CPU hotplug code should
be moved to this file as well. Apparently that's not this patchset's task.

Thanks.

-Fenghua
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2012-10-16 19:40:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v9 03/12] x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it

On Tue, Oct 16, 2012 at 06:43:36PM +0000, Yu, Fenghua wrote:
> arch/x86/kernel/cpu/topology.c doesn't contain CPU hotplug hooks.
> To place the BSP patchset in this file, other existing CPU hotplug
> code should be moved to this file as well. Apparently that's not this
> patchset's task.

Right, ok.

But looking at this a bit more, we (someone :)) should probably merge
those two files together since both deal with the cpu aspect of
topology...

Oh well.

--
Regards/Gruss,
Boris.

2012-10-16 22:12:31

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

> On 10/16/2012 09:47 PM, Rafael J. Wysocki wrote:
> > On Tuesday 16 of October 2012 11:05:18 Srivatsa S. Bhat wrote:
> >> On 10/16/2012 02:20 AM, Rafael J. Wysocki wrote:
> >>> On Friday 12 of October 2012 09:09:42 Fenghua Yu wrote:
> >>>> From: Fenghua Yu <[email protected]>
> >>>>
> >>>> +
> >>>> +/*
> >>>> + * When bsp_check() is called in hibernate and suspend, cpu
> hotplug
> >>>> + * is disabled already. So it's unnessary to handle race
> condition between
> >>>> + * cpumask query and cpu hotplug.
> >>>> + */
> >>>> +static int bsp_check(void)
> >>>> +{
> >>>> + if (cpumask_first(cpu_online_mask) != 0) {
> >>>> + pr_warn("CPU0 is offline.\n");
> >>>> + return -ENODEV;
> >>>> + }
> >>>> +
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>>> +static int bsp_pm_callback(struct notifier_block *nb, unsigned
> long action,
> >>>> + void *ptr)
> >>>> +{
> >>>> + int ret = 0;
> >>>> +
> >>>> + switch (action) {
> >>>> + case PM_SUSPEND_PREPARE:
> >>>> + case PM_HIBERNATION_PREPARE:
> >>>> + ret = bsp_check();
> >>>> + break;
> >>>> + default:
> >>>> + break;
> >>>> + }
> >>>> + return notifier_from_errno(ret);
> >>>> +}
> >>>> +
> >>>
> >>> I wonder if there's anything preventing CPU0 from becoming offline
> after you've
> >>> done this check and before user space is frozen?
> >>>
> >>
> >> Hi Rafael,
> >>
> >> bsp_pm_callback runs as a low priority notifier callback,
> specifically with lower
> >> priority than the cpu_hotplug_pm_callback (as mentioned in the
> comment below).
> >> And cpu_hotplug_pm_callback disables regular CPU hotplug (till the
> suspend/resume
> >> sequence is complete).. So there is no chance for CPU0 to become
> offline after that.
> >>
> >> Or, are you thinking of some other scenario where CPU0 can go
> offline?
> >
> > No, that should be fine technically, but designs relying on notifier
> priority
> > for correctness are kind of fragile.
> >
>
> Hmm.. I agree.
>
> > Would it be possible to make cpu_hotplug_pm_callback() do the BSP
> online check?
> >
>
> Good idea! I think that could be done quite easily. And while doing
> that, it would
> be good to rethink what to do in patch 12/12 (Debug CPU0 hotplug) to
> fix the bug I
> pointed out in my other mail.

Why is this design relying on notifier priority fragile? I don't get it.

The priority in pm_notifier() is in a well designed API. This code just
follows the API to assign lower priority for bsp_pm_callback than
cpu_hotplug_pm_callback. Unless your justification is that the API of
pm_notifier() is fragile, I think this patch should be ok.

It's not a good idea to move bsp_pm_callback() into
cpu_hotplug_pm_callback(). There is no architecture specific hook to
call x86 bsp specific hotplug in cpu_hotplug_pm_callback(). Moving
bsp_pm_callback() into cpu_hotplug_pm_callback() is hacking while
this patch follows well defined API and has better coding.

Thanks.

-Fenghua
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2012-10-17 00:06:16

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH v9 12/12] x86, topology: Debug CPU00 hotplug

> > +#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
> + case PM_RESTORE_PREPARE:
> > + /*
> > + * When system resumes from hibernation, online CPU0
> because
> > + * 1. it's required for resume and
> > + * 2. the CPU was online before hibernation
> > + */
> > + if (!cpu_online(0))
> > + _debug_hotplug_cpu(0, 1);
>
> This won't work. CPU hotplug is disabled by cpu_hotplug_pm_callback()
> during
> PM_HIBERNATION_PREPARE and is enabled back only during
> PM_POST_HIBERNATION.
>
> So calls to cpu_up() or cpu_down() will fail in the meantime.

The code works just fine.

You are talking about hibernation/suspend phase. This piece of
code is all about restore phase. Of course you can call cpu_up()
and cpu_down() during restore phase.

So there is no problem here.

Thanks.

-Fenghua

2012-10-17 05:15:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

On Tuesday 16 of October 2012 22:12:27 Yu, Fenghua wrote:
> > On 10/16/2012 09:47 PM, Rafael J. Wysocki wrote:
> > > On Tuesday 16 of October 2012 11:05:18 Srivatsa S. Bhat wrote:
> > >> On 10/16/2012 02:20 AM, Rafael J. Wysocki wrote:
> > >>> On Friday 12 of October 2012 09:09:42 Fenghua Yu wrote:
> > >>>> From: Fenghua Yu <[email protected]>
> > >>>>
> > >>>> +
> > >>>> +/*
> > >>>> + * When bsp_check() is called in hibernate and suspend, cpu
> > hotplug
> > >>>> + * is disabled already. So it's unnessary to handle race
> > condition between
> > >>>> + * cpumask query and cpu hotplug.
> > >>>> + */
> > >>>> +static int bsp_check(void)
> > >>>> +{
> > >>>> + if (cpumask_first(cpu_online_mask) != 0) {
> > >>>> + pr_warn("CPU0 is offline.\n");
> > >>>> + return -ENODEV;
> > >>>> + }
> > >>>> +
> > >>>> + return 0;
> > >>>> +}
> > >>>> +
> > >>>> +static int bsp_pm_callback(struct notifier_block *nb, unsigned
> > long action,
> > >>>> + void *ptr)
> > >>>> +{
> > >>>> + int ret = 0;
> > >>>> +
> > >>>> + switch (action) {
> > >>>> + case PM_SUSPEND_PREPARE:
> > >>>> + case PM_HIBERNATION_PREPARE:
> > >>>> + ret = bsp_check();
> > >>>> + break;
> > >>>> + default:
> > >>>> + break;
> > >>>> + }
> > >>>> + return notifier_from_errno(ret);
> > >>>> +}
> > >>>> +
> > >>>
> > >>> I wonder if there's anything preventing CPU0 from becoming offline
> > after you've
> > >>> done this check and before user space is frozen?
> > >>>
> > >>
> > >> Hi Rafael,
> > >>
> > >> bsp_pm_callback runs as a low priority notifier callback,
> > specifically with lower
> > >> priority than the cpu_hotplug_pm_callback (as mentioned in the
> > comment below).
> > >> And cpu_hotplug_pm_callback disables regular CPU hotplug (till the
> > suspend/resume
> > >> sequence is complete).. So there is no chance for CPU0 to become
> > offline after that.
> > >>
> > >> Or, are you thinking of some other scenario where CPU0 can go
> > offline?
> > >
> > > No, that should be fine technically, but designs relying on notifier
> > priority
> > > for correctness are kind of fragile.
> > >
> >
> > Hmm.. I agree.
> >
> > > Would it be possible to make cpu_hotplug_pm_callback() do the BSP
> > online check?
> > >
> >
> > Good idea! I think that could be done quite easily. And while doing
> > that, it would
> > be good to rethink what to do in patch 12/12 (Debug CPU0 hotplug) to
> > fix the bug I
> > pointed out in my other mail.
>
> Why is this design relying on notifier priority fragile? I don't get it.

Because things like this are often overlooked by people trying to optimize
stuff and the fact that you have to add a comment explaining that dependency
only means that it is not exactly straightforward.

Moreover, you should also add a corresponding comment in the other notifier
indicating that its priority should be higher than the priority of the
new thing and explaining why.

> The priority in pm_notifier() is in a well designed API. This code just
> follows the API to assign lower priority for bsp_pm_callback than
> cpu_hotplug_pm_callback. Unless your justification is that the API of
> pm_notifier() is fragile, I think this patch should be ok.

I really think that notifiers are fragile in general and this particular
one is no exception.

> It's not a good idea to move bsp_pm_callback() into
> cpu_hotplug_pm_callback(). There is no architecture specific hook to
> call x86 bsp specific hotplug in cpu_hotplug_pm_callback(). Moving
> bsp_pm_callback() into cpu_hotplug_pm_callback() is hacking while
> this patch follows well defined API and has better coding.

I'm not sure it is better coding. You really want one piece of code
to always follow another piece of code and the best way to ensure
that is to execute them both explicitly in sequence.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-10-17 17:39:53

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

> > > On Tuesday, October 16, 2012 10:19 PM Rafael J. Wysocki wrote:
> On Tuesday 16 of October 2012 22:12:27 Yu, Fenghua wrote:
> > > On 10/16/2012 09:47 PM, Rafael J. Wysocki wrote:
> > > > On Tuesday 16 of October 2012 11:05:18 Srivatsa S. Bhat wrote:
> > > >> On 10/16/2012 02:20 AM, Rafael J. Wysocki wrote:
> > > >>> On Friday 12 of October 2012 09:09:42 Fenghua Yu wrote:
> > > >>>> From: Fenghua Yu <[email protected]>
> > > fix the bug I
> > > pointed out in my other mail.
> Because things like this are often overlooked by people trying to
> optimize
> stuff and the fact that you have to add a comment explaining that
> dependency
> only means that it is not exactly straightforward.

If people try to optimize pm notifier, they really need to know
pm_notifier() API and its priority. If they ignore the priority, they will
screw up kernel no matter how many comments in it.

>
> Moreover, you should also add a corresponding comment in the other
> notifier
> indicating that its priority should be higher than the priority of the
> new thing and explaining why.

The comment in the patch explains this very clearly. I don't think it's
necessary to add comments in other notifiers.

If adding comments in other notifiers each time when you add a new notifier,
will you add 10 more comments in all other notifiers if you add 10 new notifiers?

I would think when people try to change notifier priority, they should
know what they are going to do and search the notifiers and see the comments.

BTW, the other notifier is in generic code. Adding a paranoid comment in
it doesn't seem to be worth. The comment in this patch code is very clear
already.

>
> I really think that notifiers are fragile in general and this
> particular
> one is no exception.

If we think pm_notifier API is fragile, we need to fix the API instead of
leaving it there and not allowing people to use it because it's suspected
fragile.

It's simply not realistic to tell people not to use the API each
time in code review while in the meantime the API and its priority are staying
in kernel to allow people to use it.

>
> > It's not a good idea to move bsp_pm_callback() into
> > cpu_hotplug_pm_callback(). There is no architecture specific hook to
> > call x86 bsp specific hotplug in cpu_hotplug_pm_callback(). Moving
> > bsp_pm_callback() into cpu_hotplug_pm_callback() is hacking while
> > this patch follows well defined API and has better coding.
>
> I'm not sure it is better coding. You really want one piece of code
> to always follow another piece of code and the best way to ensure
> that is to execute them both explicitly in sequence.

Besides notifiers, there are other places relying on priority (e.g. initcall).
Again, if we think the priority in notifiers and other APIs are fragile,
why don't we fix the APIs?

The problem of calling bsp_pm_callback() in cpu_hotplug_pm_callback() is that
cpu_hotplug_pm_callback() is defined in generic kernel/cpu.c. This bsp hotplug
patchset is x86 arch specific code so far. Changing kernel/cpu.c to hook
arch specific code is more like hacking kernel compared to the systematic way
to do so in the patch.

Thanks.

-Fenghua
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2012-10-17 18:29:16

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

> From: Srivatsa S. Bhat [mailto:[email protected]]
> On 10/16/2012 02:20 AM, Rafael J. Wysocki wrote:
> > On Friday 12 of October 2012 09:09:42 Fenghua Yu wrote:
> >> From: Fenghua Yu <[email protected]>
> >>
> >> Because x86 BIOS requires CPU0 to resume from sleep, suspend or
> hibernate can't
> >> be executed if CPU0 is detected offline. To make suspend or
> hibernate and
> >> further resume succeed, CPU0 must be online.
> >>
> >> Signed-off-by: Fenghua Yu <[email protected]>
> >> ---
> >> arch/x86/power/cpu.c | 44
> ++++++++++++++++++++++++++++++++++++++++++++
> >> 1 files changed, 44 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> >> index 218cdb1..adde775 100644
> >> --- a/arch/x86/power/cpu.c
> >> +++ b/arch/x86/power/cpu.c
> >> @@ -237,3 +237,47 @@ void restore_processor_state(void)
> >> #ifdef CONFIG_X86_32
> >> EXPORT_SYMBOL(restore_processor_state);
> >> #endif
> >> +
> >> +/*
> >> + * When bsp_check() is called in hibernate and suspend, cpu hotplug
> >> + * is disabled already. So it's unnessary to handle race condition
> between
> >> + * cpumask query and cpu hotplug.
> >> + */
> >> +static int bsp_check(void)
> >> +{
> >> + if (cpumask_first(cpu_online_mask) != 0) {
> >> + pr_warn("CPU0 is offline.\n");
> >> + return -ENODEV;
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int bsp_pm_callback(struct notifier_block *nb, unsigned long
> action,
> >> + void *ptr)
> >> +{
> >> + int ret = 0;
> >> +
> >> + switch (action) {
> >> + case PM_SUSPEND_PREPARE:
> >> + case PM_HIBERNATION_PREPARE:
> >> + ret = bsp_check();
> >> + break;
> >> + default:
> >> + break;
> >> + }
> >> + return notifier_from_errno(ret);
> >> +}
> >> +
> >
> > I wonder if there's anything preventing CPU0 from becoming offline
> after you've
> > done this check and before user space is frozen?
> >
>
> Hi Rafael,
>
> bsp_pm_callback runs as a low priority notifier callback, specifically
> with lower
> priority than the cpu_hotplug_pm_callback (as mentioned in the comment
> below).
> And cpu_hotplug_pm_callback disables regular CPU hotplug (till the
> suspend/resume
> sequence is complete).. So there is no chance for CPU0 to become
> offline after that.

Bhat is right here. There is no functionality issue here.

bsp_check() is protected by cpu_hotplug_disable which is set before
bsp_pm_callback and cleared after it. There is no chance for CPU0
to be offline after that.

Thanks.

-Fenghua


????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2012-10-18 06:29:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

On Wednesday 17 of October 2012 17:39:48 Yu, Fenghua wrote:
> > > > On Tuesday, October 16, 2012 10:19 PM Rafael J. Wysocki wrote:
> > On Tuesday 16 of October 2012 22:12:27 Yu, Fenghua wrote:
> > > > On 10/16/2012 09:47 PM, Rafael J. Wysocki wrote:
> > > > > On Tuesday 16 of October 2012 11:05:18 Srivatsa S. Bhat wrote:
> > > > >> On 10/16/2012 02:20 AM, Rafael J. Wysocki wrote:
> > > > >>> On Friday 12 of October 2012 09:09:42 Fenghua Yu wrote:
> > > > >>>> From: Fenghua Yu <[email protected]>
> > > > fix the bug I
> > > > pointed out in my other mail.
> > Because things like this are often overlooked by people trying to
> > optimize
> > stuff and the fact that you have to add a comment explaining that
> > dependency
> > only means that it is not exactly straightforward.
>
> If people try to optimize pm notifier, they really need to know
> pm_notifier() API and its priority. If they ignore the priority, they will
> screw up kernel no matter how many comments in it.
>
> >
> > Moreover, you should also add a corresponding comment in the other
> > notifier
> > indicating that its priority should be higher than the priority of the
> > new thing and explaining why.
>
> The comment in the patch explains this very clearly. I don't think it's
> necessary to add comments in other notifiers.
>
> If adding comments in other notifiers each time when you add a new notifier,
> will you add 10 more comments in all other notifiers if you add 10 new notifiers?

Well, if there are strict ordering requirements regarding them, then those
requirements should be documented in both places. Otherwise, if somebody looks
at cpu_hotplug_pm_callback() alone, for example, he/she may not even realize
that it has to be strictly ordered with respect to bsp_pm_callback().

Of course, you can argue that people should know what they are doing, but
then reality is that it's quite easy to overlook things like that.

> I would think when people try to change notifier priority, they should
> know what they are going to do and search the notifiers and see the comments.
>
> BTW, the other notifier is in generic code. Adding a paranoid comment in
> it doesn't seem to be worth. The comment in this patch code is very clear
> already.

The problem is that it is generally difficult to find all subsystems that
have registered notifiers in a given chain and figuring out dependencies
between them is highly problematic. That's why I'm saying this is a fragile
design - because it is so easy to break accidentally (and that's already
happened in CPU hotplug, so I'm not making that up).

> > I really think that notifiers are fragile in general and this
> > particular
> > one is no exception.
>
> If we think pm_notifier API is fragile, we need to fix the API instead of
> leaving it there and not allowing people to use it because it's suspected
> fragile.

Because it is use by the existing code which generally works and may be
broken while attempting to "fix" the API.

> It's simply not realistic to tell people not to use the API each
> time in code review while in the meantime the API and its priority are staying
> in kernel to allow people to use it.

I don't quite agree. It sometimes is not worth changing code that's been
around for a while already, because it's been tested in multiple configurations
and changing it may cause things to break, although we may not like the APIs
used by that code. That doesn't necessarily mean, however, that adding new
code using those APIs is always a good idea.

In this particular case I just wondered if we could avoid adding more code
that would use an API having known problems. The answer seems to be that
it would cost some additional complexity that might not be worth it. This
is a good answer, but you might have given it to me directly. :-)

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2012-10-18 06:43:58

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH v9 12/12] x86, topology: Debug CPU00 hotplug

On 10/17/2012 05:36 AM, Yu, Fenghua wrote:
>>> +#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
> > + case PM_RESTORE_PREPARE:
>>> + /*
>>> + * When system resumes from hibernation, online CPU0
>> because
>>> + * 1. it's required for resume and
>>> + * 2. the CPU was online before hibernation
>>> + */
>>> + if (!cpu_online(0))
>>> + _debug_hotplug_cpu(0, 1);
>>
>> This won't work. CPU hotplug is disabled by cpu_hotplug_pm_callback()
>> during
>> PM_HIBERNATION_PREPARE and is enabled back only during
>> PM_POST_HIBERNATION.
>>
>> So calls to cpu_up() or cpu_down() will fail in the meantime.
>
> The code works just fine.
>
> You are talking about hibernation/suspend phase. This piece of
> code is all about restore phase. Of course you can call cpu_up()
> and cpu_down() during restore phase.
>
> So there is no problem here.
>

I looked into it again, and actually you are right, it'll work fine;
but the reason *why* it will work fine is quite subtle.

Note that there is a difference between calling cpu_up() or cpu_down()
vs calling _cpu_up() or _cpu_down(). The former set of calls will return
-EBUSY if the 'cpu_hotplug_disabled' flag is set. The latter set of calls
don't care about that flag and hence proceed irrespective of the state of
that flag. The former set of calls are used when you echo 0/1 to the online
file from userspace. Whereas the latter set of calls are used by the
disable/enable_nonboot_cpus() functions. This is how you can disable
cpu hotplug by userspace, but continue to do cpu hotplug yourself from
within the kernel in the suspend/hibernation/restore phases.

Now, coming to the notifications part, earlier, I was referring to the
entire hibernate->restore phase, not just the hibernate phase. The relevant
notifications that are sent out during this entire (successful) sequence are:

-> PM_HIBERNATION_PREPARE
-> PM_RESTORE_PREPARE
-> PM_POST_HIBERNATION

cpu_hotplug_pm_callback() will set the cpu_hotplug_disabled flag during
PM_HIBERNATION_PREPARE and clears it only during PM_POST_HIBERNATION.
So in-between, if you invoke cpu_up() or cpu_down(), it should return -EBUSY.

Then how is it that this patch can work fine? The answer is, during the
restore phase, you are not yet in the kernel-to-be-restored. IOW, that
cpu_hotplug_disabled flag is clear in that kernel. That's why cpu_up() and
cpu_down() will continue to work at that point. But once you resume execution
from the saved hibernation image, you won't be able to do cpu_up() and
cpu_down() until that flag is cleared by cpu_hotplug_pm_callback() during
PM_POST_HIBERNATION.

All in all, its tricky, and luckily this patch uses cpu_up()/cpu_down()
at the right moment...

I don't actually see why cpu_hotplug_pm_callback() should not clear the
cpu_hotplug_disabled flag during PM_RESTORE_PREPARE itself.. if we do that,
we don't have to be *this* careful about -when- we can invoke -which-
function..

Regards,
Srivatsa S. Bhat

2012-10-18 06:52:24

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH v9 05/12] x86, hotplug, suspend: Online CPU0 for suspend or hibernate

On 10/18/2012 12:03 PM, Rafael J. Wysocki wrote:
> On Wednesday 17 of October 2012 17:39:48 Yu, Fenghua wrote:
>>>>> On Tuesday, October 16, 2012 10:19 PM Rafael J. Wysocki wrote:
>>> On Tuesday 16 of October 2012 22:12:27 Yu, Fenghua wrote:
>>>>> On 10/16/2012 09:47 PM, Rafael J. Wysocki wrote:
>>>>>> On Tuesday 16 of October 2012 11:05:18 Srivatsa S. Bhat wrote:
>>>>>>> On 10/16/2012 02:20 AM, Rafael J. Wysocki wrote:
>>>>>>>> On Friday 12 of October 2012 09:09:42 Fenghua Yu wrote:
>>>>>>>>> From: Fenghua Yu <[email protected]>
>>>>> fix the bug I
>>>>> pointed out in my other mail.
>>> Because things like this are often overlooked by people trying to
>>> optimize
>>> stuff and the fact that you have to add a comment explaining that
>>> dependency
>>> only means that it is not exactly straightforward.
>>
>> If people try to optimize pm notifier, they really need to know
>> pm_notifier() API and its priority. If they ignore the priority, they will
>> screw up kernel no matter how many comments in it.
>>
>>>
>>> Moreover, you should also add a corresponding comment in the other
>>> notifier
>>> indicating that its priority should be higher than the priority of the
>>> new thing and explaining why.
>>
>> The comment in the patch explains this very clearly. I don't think it's
>> necessary to add comments in other notifiers.
>>
>> If adding comments in other notifiers each time when you add a new notifier,
>> will you add 10 more comments in all other notifiers if you add 10 new notifiers?
>
> Well, if there are strict ordering requirements regarding them, then those
> requirements should be documented in both places. Otherwise, if somebody looks
> at cpu_hotplug_pm_callback() alone, for example, he/she may not even realize
> that it has to be strictly ordered with respect to bsp_pm_callback().
>

Right, either we could put comments at both places or set up #defines for
their priorities in a common header file or some such and comment about it there,
like what is done in include/linux/cpu.h for example. But atleast one of these
*has* to be done; just commenting at one place is too risky for code-maintenance.

> Of course, you can argue that people should know what they are doing, but
> then reality is that it's quite easy to overlook things like that.
>

Yep.

>> I would think when people try to change notifier priority, they should
>> know what they are going to do and search the notifiers and see the comments.
>>
>> BTW, the other notifier is in generic code. Adding a paranoid comment in
>> it doesn't seem to be worth. The comment in this patch code is very clear
>> already.
>
> The problem is that it is generally difficult to find all subsystems that
> have registered notifiers in a given chain and figuring out dependencies
> between them is highly problematic. That's why I'm saying this is a fragile
> design - because it is so easy to break accidentally (and that's already
> happened in CPU hotplug, so I'm not making that up).

True..

Regards,
Srivatsa S. Bhat

>
>>> I really think that notifiers are fragile in general and this
>>> particular
>>> one is no exception.
>>
>> If we think pm_notifier API is fragile, we need to fix the API instead of
>> leaving it there and not allowing people to use it because it's suspected
>> fragile.
>
> Because it is use by the existing code which generally works and may be
> broken while attempting to "fix" the API.
>
>> It's simply not realistic to tell people not to use the API each
>> time in code review while in the meantime the API and its priority are staying
>> in kernel to allow people to use it.
>
> I don't quite agree. It sometimes is not worth changing code that's been
> around for a while already, because it's been tested in multiple configurations
> and changing it may cause things to break, although we may not like the APIs
> used by that code. That doesn't necessarily mean, however, that adding new
> code using those APIs is always a good idea.
>
> In this particular case I just wondered if we could avoid adding more code
> that would use an API having known problems. The answer seems to be that
> it would cost some additional complexity that might not be worth it. This
> is a good answer, but you might have given it to me directly. :-)
>

2012-10-19 01:07:55

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH v9 12/12] x86, topology: Debug CPU00 hotplug

> From: Srivatsa S. Bhat [mailto:[email protected]]
> Sent: Wednesday, October 17, 2012 11:43 PM
> On 10/17/2012 05:36 AM, Yu, Fenghua wrote:
> >>> +#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
> > > + case PM_RESTORE_PREPARE:
> >>> + /*
> >>> + * When system resumes from hibernation, online CPU0
> >> because
> >>> + * 1. it's required for resume and
> >>> + * 2. the CPU was online before hibernation
> >>> + */
> >>> + if (!cpu_online(0))
> >>> + _debug_hotplug_cpu(0, 1);
> >>
> >> This won't work. CPU hotplug is disabled by cpu_hotplug_pm_callback()
> >> during
> >> PM_HIBERNATION_PREPARE and is enabled back only during
> >> PM_POST_HIBERNATION.
> >>
> >> So calls to cpu_up() or cpu_down() will fail in the meantime.
> >
> > The code works just fine.
> >
> > You are talking about hibernation/suspend phase. This piece of
> > code is all about restore phase. Of course you can call cpu_up()
> > and cpu_down() during restore phase.
> >
> > So there is no problem here.
> >
>
> I looked into it again, and actually you are right, it'll work fine;
> but the reason *why* it will work fine is quite subtle.
>
> Note that there is a difference between calling cpu_up() or cpu_down()
> vs calling _cpu_up() or _cpu_down(). The former set of calls will
> return
> -EBUSY if the 'cpu_hotplug_disabled' flag is set. The latter set of
> calls
> don't care about that flag and hence proceed irrespective of the state
> of
> that flag. The former set of calls are used when you echo 0/1 to the
> online
> file from userspace. Whereas the latter set of calls are used by the
> disable/enable_nonboot_cpus() functions. This is how you can disable
> cpu hotplug by userspace, but continue to do cpu hotplug yourself from
> within the kernel in the suspend/hibernation/restore phases.
>
> Now, coming to the notifications part, earlier, I was referring to the
> entire hibernate->restore phase, not just the hibernate phase. The
> relevant
> notifications that are sent out during this entire (successful)
> sequence are:
>
> -> PM_HIBERNATION_PREPARE
> -> PM_RESTORE_PREPARE
> -> PM_POST_HIBERNATION
>
> cpu_hotplug_pm_callback() will set the cpu_hotplug_disabled flag during
> PM_HIBERNATION_PREPARE and clears it only during PM_POST_HIBERNATION.
> So in-between, if you invoke cpu_up() or cpu_down(), it should return -
> EBUSY.
>
> Then how is it that this patch can work fine? The answer is, during the
> restore phase, you are not yet in the kernel-to-be-restored. IOW, that
> cpu_hotplug_disabled flag is clear in that kernel. That's why cpu_up()
> and
> cpu_down() will continue to work at that point. But once you resume
> execution
> from the saved hibernation image, you won't be able to do cpu_up() and
> cpu_down() until that flag is cleared by cpu_hotplug_pm_callback()
> during
> PM_POST_HIBERNATION.
>
> All in all, its tricky, and luckily this patch uses cpu_up()/cpu_down()
> at the right moment...
>
> I don't actually see why cpu_hotplug_pm_callback() should not clear the
> cpu_hotplug_disabled flag during PM_RESTORE_PREPARE itself.. if we do
> that,
> we don't have to be *this* careful about -when- we can invoke -which-
> function..

Yes, this is tricky because we have to offline CPU0 in debug mode and online
CPU0 in resume. So we need to do online and offline in this short period
of boot and resume.

I explained this tick in the comment:
+ * When a resume really happens, this code won't be called.
+ *
+ * This code is called only when user space hibernation software
+ * prepares for snapshot device during boot time. So we just
+ * call _debug_hotplug_cpu() to restore to CPU0's state prior to
+ * preparing the snapshot device.
......

Thanks.

-Fenghua