2013-04-24 11:19:15

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v6 0/5] Shutdown from reboot_cpuid without stopping other cpus.

We recently noticed that reboot of a 1024 cpu machine takes approx 16
minutes of just stopping the cpus. The slowdown was tracked to commit
f96972f.

The current implementation does all the work of hot removing the cpus
before halting the system. We are switching to just migrating to the
reboot_cpu and then continuing with shutdown/reboot.

The patch set is broken into five parts. The first two are planned for
the stable release. The others move the halt/shutdown/reboot related
functions to their own kernel/reboot.c file and then introduce the kernel
boot parameter.


Changes since -v5.
- Moved the arch/x86 reboot= up to the generic kernel code.

Changes since -v4.
- Integrated Srivatsa S. Bhat creating cpu_hotplug_disable()
function

- Integrated comments by Srivatsa S. Bhat.

- Made one more comment consistent with others in function.

Changes since -v3.
- Added a tested-by for the original reporter.

- Fix compile failure found by Joe Perches.

- Integrated comments by Joe Perches.


2013-04-24 11:19:30

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v6 1/5] CPU hotplug: Provide a generic helper to disable/enable CPU hotplug

From: "Srivatsa S. Bhat" <[email protected]>

There are instances in the kernel where we would like to disable
CPU hotplug (from sysfs) during some important operation. Today
the freezer code depends on this and the code to do it was kinda
tailor-made for that.

Restructure the code and make it generic enough to be useful for
other usecases too.

Signed-off-by: Srivatsa S. Bhat <[email protected]>
Signed-off-by: Robin Holt <[email protected]>
To: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>
---
include/linux/cpu.h | 2 ++
kernel/cpu.c | 27 +++++++++------------------
2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index ce7a074..73282d4 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -141,6 +141,8 @@ static inline void unregister_cpu_notifier(struct notifier_block *nb)
}
#endif

+extern void cpu_hotplug_disable(void);
+extern void cpu_hotplug_enable(void);
int cpu_up(unsigned int cpu);
void notify_cpu_starting(unsigned int cpu);
extern void cpu_maps_update_begin(void);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index b5e4ab2..28769f5 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -541,29 +541,20 @@ static int __init alloc_frozen_cpus(void)
core_initcall(alloc_frozen_cpus);

/*
- * Prevent regular CPU hotplug from racing with the freezer, by disabling CPU
- * hotplug when tasks are about to be frozen. Also, don't allow the freezer
- * to continue until any currently running CPU hotplug operation gets
- * completed.
- * To modify the 'cpu_hotplug_disabled' flag, we need to acquire the
- * 'cpu_add_remove_lock'. And this same lock is also taken by the regular
- * CPU hotplug path and released only after it is complete. Thus, we
- * (and hence the freezer) will block here until any currently running CPU
- * hotplug operation gets completed.
+ * Wait for currently running CPU hotplug operations to complete (if any) and
+ * disable future CPU hotplug (from sysfs). The 'cpu_add_remove_lock' protects
+ * the 'cpu_hotplug_disabled' flag. The same lock is also acquired by the
+ * hotplug path before performing hotplug operations. So acquiring that lock
+ * guarantees mutual exclusion from any currently running hotplug operations.
*/
-void cpu_hotplug_disable_before_freeze(void)
+void cpu_hotplug_disable(void)
{
cpu_maps_update_begin();
cpu_hotplug_disabled = 1;
cpu_maps_update_done();
}

-
-/*
- * When tasks have been thawed, re-enable regular CPU hotplug (which had been
- * disabled while beginning to freeze tasks).
- */
-void cpu_hotplug_enable_after_thaw(void)
+void cpu_hotplug_enable(void)
{
cpu_maps_update_begin();
cpu_hotplug_disabled = 0;
@@ -589,12 +580,12 @@ cpu_hotplug_pm_callback(struct notifier_block *nb,

case PM_SUSPEND_PREPARE:
case PM_HIBERNATION_PREPARE:
- cpu_hotplug_disable_before_freeze();
+ cpu_hotplug_disable();
break;

case PM_POST_SUSPEND:
case PM_POST_HIBERNATION:
- cpu_hotplug_enable_after_thaw();
+ cpu_hotplug_enable();
break;

default:
--
1.8.1.2

2013-04-24 11:19:34

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v6 3/5] Move shutdown/reboot related functions to kernel/reboot.c

This patch is preparatory. It moves reboot related syscall, etc
functions from kernel/sys.c to kernel/reboot.c.

Signed-off-by: Robin Holt <[email protected]>
To: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>
---
kernel/Makefile | 2 +-
kernel/reboot.c | 331 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/sys.c | 317 -----------------------------------------------------
3 files changed, 332 insertions(+), 318 deletions(-)
create mode 100644 kernel/reboot.c

diff --git a/kernel/Makefile b/kernel/Makefile
index bbde5f1..5c2cef7 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -9,7 +9,7 @@ obj-y = fork.o exec_domain.o panic.o printk.o \
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
- notifier.o ksysfs.o cred.o \
+ notifier.o ksysfs.o cred.o reboot.o \
async.o range.o groups.o lglock.o smpboot.o

ifdef CONFIG_FUNCTION_TRACER
diff --git a/kernel/reboot.c b/kernel/reboot.c
new file mode 100644
index 0000000..ff92601
--- /dev/null
+++ b/kernel/reboot.c
@@ -0,0 +1,331 @@
+/*
+ * linux/kernel/reboot.c
+ *
+ * Copyright (C) 2013 Linus Torvalds
+ */
+
+#include <linux/export.h>
+#include <linux/kexec.h>
+#include <linux/kmod.h>
+#include <linux/kmsg_dump.h>
+#include <linux/reboot.h>
+#include <linux/suspend.h>
+#include <linux/syscalls.h>
+#include <linux/syscore_ops.h>
+
+/**
+ * emergency_restart - reboot the system
+ *
+ * Without shutting down any hardware or taking any locks
+ * reboot the system. This is called when we know we are in
+ * trouble so this is our best effort to reboot. This is
+ * safe to call in interrupt context.
+ */
+void emergency_restart(void)
+{
+ kmsg_dump(KMSG_DUMP_EMERG);
+ machine_emergency_restart();
+}
+EXPORT_SYMBOL_GPL(emergency_restart);
+
+void kernel_restart_prepare(char *cmd)
+{
+ blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
+ system_state = SYSTEM_RESTART;
+ usermodehelper_disable();
+ device_shutdown();
+}
+
+/**
+ * register_reboot_notifier - Register function to be called at reboot time
+ * @nb: Info about notifier function to be called
+ *
+ * Registers a function with the list of functions
+ * to be called at reboot time.
+ *
+ * Currently always returns zero, as blocking_notifier_chain_register()
+ * always returns zero.
+ */
+int register_reboot_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&reboot_notifier_list, nb);
+}
+EXPORT_SYMBOL(register_reboot_notifier);
+
+/**
+ * unregister_reboot_notifier - Unregister previously registered reboot notifier
+ * @nb: Hook to be unregistered
+ *
+ * Unregisters a previously registered reboot
+ * notifier function.
+ *
+ * Returns zero on success, or %-ENOENT on failure.
+ */
+int unregister_reboot_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&reboot_notifier_list, nb);
+}
+EXPORT_SYMBOL(unregister_reboot_notifier);
+
+static void migrate_to_reboot_cpu(void)
+{
+ /* The boot cpu is always logical cpu 0 */
+ int reboot_cpu_id = 0;
+
+ cpu_hotplug_disable();
+
+ /* Make certain the cpu I'm about to reboot on is online */
+ if (!cpu_online(reboot_cpu_id))
+ reboot_cpu_id = cpumask_first(cpu_online_mask);
+
+ /* Prevent races with other tasks migrating this task */
+ current->flags |= PF_THREAD_BOUND;
+
+ /* Make certain I only run on the appropriate processor */
+ set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
+}
+
+/**
+ * kernel_restart - reboot the system
+ * @cmd: pointer to buffer containing command to execute for restart
+ * or %NULL
+ *
+ * Shutdown everything and perform a clean reboot.
+ * This is not safe to call in interrupt context.
+ */
+void kernel_restart(char *cmd)
+{
+ kernel_restart_prepare(cmd);
+ migrate_to_reboot_cpu();
+ syscore_shutdown();
+ if (!cmd)
+ printk(KERN_EMERG "Restarting system.\n");
+ else
+ printk(KERN_EMERG "Restarting system with command '%s'.\n", cmd);
+ kmsg_dump(KMSG_DUMP_RESTART);
+ machine_restart(cmd);
+}
+EXPORT_SYMBOL_GPL(kernel_restart);
+
+static void kernel_shutdown_prepare(enum system_states state)
+{
+ blocking_notifier_call_chain(&reboot_notifier_list,
+ (state == SYSTEM_HALT)?SYS_HALT:SYS_POWER_OFF, NULL);
+ system_state = state;
+ usermodehelper_disable();
+ device_shutdown();
+}
+/**
+ * kernel_halt - halt the system
+ *
+ * Shutdown everything and perform a clean system halt.
+ */
+void kernel_halt(void)
+{
+ kernel_shutdown_prepare(SYSTEM_HALT);
+ migrate_to_reboot_cpu();
+ syscore_shutdown();
+ printk(KERN_EMERG "System halted.\n");
+ kmsg_dump(KMSG_DUMP_HALT);
+ machine_halt();
+}
+
+EXPORT_SYMBOL_GPL(kernel_halt);
+
+/**
+ * kernel_power_off - power_off the system
+ *
+ * Shutdown everything and perform a clean system power_off.
+ */
+void kernel_power_off(void)
+{
+ kernel_shutdown_prepare(SYSTEM_POWER_OFF);
+ if (pm_power_off_prepare)
+ pm_power_off_prepare();
+ migrate_to_reboot_cpu();
+ syscore_shutdown();
+ printk(KERN_EMERG "Power down.\n");
+ kmsg_dump(KMSG_DUMP_POWEROFF);
+ machine_power_off();
+}
+EXPORT_SYMBOL_GPL(kernel_power_off);
+
+static DEFINE_MUTEX(reboot_mutex);
+
+/*
+ * Reboot system call: for obvious reasons only root may call it,
+ * and even root needs to set up some magic numbers in the registers
+ * so that some mistake won't make this reboot the whole machine.
+ * You can also set the meaning of the ctrl-alt-del-key here.
+ *
+ * reboot doesn't sync: do that yourself before calling this.
+ */
+SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
+ void __user *, arg)
+{
+ struct pid_namespace *pid_ns = task_active_pid_ns(current);
+ char buffer[256];
+ int ret = 0;
+
+ /* We only trust the superuser with rebooting the system. */
+ if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
+ return -EPERM;
+
+ /* For safety, we require "magic" arguments. */
+ if (magic1 != LINUX_REBOOT_MAGIC1 ||
+ (magic2 != LINUX_REBOOT_MAGIC2 &&
+ magic2 != LINUX_REBOOT_MAGIC2A &&
+ magic2 != LINUX_REBOOT_MAGIC2B &&
+ magic2 != LINUX_REBOOT_MAGIC2C))
+ return -EINVAL;
+
+ /*
+ * If pid namespaces are enabled and the current task is in a child
+ * pid_namespace, the command is handled by reboot_pid_ns() which will
+ * call do_exit().
+ */
+ ret = reboot_pid_ns(pid_ns, cmd);
+ if (ret)
+ return ret;
+
+ /* Instead of trying to make the power_off code look like
+ * halt when pm_power_off is not set do it the easy way.
+ */
+ if ((cmd == LINUX_REBOOT_CMD_POWER_OFF) && !pm_power_off)
+ cmd = LINUX_REBOOT_CMD_HALT;
+
+ mutex_lock(&reboot_mutex);
+ switch (cmd) {
+ case LINUX_REBOOT_CMD_RESTART:
+ kernel_restart(NULL);
+ break;
+
+ case LINUX_REBOOT_CMD_CAD_ON:
+ C_A_D = 1;
+ break;
+
+ case LINUX_REBOOT_CMD_CAD_OFF:
+ C_A_D = 0;
+ break;
+
+ case LINUX_REBOOT_CMD_HALT:
+ kernel_halt();
+ do_exit(0);
+ panic("cannot halt");
+
+ case LINUX_REBOOT_CMD_POWER_OFF:
+ kernel_power_off();
+ do_exit(0);
+ break;
+
+ case LINUX_REBOOT_CMD_RESTART2:
+ if (strncpy_from_user(&buffer[0], arg, sizeof(buffer) - 1) < 0) {
+ ret = -EFAULT;
+ break;
+ }
+ buffer[sizeof(buffer) - 1] = '\0';
+
+ kernel_restart(buffer);
+ break;
+
+#ifdef CONFIG_KEXEC
+ case LINUX_REBOOT_CMD_KEXEC:
+ ret = kernel_kexec();
+ break;
+#endif
+
+#ifdef CONFIG_HIBERNATION
+ case LINUX_REBOOT_CMD_SW_SUSPEND:
+ ret = hibernate();
+ break;
+#endif
+
+ default:
+ ret = -EINVAL;
+ break;
+ }
+ mutex_unlock(&reboot_mutex);
+ return ret;
+}
+
+static void deferred_cad(struct work_struct *dummy)
+{
+ kernel_restart(NULL);
+}
+
+/*
+ * This function gets called by ctrl-alt-del - ie the keyboard interrupt.
+ * As it's called within an interrupt, it may NOT sync: the only choice
+ * is whether to reboot at once, or just ignore the ctrl-alt-del.
+ */
+void ctrl_alt_del(void)
+{
+ static DECLARE_WORK(cad_work, deferred_cad);
+
+ if (C_A_D)
+ schedule_work(&cad_work);
+ else
+ kill_cad_pid(SIGINT, 1);
+}
+
+char poweroff_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/poweroff";
+
+static int __orderly_poweroff(bool force)
+{
+ char **argv;
+ static char *envp[] = {
+ "HOME=/",
+ "PATH=/sbin:/bin:/usr/sbin:/usr/bin",
+ NULL
+ };
+ int ret;
+
+ argv = argv_split(GFP_KERNEL, poweroff_cmd, NULL);
+ if (argv) {
+ ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
+ argv_free(argv);
+ } else {
+ printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
+ __func__, poweroff_cmd);
+ ret = -ENOMEM;
+ }
+
+ if (ret && force) {
+ printk(KERN_WARNING "Failed to start orderly shutdown: "
+ "forcing the issue\n");
+ /*
+ * I guess this should try to kick off some daemon to sync and
+ * poweroff asap. Or not even bother syncing if we're doing an
+ * emergency shutdown?
+ */
+ emergency_sync();
+ kernel_power_off();
+ }
+
+ return ret;
+}
+
+static bool poweroff_force;
+
+static void poweroff_work_func(struct work_struct *work)
+{
+ __orderly_poweroff(poweroff_force);
+}
+
+static DECLARE_WORK(poweroff_work, poweroff_work_func);
+
+/**
+ * orderly_poweroff - Trigger an orderly system poweroff
+ * @force: force poweroff if command execution fails
+ *
+ * This may be called from any context to trigger a system shutdown.
+ * If the orderly shutdown fails, it will force an immediate shutdown.
+ */
+int orderly_poweroff(bool force)
+{
+ if (force) /* do not override the pending "true" */
+ poweroff_force = true;
+ schedule_work(&poweroff_work);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(orderly_poweroff);
diff --git a/kernel/sys.c b/kernel/sys.c
index 232214e..76b9a93 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -303,261 +303,6 @@ out_unlock:
return retval;
}

-/**
- * emergency_restart - reboot the system
- *
- * Without shutting down any hardware or taking any locks
- * reboot the system. This is called when we know we are in
- * trouble so this is our best effort to reboot. This is
- * safe to call in interrupt context.
- */
-void emergency_restart(void)
-{
- kmsg_dump(KMSG_DUMP_EMERG);
- machine_emergency_restart();
-}
-EXPORT_SYMBOL_GPL(emergency_restart);
-
-void kernel_restart_prepare(char *cmd)
-{
- blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
- system_state = SYSTEM_RESTART;
- usermodehelper_disable();
- device_shutdown();
-}
-
-/**
- * register_reboot_notifier - Register function to be called at reboot time
- * @nb: Info about notifier function to be called
- *
- * Registers a function with the list of functions
- * to be called at reboot time.
- *
- * Currently always returns zero, as blocking_notifier_chain_register()
- * always returns zero.
- */
-int register_reboot_notifier(struct notifier_block *nb)
-{
- return blocking_notifier_chain_register(&reboot_notifier_list, nb);
-}
-EXPORT_SYMBOL(register_reboot_notifier);
-
-/**
- * unregister_reboot_notifier - Unregister previously registered reboot notifier
- * @nb: Hook to be unregistered
- *
- * Unregisters a previously registered reboot
- * notifier function.
- *
- * Returns zero on success, or %-ENOENT on failure.
- */
-int unregister_reboot_notifier(struct notifier_block *nb)
-{
- return blocking_notifier_chain_unregister(&reboot_notifier_list, nb);
-}
-EXPORT_SYMBOL(unregister_reboot_notifier);
-
-static void migrate_to_reboot_cpu(void)
-{
- /* The boot cpu is always logical cpu 0 */
- int reboot_cpu_id = 0;
-
- cpu_hotplug_disable();
-
- /* Make certain the cpu I'm about to reboot on is online */
- if (!cpu_online(reboot_cpu_id))
- reboot_cpu_id = cpumask_first(cpu_online_mask);
-
- /* Prevent races with other tasks migrating this task */
- current->flags |= PF_THREAD_BOUND;
-
- /* Make certain I only run on the appropriate processor */
- set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
-}
-
-/**
- * kernel_restart - reboot the system
- * @cmd: pointer to buffer containing command to execute for restart
- * or %NULL
- *
- * Shutdown everything and perform a clean reboot.
- * This is not safe to call in interrupt context.
- */
-void kernel_restart(char *cmd)
-{
- kernel_restart_prepare(cmd);
- migrate_to_reboot_cpu();
- syscore_shutdown();
- if (!cmd)
- printk(KERN_EMERG "Restarting system.\n");
- else
- printk(KERN_EMERG "Restarting system with command '%s'.\n", cmd);
- kmsg_dump(KMSG_DUMP_RESTART);
- machine_restart(cmd);
-}
-EXPORT_SYMBOL_GPL(kernel_restart);
-
-static void kernel_shutdown_prepare(enum system_states state)
-{
- blocking_notifier_call_chain(&reboot_notifier_list,
- (state == SYSTEM_HALT)?SYS_HALT:SYS_POWER_OFF, NULL);
- system_state = state;
- usermodehelper_disable();
- device_shutdown();
-}
-/**
- * kernel_halt - halt the system
- *
- * Shutdown everything and perform a clean system halt.
- */
-void kernel_halt(void)
-{
- kernel_shutdown_prepare(SYSTEM_HALT);
- migrate_to_reboot_cpu();
- syscore_shutdown();
- printk(KERN_EMERG "System halted.\n");
- kmsg_dump(KMSG_DUMP_HALT);
- machine_halt();
-}
-
-EXPORT_SYMBOL_GPL(kernel_halt);
-
-/**
- * kernel_power_off - power_off the system
- *
- * Shutdown everything and perform a clean system power_off.
- */
-void kernel_power_off(void)
-{
- kernel_shutdown_prepare(SYSTEM_POWER_OFF);
- if (pm_power_off_prepare)
- pm_power_off_prepare();
- migrate_to_reboot_cpu();
- syscore_shutdown();
- printk(KERN_EMERG "Power down.\n");
- kmsg_dump(KMSG_DUMP_POWEROFF);
- machine_power_off();
-}
-EXPORT_SYMBOL_GPL(kernel_power_off);
-
-static DEFINE_MUTEX(reboot_mutex);
-
-/*
- * Reboot system call: for obvious reasons only root may call it,
- * and even root needs to set up some magic numbers in the registers
- * so that some mistake won't make this reboot the whole machine.
- * You can also set the meaning of the ctrl-alt-del-key here.
- *
- * reboot doesn't sync: do that yourself before calling this.
- */
-SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
- void __user *, arg)
-{
- struct pid_namespace *pid_ns = task_active_pid_ns(current);
- char buffer[256];
- int ret = 0;
-
- /* We only trust the superuser with rebooting the system. */
- if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
- return -EPERM;
-
- /* For safety, we require "magic" arguments. */
- if (magic1 != LINUX_REBOOT_MAGIC1 ||
- (magic2 != LINUX_REBOOT_MAGIC2 &&
- magic2 != LINUX_REBOOT_MAGIC2A &&
- magic2 != LINUX_REBOOT_MAGIC2B &&
- magic2 != LINUX_REBOOT_MAGIC2C))
- return -EINVAL;
-
- /*
- * If pid namespaces are enabled and the current task is in a child
- * pid_namespace, the command is handled by reboot_pid_ns() which will
- * call do_exit().
- */
- ret = reboot_pid_ns(pid_ns, cmd);
- if (ret)
- return ret;
-
- /* Instead of trying to make the power_off code look like
- * halt when pm_power_off is not set do it the easy way.
- */
- if ((cmd == LINUX_REBOOT_CMD_POWER_OFF) && !pm_power_off)
- cmd = LINUX_REBOOT_CMD_HALT;
-
- mutex_lock(&reboot_mutex);
- switch (cmd) {
- case LINUX_REBOOT_CMD_RESTART:
- kernel_restart(NULL);
- break;
-
- case LINUX_REBOOT_CMD_CAD_ON:
- C_A_D = 1;
- break;
-
- case LINUX_REBOOT_CMD_CAD_OFF:
- C_A_D = 0;
- break;
-
- case LINUX_REBOOT_CMD_HALT:
- kernel_halt();
- do_exit(0);
- panic("cannot halt");
-
- case LINUX_REBOOT_CMD_POWER_OFF:
- kernel_power_off();
- do_exit(0);
- break;
-
- case LINUX_REBOOT_CMD_RESTART2:
- if (strncpy_from_user(&buffer[0], arg, sizeof(buffer) - 1) < 0) {
- ret = -EFAULT;
- break;
- }
- buffer[sizeof(buffer) - 1] = '\0';
-
- kernel_restart(buffer);
- break;
-
-#ifdef CONFIG_KEXEC
- case LINUX_REBOOT_CMD_KEXEC:
- ret = kernel_kexec();
- break;
-#endif
-
-#ifdef CONFIG_HIBERNATION
- case LINUX_REBOOT_CMD_SW_SUSPEND:
- ret = hibernate();
- break;
-#endif
-
- default:
- ret = -EINVAL;
- break;
- }
- mutex_unlock(&reboot_mutex);
- return ret;
-}
-
-static void deferred_cad(struct work_struct *dummy)
-{
- kernel_restart(NULL);
-}
-
-/*
- * This function gets called by ctrl-alt-del - ie the keyboard interrupt.
- * As it's called within an interrupt, it may NOT sync: the only choice
- * is whether to reboot at once, or just ignore the ctrl-alt-del.
- */
-void ctrl_alt_del(void)
-{
- static DECLARE_WORK(cad_work, deferred_cad);
-
- if (C_A_D)
- schedule_work(&cad_work);
- else
- kill_cad_pid(SIGINT, 1);
-}
-
/*
* Unprivileged users may change the real gid to the effective gid
* or vice versa. (BSD-style)
@@ -2201,65 +1946,3 @@ SYSCALL_DEFINE3(getcpu, unsigned __user *, cpup, unsigned __user *, nodep,
err |= put_user(cpu_to_node(cpu), nodep);
return err ? -EFAULT : 0;
}
-
-char poweroff_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/poweroff";
-
-static int __orderly_poweroff(bool force)
-{
- char **argv;
- static char *envp[] = {
- "HOME=/",
- "PATH=/sbin:/bin:/usr/sbin:/usr/bin",
- NULL
- };
- int ret;
-
- argv = argv_split(GFP_KERNEL, poweroff_cmd, NULL);
- if (argv) {
- ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
- argv_free(argv);
- } else {
- printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
- __func__, poweroff_cmd);
- ret = -ENOMEM;
- }
-
- if (ret && force) {
- printk(KERN_WARNING "Failed to start orderly shutdown: "
- "forcing the issue\n");
- /*
- * I guess this should try to kick off some daemon to sync and
- * poweroff asap. Or not even bother syncing if we're doing an
- * emergency shutdown?
- */
- emergency_sync();
- kernel_power_off();
- }
-
- return ret;
-}
-
-static bool poweroff_force;
-
-static void poweroff_work_func(struct work_struct *work)
-{
- __orderly_poweroff(poweroff_force);
-}
-
-static DECLARE_WORK(poweroff_work, poweroff_work_func);
-
-/**
- * orderly_poweroff - Trigger an orderly system poweroff
- * @force: force poweroff if command execution fails
- *
- * This may be called from any context to trigger a system shutdown.
- * If the orderly shutdown fails, it will force an immediate shutdown.
- */
-int orderly_poweroff(bool force)
-{
- if (force) /* do not override the pending "true" */
- poweroff_force = true;
- schedule_work(&poweroff_work);
- return 0;
-}
-EXPORT_SYMBOL_GPL(orderly_poweroff);
--
1.8.1.2

2013-04-24 11:19:38

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v6 5/5] Move arch/x86 reboot= handling to generic kernel.

Since arch/x86 has the most extensive reboot= handling and it appears to
have no effect on the remaining architectures. This patch moves that
reboot= handling from arch/x86 to the generic kernel.

Signed-off-by: Robin Holt <[email protected]>
To: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>
---
Documentation/kernel-parameters.txt | 14 ++++-
arch/x86/kernel/reboot.c | 113 +++---------------------------------
kernel/reboot.c | 96 ++++++++++++++++++++++++++++--
3 files changed, 109 insertions(+), 114 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 8ccbf27..b57d492 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2616,9 +2616,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Run specified binary instead of /init from the ramdisk,
used for early userspace startup. See initrd.

- reboot= [BUGS=X86-32,BUGS=ARM,BUGS=IA-64] Rebooting mode
- Format: <reboot_mode>[,<reboot_mode2>[,...]]
- See arch/*/kernel/reboot.c or arch/*/kernel/process.c
+ reboot= [KNL]
+ Format:
+ [w[arm] | c[old]] \
+ [,b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]] \
+ [,f[orce] \
+ [,] s[mp]####
+ Where reboot_mode is one of warm or cold,
+ reboot_type is one of bios, acpi, kbd, triple, efi, or pci,
+ reboot_force is either force or not specified,
+ and reboot_cpu is s[mp]#### with #### being the
+ processor to be used for rebooting.

relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 76fa1e9..f9e8bf4 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -36,22 +36,11 @@ void (*pm_power_off)(void);
EXPORT_SYMBOL(pm_power_off);

static const struct desc_ptr no_idt = {};
-static int reboot_mode;
-enum reboot_type reboot_type = BOOT_ACPI;
-int reboot_force;
+extern int reboot_mode;
+extern enum reboot_type reboot_type;
+extern int reboot_force;

-/*
- * This variable is used privately to keep track of whether or not
- * reboot_type is still set to its default value (i.e., reboot= hasn't
- * been set on the command line). This is needed so that we can
- * suppress DMI scanning for reboot quirks. Without it, it's
- * impossible to override a faulty reboot quirk without recompiling.
- */
-static int reboot_default = 1;
-
-#ifdef CONFIG_SMP
-static int reboot_cpu = -1;
-#endif
+extern int reboot_default;

/*
* This is set if we need to go through the 'emergency' path.
@@ -63,78 +52,6 @@ static int reboot_emergency;
/* This is set by the PCI code if either type 1 or type 2 PCI is detected */
bool port_cf9_safe = false;

-/*
- * reboot=b[ios] | s[mp] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old] | p[ci]
- * warm Don't set the cold reboot flag
- * cold Set the cold reboot flag
- * bios Reboot by jumping through the BIOS
- * smp Reboot by executing reset on BSP or other CPU
- * triple Force a triple fault (init)
- * kbd Use the keyboard controller. cold reset (default)
- * acpi Use the RESET_REG in the FADT
- * efi Use efi reset_system runtime service
- * pci Use the so-called "PCI reset register", CF9
- * force Avoid anything that could hang.
- */
-static int __init reboot_setup(char *str)
-{
- for (;;) {
- /*
- * Having anything passed on the command line via
- * reboot= will cause us to disable DMI checking
- * below.
- */
- reboot_default = 0;
-
- switch (*str) {
- case 'w':
- reboot_mode = 0x1234;
- break;
-
- case 'c':
- reboot_mode = 0;
- break;
-
-#ifdef CONFIG_SMP
- case 's':
- if (isdigit(*(str+1))) {
- reboot_cpu = (int) (*(str+1) - '0');
- if (isdigit(*(str+2)))
- reboot_cpu = reboot_cpu*10 + (int)(*(str+2) - '0');
- }
- /*
- * We will leave sorting out the final value
- * when we are ready to reboot, since we might not
- * have detected BSP APIC ID or smp_num_cpu
- */
- break;
-#endif /* CONFIG_SMP */
-
- case 'b':
- case 'a':
- case 'k':
- case 't':
- case 'e':
- case 'p':
- reboot_type = *str;
- break;
-
- case 'f':
- reboot_force = 1;
- break;
- }
-
- str = strchr(str, ',');
- if (str)
- str++;
- else
- break;
- }
- return 1;
-}
-
-__setup("reboot=", reboot_setup);
-

/*
* Reboot options and system auto-detection code provided by
@@ -614,26 +531,10 @@ void native_machine_shutdown(void)
{
/* Stop the cpus and apics */
#ifdef CONFIG_SMP
-
- /* The boot cpu is always logical cpu 0 */
- int reboot_cpu_id = 0;
-
- /* See if there has been given a command line override */
- if ((reboot_cpu != -1) && (reboot_cpu < nr_cpu_ids) &&
- cpu_online(reboot_cpu))
- reboot_cpu_id = reboot_cpu;
-
- /* Make certain the cpu I'm about to reboot on is online */
- if (!cpu_online(reboot_cpu_id))
- reboot_cpu_id = smp_processor_id();
-
- /* Make certain I only run on the appropriate processor */
- set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
-
/*
- * O.K Now that I'm on the appropriate processor, stop all of the
- * others. Also disable the local irq to not receive the per-cpu
- * timer interrupt which may trigger scheduler's load balance.
+ * Stop all of the others. Also disable the local irq to
+ * not receive the per-cpu timer interrupt which may trigger
+ * scheduler's load balance.
*/
local_irq_disable();
stop_other_cpus();
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 7f4658f..3448a1d 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -6,6 +6,7 @@

#define pr_fmt(fmt) "reboot: " fmt

+#include <linux/ctype.h>
#include <linux/export.h>
#include <linux/kexec.h>
#include <linux/kmod.h>
@@ -69,22 +70,107 @@ int unregister_reboot_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL(unregister_reboot_notifier);

+/*
+ * This variable is used privately to keep track of whether or not
+ * reboot_type is still set to its default value (i.e., reboot= hasn't
+ * been set on the command line). This is needed so that we can
+ * suppress DMI scanning for reboot quirks. Without it, it's
+ * impossible to override a faulty reboot quirk without recompiling.
+ */
+int reboot_default = 1;
+int reboot_mode;
+enum reboot_type reboot_type = BOOT_ACPI;
+int reboot_force;
+static int reboot_cpu;
+
+/*
+ * reboot=b[ios] | s[mp] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old] | p[ci]
+ * warm Don't set the cold reboot flag
+ * cold Set the cold reboot flag
+ * bios Reboot by jumping through the BIOS
+ * smp Reboot by executing reset on BSP or other CPU
+ * triple Force a triple fault (init)
+ * kbd Use the keyboard controller. cold reset (default)
+ * acpi Use the RESET_REG in the FADT
+ * efi Use efi reset_system runtime service
+ * pci Use the so-called "PCI reset register", CF9
+ * force Avoid anything that could hang.
+ */
+static int __init reboot_setup(char *str)
+{
+ int i;
+
+ for (;;) {
+ /*
+ * Having anything passed on the command line via
+ * reboot= will cause us to disable DMI checking
+ * below.
+ */
+ reboot_default = 0;
+
+ switch (*str) {
+ case 'w':
+ reboot_mode = 0x1234;
+ break;
+
+ case 'c':
+ reboot_mode = 0;
+ break;
+
+#ifdef CONFIG_SMP
+ case 's':
+ if (*(str + 1) == 'm' && *(str + 2) == 'p')
+ str += 2;
+
+ reboot_cpu = 0;
+ for (i = 1; isdigit(*(str + i)); i++) {
+ reboot_cpu *= 10;
+ reboot_cpu += (int)(*(str + i) - '0');
+ }
+
+ break;
+#endif /* CONFIG_SMP */
+
+ case 'b':
+ case 'a':
+ case 'k':
+ case 't':
+ case 'e':
+ case 'p':
+ reboot_type = *str;
+ break;
+
+ case 'f':
+ reboot_force = 1;
+ break;
+ }
+
+ str = strchr(str, ',');
+ if (str)
+ str++;
+ else
+ break;
+ }
+ return 1;
+}
+__setup("reboot=", reboot_setup);
+
+
static void migrate_to_reboot_cpu(void)
{
- /* The boot cpu is always logical cpu 0 */
- int reboot_cpu_id = 0;
+ int cpu = reboot_cpu;

cpu_hotplug_disable();

/* Make certain the cpu I'm about to reboot on is online */
- if (!cpu_online(reboot_cpu_id))
- reboot_cpu_id = cpumask_first(cpu_online_mask);
+ if (!cpu_online(cpu))
+ cpu = cpumask_first(cpu_online_mask);

/* Prevent races with other tasks migrating this task */
current->flags |= PF_THREAD_BOUND;

/* Make certain I only run on the appropriate processor */
- set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
+ set_cpus_allowed_ptr(current, cpumask_of(cpu));
}

/**
--
1.8.1.2

2013-04-24 11:19:36

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v6 2/5] Migrate shutdown/reboot to boot cpu.

We recently noticed that reboot of a 1024 cpu machine takes approx 16
minutes of just stopping the cpus. The slowdown was tracked to commit
f96972f.

The current implementation does all the work of hot removing the cpus
before halting the system. We are switching to just migrating to the
boot cpu and then continuing with shutdown/reboot.

This also has the effect of not breaking x86's command line parameter for
specifying the reboot cpu. Note, this code was shamelessly copied from
arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
command line parameter.

Signed-off-by: Robin Holt <[email protected]>
Tested-by: Shawn Guo <[email protected]>
To: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>
Cc: <[email protected]>

---

Changes since -v4.
- Integrated comments from Srivatsa S. Bhat

- Used new cpu_hotplug_disable() function.

- Added 'static' to new function.

Changes since -v1.
- Set PF_THREAD_BOUND before migrating to eliminate potential race.

- Modified kernel_power_off to also migrate instead of using
disable_nonboot_cpus().
---
kernel/sys.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 0da73cf..232214e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -357,6 +357,24 @@ int unregister_reboot_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL(unregister_reboot_notifier);

+static void migrate_to_reboot_cpu(void)
+{
+ /* The boot cpu is always logical cpu 0 */
+ int reboot_cpu_id = 0;
+
+ cpu_hotplug_disable();
+
+ /* Make certain the cpu I'm about to reboot on is online */
+ if (!cpu_online(reboot_cpu_id))
+ reboot_cpu_id = cpumask_first(cpu_online_mask);
+
+ /* Prevent races with other tasks migrating this task */
+ current->flags |= PF_THREAD_BOUND;
+
+ /* Make certain I only run on the appropriate processor */
+ set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
+}
+
/**
* kernel_restart - reboot the system
* @cmd: pointer to buffer containing command to execute for restart
@@ -368,7 +386,7 @@ EXPORT_SYMBOL(unregister_reboot_notifier);
void kernel_restart(char *cmd)
{
kernel_restart_prepare(cmd);
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
if (!cmd)
printk(KERN_EMERG "Restarting system.\n");
@@ -395,7 +413,7 @@ static void kernel_shutdown_prepare(enum system_states state)
void kernel_halt(void)
{
kernel_shutdown_prepare(SYSTEM_HALT);
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
printk(KERN_EMERG "System halted.\n");
kmsg_dump(KMSG_DUMP_HALT);
@@ -414,7 +432,7 @@ void kernel_power_off(void)
kernel_shutdown_prepare(SYSTEM_POWER_OFF);
if (pm_power_off_prepare)
pm_power_off_prepare();
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
printk(KERN_EMERG "Power down.\n");
kmsg_dump(KMSG_DUMP_POWEROFF);
--
1.8.1.2

2013-04-24 11:20:33

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v6 4/5] checkpatch.pl the new kernel/reboot.c file.

I did allow the remaining 81 character line behind. It did not seem
like it was worth changing. Otherwise, it now passes checkpatch.pl.

Signed-off-by: Robin Holt <[email protected]>
To: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>

---

Changes since -v3:
Integrated feedback from Joe Perches.
---
kernel/reboot.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index ff92601..7f4658f 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -4,6 +4,8 @@
* Copyright (C) 2013 Linus Torvalds
*/

+#define pr_fmt(fmt) "reboot: " fmt
+
#include <linux/export.h>
#include <linux/kexec.h>
#include <linux/kmod.h>
@@ -99,9 +101,9 @@ void kernel_restart(char *cmd)
migrate_to_reboot_cpu();
syscore_shutdown();
if (!cmd)
- printk(KERN_EMERG "Restarting system.\n");
+ pr_emerg("Restarting system\n");
else
- printk(KERN_EMERG "Restarting system with command '%s'.\n", cmd);
+ pr_emerg("Restarting system with command '%s'\n", cmd);
kmsg_dump(KMSG_DUMP_RESTART);
machine_restart(cmd);
}
@@ -110,7 +112,7 @@ EXPORT_SYMBOL_GPL(kernel_restart);
static void kernel_shutdown_prepare(enum system_states state)
{
blocking_notifier_call_chain(&reboot_notifier_list,
- (state == SYSTEM_HALT)?SYS_HALT:SYS_POWER_OFF, NULL);
+ (state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
system_state = state;
usermodehelper_disable();
device_shutdown();
@@ -125,11 +127,10 @@ void kernel_halt(void)
kernel_shutdown_prepare(SYSTEM_HALT);
migrate_to_reboot_cpu();
syscore_shutdown();
- printk(KERN_EMERG "System halted.\n");
+ pr_emerg("System halted\n");
kmsg_dump(KMSG_DUMP_HALT);
machine_halt();
}
-
EXPORT_SYMBOL_GPL(kernel_halt);

/**
@@ -144,7 +145,7 @@ void kernel_power_off(void)
pm_power_off_prepare();
migrate_to_reboot_cpu();
syscore_shutdown();
- printk(KERN_EMERG "Power down.\n");
+ pr_emerg("Power down\n");
kmsg_dump(KMSG_DUMP_POWEROFF);
machine_power_off();
}
@@ -173,10 +174,10 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,

/* For safety, we require "magic" arguments. */
if (magic1 != LINUX_REBOOT_MAGIC1 ||
- (magic2 != LINUX_REBOOT_MAGIC2 &&
- magic2 != LINUX_REBOOT_MAGIC2A &&
+ (magic2 != LINUX_REBOOT_MAGIC2 &&
+ magic2 != LINUX_REBOOT_MAGIC2A &&
magic2 != LINUX_REBOOT_MAGIC2B &&
- magic2 != LINUX_REBOOT_MAGIC2C))
+ magic2 != LINUX_REBOOT_MAGIC2C))
return -EINVAL;

/*
@@ -285,14 +286,11 @@ static int __orderly_poweroff(bool force)
ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
argv_free(argv);
} else {
- printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
- __func__, poweroff_cmd);
ret = -ENOMEM;
}

if (ret && force) {
- printk(KERN_WARNING "Failed to start orderly shutdown: "
- "forcing the issue\n");
+ pr_warn("Failed to start orderly shutdown: forcing the issue\n");
/*
* I guess this should try to kick off some daemon to sync and
* poweroff asap. Or not even bother syncing if we're doing an
--
1.8.1.2

2013-04-24 22:11:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH -v6 5/5] Move arch/x86 reboot= handling to generic kernel.

On Wed, 24 Apr 2013 06:18:53 -0500 Robin Holt <[email protected]> wrote:

> Since arch/x86 has the most extensive reboot= handling and it appears to
> have no effect on the remaining architectures. This patch moves that
> reboot= handling from arch/x86 to the generic kernel.
>

This seems incomplete. These, in arch/x86/kernel/reboot.c

extern int reboot_mode;
extern enum reboot_type reboot_type;
extern int reboot_force;
extern int reboot_default;

should surely be moved into include/linux/reboot.h. But `enum
reboot_type' remains stuck over in
arch/x86/include/asm/emergency-restart.h.

2013-04-24 22:17:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH -v6 2/5] Migrate shutdown/reboot to boot cpu.

On Wed, 24 Apr 2013 06:18:50 -0500 Robin Holt <[email protected]> wrote:

> We recently noticed that reboot of a 1024 cpu machine takes approx 16
> minutes of just stopping the cpus. The slowdown was tracked to commit
> f96972f.
>
> The current implementation does all the work of hot removing the cpus
> before halting the system. We are switching to just migrating to the
> boot cpu and then continuing with shutdown/reboot.
>
> This also has the effect of not breaking x86's command line parameter for
> specifying the reboot cpu. Note, this code was shamelessly copied from
> arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
> command line parameter.
>
> @@ -357,6 +357,24 @@ int unregister_reboot_notifier(struct notifier_block *nb)
> }
> EXPORT_SYMBOL(unregister_reboot_notifier);
>
> +static void migrate_to_reboot_cpu(void)
> +{
> + /* The boot cpu is always logical cpu 0 */
> + int reboot_cpu_id = 0;
> +
> + cpu_hotplug_disable();
> +
> + /* Make certain the cpu I'm about to reboot on is online */
> + if (!cpu_online(reboot_cpu_id))
> + reboot_cpu_id = cpumask_first(cpu_online_mask);
> +
> + /* Prevent races with other tasks migrating this task */
> + current->flags |= PF_THREAD_BOUND;

PF_THREAD_BOUND was removed/replaced in linux-next's a9ab775bc
("workqueue: directly restore CPU affinity of workers from CPU_ONLINE")

2013-04-24 22:19:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH -v6 2/5] Migrate shutdown/reboot to boot cpu.

On Wed, 24 Apr 2013 15:17:49 -0700 Andrew Morton <[email protected]> wrote:

> > +static void migrate_to_reboot_cpu(void)
> > +{
> > + /* The boot cpu is always logical cpu 0 */
> > + int reboot_cpu_id = 0;
> > +
> > + cpu_hotplug_disable();
> > +
> > + /* Make certain the cpu I'm about to reboot on is online */
> > + if (!cpu_online(reboot_cpu_id))
> > + reboot_cpu_id = cpumask_first(cpu_online_mask);
> > +
> > + /* Prevent races with other tasks migrating this task */
> > + current->flags |= PF_THREAD_BOUND;
>
> PF_THREAD_BOUND was removed/replaced in linux-next's a9ab775bc
> ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE")

Which means that we'll need an updated patchset for 3.10-rc1, please.
And when Greg discovers that the patch won't backport correctly, please
provide him with the 3.9.1 version.

2013-04-25 06:57:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -v6 2/5] Migrate shutdown/reboot to boot cpu.


* Andrew Morton <[email protected]> wrote:

> On Wed, 24 Apr 2013 06:18:50 -0500 Robin Holt <[email protected]> wrote:
>
> > We recently noticed that reboot of a 1024 cpu machine takes approx 16
> > minutes of just stopping the cpus. The slowdown was tracked to commit
> > f96972f.
> >
> > The current implementation does all the work of hot removing the cpus
> > before halting the system. We are switching to just migrating to the
> > boot cpu and then continuing with shutdown/reboot.
> >
> > This also has the effect of not breaking x86's command line parameter for
> > specifying the reboot cpu. Note, this code was shamelessly copied from
> > arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
> > command line parameter.
> >
> > @@ -357,6 +357,24 @@ int unregister_reboot_notifier(struct notifier_block *nb)
> > }
> > EXPORT_SYMBOL(unregister_reboot_notifier);
> >
> > +static void migrate_to_reboot_cpu(void)
> > +{
> > + /* The boot cpu is always logical cpu 0 */
> > + int reboot_cpu_id = 0;
> > +
> > + cpu_hotplug_disable();
> > +
> > + /* Make certain the cpu I'm about to reboot on is online */
> > + if (!cpu_online(reboot_cpu_id))
> > + reboot_cpu_id = cpumask_first(cpu_online_mask);
> > +
> > + /* Prevent races with other tasks migrating this task */
> > + current->flags |= PF_THREAD_BOUND;
>
> PF_THREAD_BOUND was removed/replaced in linux-next's a9ab775bc
> ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE")

I don't see PF_THREAD_BOUND being removed by commit a9ab775bc. Did you
mean some other commit?

Thanks,

Ingo

2013-04-25 07:29:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH -v6 2/5] Migrate shutdown/reboot to boot cpu.

On Thu, 25 Apr 2013 08:57:04 +0200 Ingo Molnar <[email protected]> wrote:

> > PF_THREAD_BOUND was removed/replaced in linux-next's a9ab775bc
> > ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE")
>
> I don't see PF_THREAD_BOUND being removed by commit a9ab775bc. Did you
> mean some other commit?

commit 14a40ffccd6163bbcd1d6f32b28a88ffe6149fc6
Author: Tejun Heo <[email protected]>
Date: Tue Mar 19 13:45:20 2013 -0700

sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY

2013-04-25 07:34:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -v6 2/5] Migrate shutdown/reboot to boot cpu.


* Andrew Morton <[email protected]> wrote:

> On Thu, 25 Apr 2013 08:57:04 +0200 Ingo Molnar <[email protected]> wrote:
>
> > > PF_THREAD_BOUND was removed/replaced in linux-next's a9ab775bc
> > > ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE")
> >
> > I don't see PF_THREAD_BOUND being removed by commit a9ab775bc. Did you
> > mean some other commit?
>
> commit 14a40ffccd6163bbcd1d6f32b28a88ffe6149fc6
> Author: Tejun Heo <[email protected]>
> Date: Tue Mar 19 13:45:20 2013 -0700
>
> sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY

I don't think much action is needed: whenever someone integrates the
workqueue tree with these reboot changes, or the workqueue changes hit
upstream, a s/PF_THREAD_BOUND/PF_NO_SETAFFINITY should do the trick,
right?

Btw., if that's true then the workqueue tree should probably grow this
helper:

#define PF_THREAD_BOUND PF_NO_SETAFFINITY

migration/backport helper, which can be removed in v3.11 or so.

Thanks,

Ingo

2013-04-25 15:36:48

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH -v6 2/5] Migrate shutdown/reboot to boot cpu.

On Thu, Apr 25, 2013 at 09:34:37AM +0200, Ingo Molnar wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > On Thu, 25 Apr 2013 08:57:04 +0200 Ingo Molnar <[email protected]> wrote:
> >
> > > > PF_THREAD_BOUND was removed/replaced in linux-next's a9ab775bc
> > > > ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE")
> > >
> > > I don't see PF_THREAD_BOUND being removed by commit a9ab775bc. Did you
> > > mean some other commit?
> >
> > commit 14a40ffccd6163bbcd1d6f32b28a88ffe6149fc6
> > Author: Tejun Heo <[email protected]>
> > Date: Tue Mar 19 13:45:20 2013 -0700
> >
> > sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY
>
> I don't think much action is needed: whenever someone integrates the
> workqueue tree with these reboot changes, or the workqueue changes hit
> upstream, a s/PF_THREAD_BOUND/PF_NO_SETAFFINITY should do the trick,
> right?
>
> Btw., if that's true then the workqueue tree should probably grow this
> helper:
>
> #define PF_THREAD_BOUND PF_NO_SETAFFINITY

Andrew, how about I add:
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -155,6 +155,9 @@ static int __init reboot_setup(char *str)
}
__setup("reboot=", reboot_setup);

+#ifndef PF_THREAD_BOUND
+#define PF_THREAD_BOUND PF_NO_SETAFFINITY
+#endif

static void migrate_to_reboot_cpu(void)
{

to this patch and then add another patch to the series to remove those.
That will make the stable pull more clear and accomplish the same final
outcome.

Thanks,
Robin

2013-04-25 21:23:58

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH -v6 0/5] Shutdown from reboot_cpuid without stopping other cpus.

Andrew,

Please go ahead and kill this patch set. It breaks with !CONFIG_SMP
and breaks on arm. I spent time today getting an arm cross compiler
working and fixing up the build. It indicates there is more work to
be done beyond your comments for the reboot_mode parameter. I should
have a newer patch set tomorrow, but it will need to go through arm and
unicore32 reviews as it will affect both of them as well.

Robin

On Wed, Apr 24, 2013 at 06:18:48AM -0500, Robin Holt wrote:
> We recently noticed that reboot of a 1024 cpu machine takes approx 16
> minutes of just stopping the cpus. The slowdown was tracked to commit
> f96972f.
>
> The current implementation does all the work of hot removing the cpus
> before halting the system. We are switching to just migrating to the
> reboot_cpu and then continuing with shutdown/reboot.
>
> The patch set is broken into five parts. The first two are planned for
> the stable release. The others move the halt/shutdown/reboot related
> functions to their own kernel/reboot.c file and then introduce the kernel
> boot parameter.
>
>
> Changes since -v5.
> - Moved the arch/x86 reboot= up to the generic kernel code.
>
> Changes since -v4.
> - Integrated Srivatsa S. Bhat creating cpu_hotplug_disable()
> function
>
> - Integrated comments by Srivatsa S. Bhat.
>
> - Made one more comment consistent with others in function.
>
> Changes since -v3.
> - Added a tested-by for the original reporter.
>
> - Fix compile failure found by Joe Perches.
>
> - Integrated comments by Joe Perches.