2013-04-17 18:43:30

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v5 0/5] Shutdown from reboot_cpuid without stopping other cpus.

We recently noticed that reboot of a 1024 cpu machine takes approx 16
minutes of just stopping the cpus. The slowdown was tracked to commit
f96972f.

The current implementation does all the work of hot removing the cpus
before halting the system. We are switching to just migrating to the
reboot_cpuid and then continuing with shutdown/reboot.

The patch set is broken into five parts. The first two are planned for
the stable release. The others move the halt/shutdown/reboot related
functions to their own kernel/reboot.c file and then introduce the kernel
boot parameter.


Changes since -v4.
- Integrated Srivatsa S. Bhat creating cpu_hotplug_disable()
function

- Integrated comments by Srivatsa S. Bhat.

- Made one more comment consistent with others in function.

Changes since -v3.
- Added a tested-by for the original reporter.

- Fix compile failure found by Joe Perches.

- Integrated comments by Joe Perches.


2013-04-17 18:43:33

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v5 1/5] CPU hotplug: Provide a generic helper to disable/enable CPU hotplug

From: "Srivatsa S. Bhat" <[email protected]>

There are instances in the kernel where we would like to disable
CPU hotplug (from sysfs) during some important operation. Today
the freezer code depends on this and the code to do it was kinda
tailor-made for that.

Restructure the code and make it generic enough to be useful for
other usecases too.

Signed-off-by: Srivatsa S. Bhat <[email protected]>
Signed-off-by: Robin Holt <[email protected]>
---
include/linux/cpu.h | 2 ++
kernel/cpu.c | 27 +++++++++------------------
2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index ce7a074..73282d4 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -141,6 +141,8 @@ static inline void unregister_cpu_notifier(struct notifier_block *nb)
}
#endif

+extern void cpu_hotplug_disable(void);
+extern void cpu_hotplug_enable(void);
int cpu_up(unsigned int cpu);
void notify_cpu_starting(unsigned int cpu);
extern void cpu_maps_update_begin(void);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index b5e4ab2..28769f5 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -541,29 +541,20 @@ static int __init alloc_frozen_cpus(void)
core_initcall(alloc_frozen_cpus);

/*
- * Prevent regular CPU hotplug from racing with the freezer, by disabling CPU
- * hotplug when tasks are about to be frozen. Also, don't allow the freezer
- * to continue until any currently running CPU hotplug operation gets
- * completed.
- * To modify the 'cpu_hotplug_disabled' flag, we need to acquire the
- * 'cpu_add_remove_lock'. And this same lock is also taken by the regular
- * CPU hotplug path and released only after it is complete. Thus, we
- * (and hence the freezer) will block here until any currently running CPU
- * hotplug operation gets completed.
+ * Wait for currently running CPU hotplug operations to complete (if any) and
+ * disable future CPU hotplug (from sysfs). The 'cpu_add_remove_lock' protects
+ * the 'cpu_hotplug_disabled' flag. The same lock is also acquired by the
+ * hotplug path before performing hotplug operations. So acquiring that lock
+ * guarantees mutual exclusion from any currently running hotplug operations.
*/
-void cpu_hotplug_disable_before_freeze(void)
+void cpu_hotplug_disable(void)
{
cpu_maps_update_begin();
cpu_hotplug_disabled = 1;
cpu_maps_update_done();
}

-
-/*
- * When tasks have been thawed, re-enable regular CPU hotplug (which had been
- * disabled while beginning to freeze tasks).
- */
-void cpu_hotplug_enable_after_thaw(void)
+void cpu_hotplug_enable(void)
{
cpu_maps_update_begin();
cpu_hotplug_disabled = 0;
@@ -589,12 +580,12 @@ cpu_hotplug_pm_callback(struct notifier_block *nb,

case PM_SUSPEND_PREPARE:
case PM_HIBERNATION_PREPARE:
- cpu_hotplug_disable_before_freeze();
+ cpu_hotplug_disable();
break;

case PM_POST_SUSPEND:
case PM_POST_HIBERNATION:
- cpu_hotplug_enable_after_thaw();
+ cpu_hotplug_enable();
break;

default:
--
1.8.1.2

2013-04-17 18:43:32

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v5 2/5] Migrate shutdown/reboot to boot cpu.

We recently noticed that reboot of a 1024 cpu machine takes approx 16
minutes of just stopping the cpus. The slowdown was tracked to commit
f96972f.

The current implementation does all the work of hot removing the cpus
before halting the system. We are switching to just migrating to the
boot cpu and then continuing with shutdown/reboot.

This also has the effect of not breaking x86's command line parameter for
specifying the reboot cpu. Note, this code was shamelessly copied from
arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
command line parameter.

Signed-off-by: Robin Holt <[email protected]>
Tested-by: Shawn Guo <[email protected]>
To: Ingo Molnar <[email protected]>
To: Russ Anderson <[email protected]>
To: Oleg Nesterov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>

---

Changes since -v4.
- Integrated comments from Srivatsa S. Bhat

- Used new cpu_hotplug_disable() function.

- Added 'static' to new function.

Changes since -v1.
- Set PF_THREAD_BOUND before migrating to eliminate potential race.

- Modified kernel_power_off to also migrate instead of using
disable_nonboot_cpus().
---
kernel/sys.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 0da73cf..232214e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -357,6 +357,24 @@ int unregister_reboot_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL(unregister_reboot_notifier);

+static void migrate_to_reboot_cpu(void)
+{
+ /* The boot cpu is always logical cpu 0 */
+ int reboot_cpu_id = 0;
+
+ cpu_hotplug_disable();
+
+ /* Make certain the cpu I'm about to reboot on is online */
+ if (!cpu_online(reboot_cpu_id))
+ reboot_cpu_id = cpumask_first(cpu_online_mask);
+
+ /* Prevent races with other tasks migrating this task */
+ current->flags |= PF_THREAD_BOUND;
+
+ /* Make certain I only run on the appropriate processor */
+ set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
+}
+
/**
* kernel_restart - reboot the system
* @cmd: pointer to buffer containing command to execute for restart
@@ -368,7 +386,7 @@ EXPORT_SYMBOL(unregister_reboot_notifier);
void kernel_restart(char *cmd)
{
kernel_restart_prepare(cmd);
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
if (!cmd)
printk(KERN_EMERG "Restarting system.\n");
@@ -395,7 +413,7 @@ static void kernel_shutdown_prepare(enum system_states state)
void kernel_halt(void)
{
kernel_shutdown_prepare(SYSTEM_HALT);
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
printk(KERN_EMERG "System halted.\n");
kmsg_dump(KMSG_DUMP_HALT);
@@ -414,7 +432,7 @@ void kernel_power_off(void)
kernel_shutdown_prepare(SYSTEM_POWER_OFF);
if (pm_power_off_prepare)
pm_power_off_prepare();
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
printk(KERN_EMERG "Power down.\n");
kmsg_dump(KMSG_DUMP_POWEROFF);
--
1.8.1.2

2013-04-17 18:43:31

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v5 4/5] checkpatch.pl the new kernel/reboot.c file.

I did allow the remaining 81 character line behind. It did not seem
like it was worth changing. Otherwise, it now passes checkpatch.pl.

Signed-off-by: Robin Holt <[email protected]>
To: Ingo Molnar <[email protected]>
To: Russ Anderson <[email protected]>
Cc: Shawn Guo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Joe Perches <[email protected]>

---

Changes since -v3:
Integrated feedback from Joe Perches.
---
kernel/reboot.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index ff92601..c4eb2a4 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -4,6 +4,8 @@
* Copyright (C) 2013 Linus Torvalds
*/

+#define pr_fmt(fmt) "reboot: " fmt
+
#include <linux/export.h>
#include <linux/kexec.h>
#include <linux/kmod.h>
@@ -99,9 +101,9 @@ void kernel_restart(char *cmd)
migrate_to_reboot_cpu();
syscore_shutdown();
if (!cmd)
- printk(KERN_EMERG "Restarting system.\n");
+ pr_emerg("Restarting system\n");
else
- printk(KERN_EMERG "Restarting system with command '%s'.\n", cmd);
+ pr_emerg("Restarting system with command '%s'\n", cmd);
kmsg_dump(KMSG_DUMP_RESTART);
machine_restart(cmd);
}
@@ -110,7 +112,7 @@ EXPORT_SYMBOL_GPL(kernel_restart);
static void kernel_shutdown_prepare(enum system_states state)
{
blocking_notifier_call_chain(&reboot_notifier_list,
- (state == SYSTEM_HALT)?SYS_HALT:SYS_POWER_OFF, NULL);
+ (state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
system_state = state;
usermodehelper_disable();
device_shutdown();
@@ -125,11 +127,10 @@ void kernel_halt(void)
kernel_shutdown_prepare(SYSTEM_HALT);
migrate_to_reboot_cpu();
syscore_shutdown();
- printk(KERN_EMERG "System halted.\n");
+ pr_emerg("System halted\n");
kmsg_dump(KMSG_DUMP_HALT);
machine_halt();
}
-
EXPORT_SYMBOL_GPL(kernel_halt);

/**
@@ -144,7 +145,7 @@ void kernel_power_off(void)
pm_power_off_prepare();
migrate_to_reboot_cpu();
syscore_shutdown();
- printk(KERN_EMERG "Power down.\n");
+ pr_emerg("Power down\n");
kmsg_dump(KMSG_DUMP_POWEROFF);
machine_power_off();
}
@@ -173,10 +174,10 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,

/* For safety, we require "magic" arguments. */
if (magic1 != LINUX_REBOOT_MAGIC1 ||
- (magic2 != LINUX_REBOOT_MAGIC2 &&
- magic2 != LINUX_REBOOT_MAGIC2A &&
+ (magic2 != LINUX_REBOOT_MAGIC2 &&
+ magic2 != LINUX_REBOOT_MAGIC2A &&
magic2 != LINUX_REBOOT_MAGIC2B &&
- magic2 != LINUX_REBOOT_MAGIC2C))
+ magic2 != LINUX_REBOOT_MAGIC2C))
return -EINVAL;

/*
@@ -285,14 +286,13 @@ static int __orderly_poweroff(bool force)
ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
argv_free(argv);
} else {
- printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
- __func__, poweroff_cmd);
+ pr_warn("%s failed to allocate memory for \"%s\"\n",
+ __func__, poweroff_cmd);
ret = -ENOMEM;
}

if (ret && force) {
- printk(KERN_WARNING "Failed to start orderly shutdown: "
- "forcing the issue\n");
+ pr_warn("Failed to start orderly shutdown: forcing the issue\n");
/*
* I guess this should try to kick off some daemon to sync and
* poweroff asap. Or not even bother syncing if we're doing an
--
1.8.1.2

2013-04-17 18:44:24

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v5 3/5] Move shutdown/reboot related functions to kernel/reboot.c

This patch is preparatory. It moves reboot related syscall, etc
functions from kernel/sys.c to kernel/reboot.c.

Signed-off-by: Robin Holt <[email protected]>
To: Ingo Molnar <[email protected]>
To: Russ Anderson <[email protected]>
Cc: Shawn Guo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/Makefile | 2 +-
kernel/reboot.c | 331 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/sys.c | 317 -----------------------------------------------------
3 files changed, 332 insertions(+), 318 deletions(-)
create mode 100644 kernel/reboot.c

diff --git a/kernel/Makefile b/kernel/Makefile
index bbde5f1..5c2cef7 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -9,7 +9,7 @@ obj-y = fork.o exec_domain.o panic.o printk.o \
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
- notifier.o ksysfs.o cred.o \
+ notifier.o ksysfs.o cred.o reboot.o \
async.o range.o groups.o lglock.o smpboot.o

ifdef CONFIG_FUNCTION_TRACER
diff --git a/kernel/reboot.c b/kernel/reboot.c
new file mode 100644
index 0000000..ff92601
--- /dev/null
+++ b/kernel/reboot.c
@@ -0,0 +1,331 @@
+/*
+ * linux/kernel/reboot.c
+ *
+ * Copyright (C) 2013 Linus Torvalds
+ */
+
+#include <linux/export.h>
+#include <linux/kexec.h>
+#include <linux/kmod.h>
+#include <linux/kmsg_dump.h>
+#include <linux/reboot.h>
+#include <linux/suspend.h>
+#include <linux/syscalls.h>
+#include <linux/syscore_ops.h>
+
+/**
+ * emergency_restart - reboot the system
+ *
+ * Without shutting down any hardware or taking any locks
+ * reboot the system. This is called when we know we are in
+ * trouble so this is our best effort to reboot. This is
+ * safe to call in interrupt context.
+ */
+void emergency_restart(void)
+{
+ kmsg_dump(KMSG_DUMP_EMERG);
+ machine_emergency_restart();
+}
+EXPORT_SYMBOL_GPL(emergency_restart);
+
+void kernel_restart_prepare(char *cmd)
+{
+ blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
+ system_state = SYSTEM_RESTART;
+ usermodehelper_disable();
+ device_shutdown();
+}
+
+/**
+ * register_reboot_notifier - Register function to be called at reboot time
+ * @nb: Info about notifier function to be called
+ *
+ * Registers a function with the list of functions
+ * to be called at reboot time.
+ *
+ * Currently always returns zero, as blocking_notifier_chain_register()
+ * always returns zero.
+ */
+int register_reboot_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&reboot_notifier_list, nb);
+}
+EXPORT_SYMBOL(register_reboot_notifier);
+
+/**
+ * unregister_reboot_notifier - Unregister previously registered reboot notifier
+ * @nb: Hook to be unregistered
+ *
+ * Unregisters a previously registered reboot
+ * notifier function.
+ *
+ * Returns zero on success, or %-ENOENT on failure.
+ */
+int unregister_reboot_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&reboot_notifier_list, nb);
+}
+EXPORT_SYMBOL(unregister_reboot_notifier);
+
+static void migrate_to_reboot_cpu(void)
+{
+ /* The boot cpu is always logical cpu 0 */
+ int reboot_cpu_id = 0;
+
+ cpu_hotplug_disable();
+
+ /* Make certain the cpu I'm about to reboot on is online */
+ if (!cpu_online(reboot_cpu_id))
+ reboot_cpu_id = cpumask_first(cpu_online_mask);
+
+ /* Prevent races with other tasks migrating this task */
+ current->flags |= PF_THREAD_BOUND;
+
+ /* Make certain I only run on the appropriate processor */
+ set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
+}
+
+/**
+ * kernel_restart - reboot the system
+ * @cmd: pointer to buffer containing command to execute for restart
+ * or %NULL
+ *
+ * Shutdown everything and perform a clean reboot.
+ * This is not safe to call in interrupt context.
+ */
+void kernel_restart(char *cmd)
+{
+ kernel_restart_prepare(cmd);
+ migrate_to_reboot_cpu();
+ syscore_shutdown();
+ if (!cmd)
+ printk(KERN_EMERG "Restarting system.\n");
+ else
+ printk(KERN_EMERG "Restarting system with command '%s'.\n", cmd);
+ kmsg_dump(KMSG_DUMP_RESTART);
+ machine_restart(cmd);
+}
+EXPORT_SYMBOL_GPL(kernel_restart);
+
+static void kernel_shutdown_prepare(enum system_states state)
+{
+ blocking_notifier_call_chain(&reboot_notifier_list,
+ (state == SYSTEM_HALT)?SYS_HALT:SYS_POWER_OFF, NULL);
+ system_state = state;
+ usermodehelper_disable();
+ device_shutdown();
+}
+/**
+ * kernel_halt - halt the system
+ *
+ * Shutdown everything and perform a clean system halt.
+ */
+void kernel_halt(void)
+{
+ kernel_shutdown_prepare(SYSTEM_HALT);
+ migrate_to_reboot_cpu();
+ syscore_shutdown();
+ printk(KERN_EMERG "System halted.\n");
+ kmsg_dump(KMSG_DUMP_HALT);
+ machine_halt();
+}
+
+EXPORT_SYMBOL_GPL(kernel_halt);
+
+/**
+ * kernel_power_off - power_off the system
+ *
+ * Shutdown everything and perform a clean system power_off.
+ */
+void kernel_power_off(void)
+{
+ kernel_shutdown_prepare(SYSTEM_POWER_OFF);
+ if (pm_power_off_prepare)
+ pm_power_off_prepare();
+ migrate_to_reboot_cpu();
+ syscore_shutdown();
+ printk(KERN_EMERG "Power down.\n");
+ kmsg_dump(KMSG_DUMP_POWEROFF);
+ machine_power_off();
+}
+EXPORT_SYMBOL_GPL(kernel_power_off);
+
+static DEFINE_MUTEX(reboot_mutex);
+
+/*
+ * Reboot system call: for obvious reasons only root may call it,
+ * and even root needs to set up some magic numbers in the registers
+ * so that some mistake won't make this reboot the whole machine.
+ * You can also set the meaning of the ctrl-alt-del-key here.
+ *
+ * reboot doesn't sync: do that yourself before calling this.
+ */
+SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
+ void __user *, arg)
+{
+ struct pid_namespace *pid_ns = task_active_pid_ns(current);
+ char buffer[256];
+ int ret = 0;
+
+ /* We only trust the superuser with rebooting the system. */
+ if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
+ return -EPERM;
+
+ /* For safety, we require "magic" arguments. */
+ if (magic1 != LINUX_REBOOT_MAGIC1 ||
+ (magic2 != LINUX_REBOOT_MAGIC2 &&
+ magic2 != LINUX_REBOOT_MAGIC2A &&
+ magic2 != LINUX_REBOOT_MAGIC2B &&
+ magic2 != LINUX_REBOOT_MAGIC2C))
+ return -EINVAL;
+
+ /*
+ * If pid namespaces are enabled and the current task is in a child
+ * pid_namespace, the command is handled by reboot_pid_ns() which will
+ * call do_exit().
+ */
+ ret = reboot_pid_ns(pid_ns, cmd);
+ if (ret)
+ return ret;
+
+ /* Instead of trying to make the power_off code look like
+ * halt when pm_power_off is not set do it the easy way.
+ */
+ if ((cmd == LINUX_REBOOT_CMD_POWER_OFF) && !pm_power_off)
+ cmd = LINUX_REBOOT_CMD_HALT;
+
+ mutex_lock(&reboot_mutex);
+ switch (cmd) {
+ case LINUX_REBOOT_CMD_RESTART:
+ kernel_restart(NULL);
+ break;
+
+ case LINUX_REBOOT_CMD_CAD_ON:
+ C_A_D = 1;
+ break;
+
+ case LINUX_REBOOT_CMD_CAD_OFF:
+ C_A_D = 0;
+ break;
+
+ case LINUX_REBOOT_CMD_HALT:
+ kernel_halt();
+ do_exit(0);
+ panic("cannot halt");
+
+ case LINUX_REBOOT_CMD_POWER_OFF:
+ kernel_power_off();
+ do_exit(0);
+ break;
+
+ case LINUX_REBOOT_CMD_RESTART2:
+ if (strncpy_from_user(&buffer[0], arg, sizeof(buffer) - 1) < 0) {
+ ret = -EFAULT;
+ break;
+ }
+ buffer[sizeof(buffer) - 1] = '\0';
+
+ kernel_restart(buffer);
+ break;
+
+#ifdef CONFIG_KEXEC
+ case LINUX_REBOOT_CMD_KEXEC:
+ ret = kernel_kexec();
+ break;
+#endif
+
+#ifdef CONFIG_HIBERNATION
+ case LINUX_REBOOT_CMD_SW_SUSPEND:
+ ret = hibernate();
+ break;
+#endif
+
+ default:
+ ret = -EINVAL;
+ break;
+ }
+ mutex_unlock(&reboot_mutex);
+ return ret;
+}
+
+static void deferred_cad(struct work_struct *dummy)
+{
+ kernel_restart(NULL);
+}
+
+/*
+ * This function gets called by ctrl-alt-del - ie the keyboard interrupt.
+ * As it's called within an interrupt, it may NOT sync: the only choice
+ * is whether to reboot at once, or just ignore the ctrl-alt-del.
+ */
+void ctrl_alt_del(void)
+{
+ static DECLARE_WORK(cad_work, deferred_cad);
+
+ if (C_A_D)
+ schedule_work(&cad_work);
+ else
+ kill_cad_pid(SIGINT, 1);
+}
+
+char poweroff_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/poweroff";
+
+static int __orderly_poweroff(bool force)
+{
+ char **argv;
+ static char *envp[] = {
+ "HOME=/",
+ "PATH=/sbin:/bin:/usr/sbin:/usr/bin",
+ NULL
+ };
+ int ret;
+
+ argv = argv_split(GFP_KERNEL, poweroff_cmd, NULL);
+ if (argv) {
+ ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
+ argv_free(argv);
+ } else {
+ printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
+ __func__, poweroff_cmd);
+ ret = -ENOMEM;
+ }
+
+ if (ret && force) {
+ printk(KERN_WARNING "Failed to start orderly shutdown: "
+ "forcing the issue\n");
+ /*
+ * I guess this should try to kick off some daemon to sync and
+ * poweroff asap. Or not even bother syncing if we're doing an
+ * emergency shutdown?
+ */
+ emergency_sync();
+ kernel_power_off();
+ }
+
+ return ret;
+}
+
+static bool poweroff_force;
+
+static void poweroff_work_func(struct work_struct *work)
+{
+ __orderly_poweroff(poweroff_force);
+}
+
+static DECLARE_WORK(poweroff_work, poweroff_work_func);
+
+/**
+ * orderly_poweroff - Trigger an orderly system poweroff
+ * @force: force poweroff if command execution fails
+ *
+ * This may be called from any context to trigger a system shutdown.
+ * If the orderly shutdown fails, it will force an immediate shutdown.
+ */
+int orderly_poweroff(bool force)
+{
+ if (force) /* do not override the pending "true" */
+ poweroff_force = true;
+ schedule_work(&poweroff_work);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(orderly_poweroff);
diff --git a/kernel/sys.c b/kernel/sys.c
index 232214e..76b9a93 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -303,261 +303,6 @@ out_unlock:
return retval;
}

-/**
- * emergency_restart - reboot the system
- *
- * Without shutting down any hardware or taking any locks
- * reboot the system. This is called when we know we are in
- * trouble so this is our best effort to reboot. This is
- * safe to call in interrupt context.
- */
-void emergency_restart(void)
-{
- kmsg_dump(KMSG_DUMP_EMERG);
- machine_emergency_restart();
-}
-EXPORT_SYMBOL_GPL(emergency_restart);
-
-void kernel_restart_prepare(char *cmd)
-{
- blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
- system_state = SYSTEM_RESTART;
- usermodehelper_disable();
- device_shutdown();
-}
-
-/**
- * register_reboot_notifier - Register function to be called at reboot time
- * @nb: Info about notifier function to be called
- *
- * Registers a function with the list of functions
- * to be called at reboot time.
- *
- * Currently always returns zero, as blocking_notifier_chain_register()
- * always returns zero.
- */
-int register_reboot_notifier(struct notifier_block *nb)
-{
- return blocking_notifier_chain_register(&reboot_notifier_list, nb);
-}
-EXPORT_SYMBOL(register_reboot_notifier);
-
-/**
- * unregister_reboot_notifier - Unregister previously registered reboot notifier
- * @nb: Hook to be unregistered
- *
- * Unregisters a previously registered reboot
- * notifier function.
- *
- * Returns zero on success, or %-ENOENT on failure.
- */
-int unregister_reboot_notifier(struct notifier_block *nb)
-{
- return blocking_notifier_chain_unregister(&reboot_notifier_list, nb);
-}
-EXPORT_SYMBOL(unregister_reboot_notifier);
-
-static void migrate_to_reboot_cpu(void)
-{
- /* The boot cpu is always logical cpu 0 */
- int reboot_cpu_id = 0;
-
- cpu_hotplug_disable();
-
- /* Make certain the cpu I'm about to reboot on is online */
- if (!cpu_online(reboot_cpu_id))
- reboot_cpu_id = cpumask_first(cpu_online_mask);
-
- /* Prevent races with other tasks migrating this task */
- current->flags |= PF_THREAD_BOUND;
-
- /* Make certain I only run on the appropriate processor */
- set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
-}
-
-/**
- * kernel_restart - reboot the system
- * @cmd: pointer to buffer containing command to execute for restart
- * or %NULL
- *
- * Shutdown everything and perform a clean reboot.
- * This is not safe to call in interrupt context.
- */
-void kernel_restart(char *cmd)
-{
- kernel_restart_prepare(cmd);
- migrate_to_reboot_cpu();
- syscore_shutdown();
- if (!cmd)
- printk(KERN_EMERG "Restarting system.\n");
- else
- printk(KERN_EMERG "Restarting system with command '%s'.\n", cmd);
- kmsg_dump(KMSG_DUMP_RESTART);
- machine_restart(cmd);
-}
-EXPORT_SYMBOL_GPL(kernel_restart);
-
-static void kernel_shutdown_prepare(enum system_states state)
-{
- blocking_notifier_call_chain(&reboot_notifier_list,
- (state == SYSTEM_HALT)?SYS_HALT:SYS_POWER_OFF, NULL);
- system_state = state;
- usermodehelper_disable();
- device_shutdown();
-}
-/**
- * kernel_halt - halt the system
- *
- * Shutdown everything and perform a clean system halt.
- */
-void kernel_halt(void)
-{
- kernel_shutdown_prepare(SYSTEM_HALT);
- migrate_to_reboot_cpu();
- syscore_shutdown();
- printk(KERN_EMERG "System halted.\n");
- kmsg_dump(KMSG_DUMP_HALT);
- machine_halt();
-}
-
-EXPORT_SYMBOL_GPL(kernel_halt);
-
-/**
- * kernel_power_off - power_off the system
- *
- * Shutdown everything and perform a clean system power_off.
- */
-void kernel_power_off(void)
-{
- kernel_shutdown_prepare(SYSTEM_POWER_OFF);
- if (pm_power_off_prepare)
- pm_power_off_prepare();
- migrate_to_reboot_cpu();
- syscore_shutdown();
- printk(KERN_EMERG "Power down.\n");
- kmsg_dump(KMSG_DUMP_POWEROFF);
- machine_power_off();
-}
-EXPORT_SYMBOL_GPL(kernel_power_off);
-
-static DEFINE_MUTEX(reboot_mutex);
-
-/*
- * Reboot system call: for obvious reasons only root may call it,
- * and even root needs to set up some magic numbers in the registers
- * so that some mistake won't make this reboot the whole machine.
- * You can also set the meaning of the ctrl-alt-del-key here.
- *
- * reboot doesn't sync: do that yourself before calling this.
- */
-SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
- void __user *, arg)
-{
- struct pid_namespace *pid_ns = task_active_pid_ns(current);
- char buffer[256];
- int ret = 0;
-
- /* We only trust the superuser with rebooting the system. */
- if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
- return -EPERM;
-
- /* For safety, we require "magic" arguments. */
- if (magic1 != LINUX_REBOOT_MAGIC1 ||
- (magic2 != LINUX_REBOOT_MAGIC2 &&
- magic2 != LINUX_REBOOT_MAGIC2A &&
- magic2 != LINUX_REBOOT_MAGIC2B &&
- magic2 != LINUX_REBOOT_MAGIC2C))
- return -EINVAL;
-
- /*
- * If pid namespaces are enabled and the current task is in a child
- * pid_namespace, the command is handled by reboot_pid_ns() which will
- * call do_exit().
- */
- ret = reboot_pid_ns(pid_ns, cmd);
- if (ret)
- return ret;
-
- /* Instead of trying to make the power_off code look like
- * halt when pm_power_off is not set do it the easy way.
- */
- if ((cmd == LINUX_REBOOT_CMD_POWER_OFF) && !pm_power_off)
- cmd = LINUX_REBOOT_CMD_HALT;
-
- mutex_lock(&reboot_mutex);
- switch (cmd) {
- case LINUX_REBOOT_CMD_RESTART:
- kernel_restart(NULL);
- break;
-
- case LINUX_REBOOT_CMD_CAD_ON:
- C_A_D = 1;
- break;
-
- case LINUX_REBOOT_CMD_CAD_OFF:
- C_A_D = 0;
- break;
-
- case LINUX_REBOOT_CMD_HALT:
- kernel_halt();
- do_exit(0);
- panic("cannot halt");
-
- case LINUX_REBOOT_CMD_POWER_OFF:
- kernel_power_off();
- do_exit(0);
- break;
-
- case LINUX_REBOOT_CMD_RESTART2:
- if (strncpy_from_user(&buffer[0], arg, sizeof(buffer) - 1) < 0) {
- ret = -EFAULT;
- break;
- }
- buffer[sizeof(buffer) - 1] = '\0';
-
- kernel_restart(buffer);
- break;
-
-#ifdef CONFIG_KEXEC
- case LINUX_REBOOT_CMD_KEXEC:
- ret = kernel_kexec();
- break;
-#endif
-
-#ifdef CONFIG_HIBERNATION
- case LINUX_REBOOT_CMD_SW_SUSPEND:
- ret = hibernate();
- break;
-#endif
-
- default:
- ret = -EINVAL;
- break;
- }
- mutex_unlock(&reboot_mutex);
- return ret;
-}
-
-static void deferred_cad(struct work_struct *dummy)
-{
- kernel_restart(NULL);
-}
-
-/*
- * This function gets called by ctrl-alt-del - ie the keyboard interrupt.
- * As it's called within an interrupt, it may NOT sync: the only choice
- * is whether to reboot at once, or just ignore the ctrl-alt-del.
- */
-void ctrl_alt_del(void)
-{
- static DECLARE_WORK(cad_work, deferred_cad);
-
- if (C_A_D)
- schedule_work(&cad_work);
- else
- kill_cad_pid(SIGINT, 1);
-}
-
/*
* Unprivileged users may change the real gid to the effective gid
* or vice versa. (BSD-style)
@@ -2201,65 +1946,3 @@ SYSCALL_DEFINE3(getcpu, unsigned __user *, cpup, unsigned __user *, nodep,
err |= put_user(cpu_to_node(cpu), nodep);
return err ? -EFAULT : 0;
}
-
-char poweroff_cmd[POWEROFF_CMD_PATH_LEN] = "/sbin/poweroff";
-
-static int __orderly_poweroff(bool force)
-{
- char **argv;
- static char *envp[] = {
- "HOME=/",
- "PATH=/sbin:/bin:/usr/sbin:/usr/bin",
- NULL
- };
- int ret;
-
- argv = argv_split(GFP_KERNEL, poweroff_cmd, NULL);
- if (argv) {
- ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
- argv_free(argv);
- } else {
- printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
- __func__, poweroff_cmd);
- ret = -ENOMEM;
- }
-
- if (ret && force) {
- printk(KERN_WARNING "Failed to start orderly shutdown: "
- "forcing the issue\n");
- /*
- * I guess this should try to kick off some daemon to sync and
- * poweroff asap. Or not even bother syncing if we're doing an
- * emergency shutdown?
- */
- emergency_sync();
- kernel_power_off();
- }
-
- return ret;
-}
-
-static bool poweroff_force;
-
-static void poweroff_work_func(struct work_struct *work)
-{
- __orderly_poweroff(poweroff_force);
-}
-
-static DECLARE_WORK(poweroff_work, poweroff_work_func);
-
-/**
- * orderly_poweroff - Trigger an orderly system poweroff
- * @force: force poweroff if command execution fails
- *
- * This may be called from any context to trigger a system shutdown.
- * If the orderly shutdown fails, it will force an immediate shutdown.
- */
-int orderly_poweroff(bool force)
-{
- if (force) /* do not override the pending "true" */
- poweroff_force = true;
- schedule_work(&poweroff_work);
- return 0;
-}
-EXPORT_SYMBOL_GPL(orderly_poweroff);
--
1.8.1.2

2013-04-17 18:44:23

by Robin Holt

[permalink] [raw]
Subject: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

Moving the reboot=s<##> parameter for x86 to a kernel parameter
proper. I did not find any other arch that was specifying the
reboot cpu.

I left a compatibility mode in there. The new parameter always
takes precedence. I also fixed up the current code to support
up to cpuid's up to the current max of 4096.

Signed-off-by: Robin Holt <[email protected]>
To: Ingo Molnar <[email protected]>
To: Russ Anderson <[email protected]>
Cc: Shawn Guo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
Documentation/kernel-parameters.txt | 2 ++
arch/x86/kernel/reboot.c | 39 +++++++++++++------------------------
kernel/reboot.c | 6 +++++-
3 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4609e81..11ebb48 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2597,6 +2597,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Format: <reboot_mode>[,<reboot_mode2>[,...]]
See arch/*/kernel/reboot.c or arch/*/kernel/process.c

+ reboot_cpuid= [KNL] cpuid to use for reboot/halt, etc operations.
+
relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/cgroups/cpusets.txt.
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 76fa1e9..cdcf28f 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -49,10 +49,6 @@ int reboot_force;
*/
static int reboot_default = 1;

-#ifdef CONFIG_SMP
-static int reboot_cpu = -1;
-#endif
-
/*
* This is set if we need to go through the 'emergency' path.
* When machine_emergency_restart() is called, we may be on
@@ -63,6 +59,8 @@ static int reboot_emergency;
/* This is set by the PCI code if either type 1 or type 2 PCI is detected */
bool port_cf9_safe = false;

+extern int reboot_cpuid;
+
/*
* reboot=b[ios] | s[mp] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old] | p[ci]
* warm Don't set the cold reboot flag
@@ -97,10 +95,14 @@ static int __init reboot_setup(char *str)

#ifdef CONFIG_SMP
case 's':
- if (isdigit(*(str+1))) {
- reboot_cpu = (int) (*(str+1) - '0');
- if (isdigit(*(str+2)))
- reboot_cpu = reboot_cpu*10 + (int)(*(str+2) - '0');
+ if (reboot_cpuid < 0) {
+ int coffset;
+
+ reboot_cpuid = 0;
+ for (coffset = 1; isdigit(*(str + coffset)); coffset++) {
+ reboot_cpuid *= 10;
+ reboot_cpuid += (int)(*(str+2) - '0');
+ }
}
/*
* We will leave sorting out the final value
@@ -615,25 +617,10 @@ void native_machine_shutdown(void)
/* Stop the cpus and apics */
#ifdef CONFIG_SMP

- /* The boot cpu is always logical cpu 0 */
- int reboot_cpu_id = 0;
-
- /* See if there has been given a command line override */
- if ((reboot_cpu != -1) && (reboot_cpu < nr_cpu_ids) &&
- cpu_online(reboot_cpu))
- reboot_cpu_id = reboot_cpu;
-
- /* Make certain the cpu I'm about to reboot on is online */
- if (!cpu_online(reboot_cpu_id))
- reboot_cpu_id = smp_processor_id();
-
- /* Make certain I only run on the appropriate processor */
- set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
-
/*
- * O.K Now that I'm on the appropriate processor, stop all of the
- * others. Also disable the local irq to not receive the per-cpu
- * timer interrupt which may trigger scheduler's load balance.
+ * Stop all of the others. Also disable the local irq to
+ * not receive the per-cpu timer interrupt which may trigger
+ * scheduler's load balance.
*/
local_irq_disable();
stop_other_cpus();
diff --git a/kernel/reboot.c b/kernel/reboot.c
index c4eb2a4..04a162a 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -10,6 +10,7 @@
#include <linux/kexec.h>
#include <linux/kmod.h>
#include <linux/kmsg_dump.h>
+#include <linux/moduleparam.h>
#include <linux/reboot.h>
#include <linux/suspend.h>
#include <linux/syscalls.h>
@@ -69,10 +70,13 @@ int unregister_reboot_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL(unregister_reboot_notifier);

+int reboot_cpuid = -1;
+core_param(reboot_cpuid, reboot_cpuid, int, 0644);
+
static void migrate_to_reboot_cpu(void)
{
/* The boot cpu is always logical cpu 0 */
- int reboot_cpu_id = 0;
+ int reboot_cpu_id = reboot_cpuid;

cpu_hotplug_disable();

--
1.8.1.2

2013-04-17 19:13:57

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH -v5 4/5] checkpatch.pl the new kernel/reboot.c file.

On Wed, Apr 17, 2013 at 01:43:17PM -0500, Robin Holt wrote:
> I did allow the remaining 81 character line behind. It did not seem
> like it was worth changing. Otherwise, it now passes checkpatch.pl.
>
> Signed-off-by: Robin Holt <[email protected]>
> To: Ingo Molnar <[email protected]>
> To: Russ Anderson <[email protected]>
> Cc: Shawn Guo <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Lai Jiangshan <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Linux Kernel Mailing List <[email protected]>
> Cc: Michel Lespinasse <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Robin Holt <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: the arch/x86 maintainers <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Joe Perches <[email protected]>
>
> ---
>
> Changes since -v3:
> Integrated feedback from Joe Perches.
> ---
> kernel/reboot.c | 26 +++++++++++++-------------
> 1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index ff92601..c4eb2a4 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -4,6 +4,8 @@
> * Copyright (C) 2013 Linus Torvalds
> */
>
> +#define pr_fmt(fmt) "reboot: " fmt
> +
> #include <linux/export.h>
> #include <linux/kexec.h>
> #include <linux/kmod.h>
> @@ -99,9 +101,9 @@ void kernel_restart(char *cmd)
> migrate_to_reboot_cpu();
> syscore_shutdown();
> if (!cmd)
> - printk(KERN_EMERG "Restarting system.\n");
> + pr_emerg("Restarting system\n");
> else
> - printk(KERN_EMERG "Restarting system with command '%s'.\n", cmd);
> + pr_emerg("Restarting system with command '%s'\n", cmd);
> kmsg_dump(KMSG_DUMP_RESTART);
> machine_restart(cmd);
> }
> @@ -110,7 +112,7 @@ EXPORT_SYMBOL_GPL(kernel_restart);
> static void kernel_shutdown_prepare(enum system_states state)
> {
> blocking_notifier_call_chain(&reboot_notifier_list,
> - (state == SYSTEM_HALT)?SYS_HALT:SYS_POWER_OFF, NULL);
> + (state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
> system_state = state;
> usermodehelper_disable();
> device_shutdown();
> @@ -125,11 +127,10 @@ void kernel_halt(void)
> kernel_shutdown_prepare(SYSTEM_HALT);
> migrate_to_reboot_cpu();
> syscore_shutdown();
> - printk(KERN_EMERG "System halted.\n");
> + pr_emerg("System halted\n");
> kmsg_dump(KMSG_DUMP_HALT);
> machine_halt();
> }
> -
> EXPORT_SYMBOL_GPL(kernel_halt);
>
> /**
> @@ -144,7 +145,7 @@ void kernel_power_off(void)
> pm_power_off_prepare();
> migrate_to_reboot_cpu();
> syscore_shutdown();
> - printk(KERN_EMERG "Power down.\n");
> + pr_emerg("Power down\n");
> kmsg_dump(KMSG_DUMP_POWEROFF);
> machine_power_off();
> }
> @@ -173,10 +174,10 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>
> /* For safety, we require "magic" arguments. */
> if (magic1 != LINUX_REBOOT_MAGIC1 ||
> - (magic2 != LINUX_REBOOT_MAGIC2 &&
> - magic2 != LINUX_REBOOT_MAGIC2A &&
> + (magic2 != LINUX_REBOOT_MAGIC2 &&
> + magic2 != LINUX_REBOOT_MAGIC2A &&
> magic2 != LINUX_REBOOT_MAGIC2B &&
> - magic2 != LINUX_REBOOT_MAGIC2C))
> + magic2 != LINUX_REBOOT_MAGIC2C))
> return -EINVAL;
>
> /*
> @@ -285,14 +286,13 @@ static int __orderly_poweroff(bool force)
> ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
> argv_free(argv);
> } else {
> - printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
> - __func__, poweroff_cmd);
> + pr_warn("%s failed to allocate memory for \"%s\"\n",
> + __func__, poweroff_cmd);

Argh. I had intended to remove this. I will fix up my workarea and be
ready to not miss it on the next pass.

Robin

2013-04-17 19:37:13

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

On 04/17/2013 11:43 AM, Robin Holt wrote:
> Moving the reboot=s<##> parameter for x86 to a kernel parameter
> proper. I did not find any other arch that was specifying the
> reboot cpu.
>
> I left a compatibility mode in there. The new parameter always
> takes precedence. I also fixed up the current code to support
> up to cpuid's up to the current max of 4096.
>

I still don't understand why you insist on introducing a new parameter
rather than just supporting the existing, somewhat ugly, syntax: it is
rare enough to need that the compatibility wins over the aestetics.

Furthermore that word "cpuid" that you keep using, I don't think it
means what you think it means...

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-04-17 19:48:39

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

On Wed, Apr 17, 2013 at 12:37:02PM -0700, H. Peter Anvin wrote:
> On 04/17/2013 11:43 AM, Robin Holt wrote:
> > Moving the reboot=s<##> parameter for x86 to a kernel parameter
> > proper. I did not find any other arch that was specifying the
> > reboot cpu.
> >
> > I left a compatibility mode in there. The new parameter always
> > takes precedence. I also fixed up the current code to support
> > up to cpuid's up to the current max of 4096.
> >
>
> I still don't understand why you insist on introducing a new parameter
> rather than just supporting the existing, somewhat ugly, syntax: it is
> rare enough to need that the compatibility wins over the aestetics.

Did you see my response I sent this morning?

I would really like to try and remove the apparently unused reboot=
parameter from arm and unicore32 as well. Does anybody have a concern
with that? That should make documenting slightly easier.

> Furthermore that word "cpuid" that you keep using, I don't think it
> means what you think it means...

If we stayed with the core_param, would you prefer reboot_processor=###
over reboot_cpuid=###?

Robin

2013-04-17 19:55:58

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

On 04/17/2013 12:48 PM, Robin Holt wrote:
>
> Did you see my response I sent this morning?
>

I did not, although I just read it.

I have a hard time seeing maintaining backwards/forwards compatibility
as "very wrong" ... it would seem like a pretty major concern. In
comparison losing the currently nonexistent /sys file seems like a
rather minor issue.

My suggestion would be to have a universal parser for reboot= and have
the arch functions fed data in already parsed form.

> I would really like to try and remove the apparently unused reboot=
> parameter from arm and unicore32 as well. Does anybody have a concern
> with that? That should make documenting slightly easier.

You have to ask the arm and unicore32 maintainers that, obviously.

>> Furthermore that word "cpuid" that you keep using, I don't think it
>> means what you think it means...
>
> If we stayed with the core_param, would you prefer reboot_processor=###
> over reboot_cpuid=###?

reboot_cpu=<n>

-hpa

2013-04-17 20:00:25

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

It is also worth noting that the documentation says reboot=s[mp]#
whereas in fact only reboot=s# parse correctly. I consider this to be a
bug.

If we centralized the parser, we could take a string like

"reboot=bios,smp32,warm"

and parse it into:

reboot_cpu = 32
reboot_mode = "bw"

... and pass the information in that form to the arch layer. I don't
think we can do more parsing at that in the main kernel.

-hpa

2013-04-17 20:15:37

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

On Wed, Apr 17, 2013 at 12:59:57PM -0700, H. Peter Anvin wrote:
> It is also worth noting that the documentation says reboot=s[mp]#
> whereas in fact only reboot=s# parse correctly. I consider this to be a
> bug.
>
> If we centralized the parser, we could take a string like
>
> "reboot=bios,smp32,warm"
>
> and parse it into:
>
> reboot_cpu = 32
> reboot_mode = "bw"
>
> ... and pass the information in that form to the arch layer. I don't
> think we can do more parsing at that in the main kernel.

OK. I will go back to the drawing board again.

Robin

2013-04-18 00:17:30

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

On Wed, Apr 17, 2013 at 03:15:33PM -0500, Robin Holt wrote:
> On Wed, Apr 17, 2013 at 12:59:57PM -0700, H. Peter Anvin wrote:
> > It is also worth noting that the documentation says reboot=s[mp]#
> > whereas in fact only reboot=s# parse correctly. I consider this to be a
> > bug.
> >
> > If we centralized the parser, we could take a string like
> >
> > "reboot=bios,smp32,warm"
> >
> > and parse it into:
> >
> > reboot_cpu = 32
> > reboot_mode = "bw"
> >
> > ... and pass the information in that form to the arch layer. I don't
> > think we can do more parsing at that in the main kernel.
>
> OK. I will go back to the drawing board again.

There are 4 items being parsed out of reboot= for x86:
- reboot_mode w[arm] | c[old]
- reboot_cpu s[mp]####
- reboot_type b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]
- reboot_force f[orce]

This seems like a lot to push into the generic kernel just to make it
appear consistent when there will be no real cross arch consistency.


Contrast that with:
1) New kernel parameter (reboot_cpu) which is clear and concise, uses standard
parsing methods.
2) Backwards compatibility in that a user with an existing (broken) reboot=s32
on the command line will set reboot_cpu unless both were specified, in which
case reboot_cpu takes precedence.

What is so fundamentally wrong with that? It accomplishes exactly what
you had asked for in that existing users are not broken. We are introducing
a new functionality in the general kernel. Why not introduce a new parameter
associated with that functionality.

Thanks,
Robin

2013-04-18 00:30:58

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

Better that than someone creating a completely different syntax.

Robin Holt <[email protected]> wrote:

>On Wed, Apr 17, 2013 at 03:15:33PM -0500, Robin Holt wrote:
>> On Wed, Apr 17, 2013 at 12:59:57PM -0700, H. Peter Anvin wrote:
>> > It is also worth noting that the documentation says reboot=s[mp]#
>> > whereas in fact only reboot=s# parse correctly. I consider this to
>be a
>> > bug.
>> >
>> > If we centralized the parser, we could take a string like
>> >
>> > "reboot=bios,smp32,warm"
>> >
>> > and parse it into:
>> >
>> > reboot_cpu = 32
>> > reboot_mode = "bw"
>> >
>> > ... and pass the information in that form to the arch layer. I
>don't
>> > think we can do more parsing at that in the main kernel.
>>
>> OK. I will go back to the drawing board again.
>
>There are 4 items being parsed out of reboot= for x86:
> - reboot_mode w[arm] | c[old]
> - reboot_cpu s[mp]####
> - reboot_type b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]
> - reboot_force f[orce]
>
>This seems like a lot to push into the generic kernel just to make it
>appear consistent when there will be no real cross arch consistency.
>
>
>Contrast that with:
>1) New kernel parameter (reboot_cpu) which is clear and concise, uses
>standard
> parsing methods.
>2) Backwards compatibility in that a user with an existing (broken)
>reboot=s32
>on the command line will set reboot_cpu unless both were specified, in
>which
> case reboot_cpu takes precedence.
>
>What is so fundamentally wrong with that? It accomplishes exactly what
>you had asked for in that existing users are not broken. We are
>introducing
>a new functionality in the general kernel. Why not introduce a new
>parameter
>associated with that functionality.
>
>Thanks,
>Robin

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.

2013-04-18 00:39:43

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

On 04/17/2013 05:17 PM, Robin Holt wrote:
>
> There are 4 items being parsed out of reboot= for x86:
> - reboot_mode w[arm] | c[old]
> - reboot_cpu s[mp]####
> - reboot_type b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]
> - reboot_force f[orce]
>
> This seems like a lot to push into the generic kernel just to make it
> appear consistent when there will be no real cross arch consistency.
>
> Contrast that with:
> 1) New kernel parameter (reboot_cpu) which is clear and concise, uses standard
> parsing methods.
> 2) Backwards compatibility in that a user with an existing (broken) reboot=s32
> on the command line will set reboot_cpu unless both were specified, in which
> case reboot_cpu takes precedence.
>
> What is so fundamentally wrong with that? It accomplishes exactly what
> you had asked for in that existing users are not broken. We are introducing
> a new functionality in the general kernel. Why not introduce a new parameter
> associated with that functionality.
>

You are confusing implementation with interface. That is what is so
fundamentally wrong with that. You really, really don't want to change
interface unless the world will end if you don't.

As far as why centralize -- the main concern I have is that someone
might try to introduce an arch-specific reboot= which is *syntactically*
different, which is yet again really awful from a user perspective.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2013-04-18 01:25:41

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

On Wed, Apr 17, 2013 at 05:39:33PM -0700, H. Peter Anvin wrote:
> On 04/17/2013 05:17 PM, Robin Holt wrote:
> >
> > There are 4 items being parsed out of reboot= for x86:
> > - reboot_mode w[arm] | c[old]
> > - reboot_cpu s[mp]####
> > - reboot_type b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]
> > - reboot_force f[orce]
> >
> > This seems like a lot to push into the generic kernel just to make it
> > appear consistent when there will be no real cross arch consistency.
> >
> > Contrast that with:
> > 1) New kernel parameter (reboot_cpu) which is clear and concise, uses standard
> > parsing methods.
> > 2) Backwards compatibility in that a user with an existing (broken) reboot=s32
> > on the command line will set reboot_cpu unless both were specified, in which
> > case reboot_cpu takes precedence.
> >
> > What is so fundamentally wrong with that? It accomplishes exactly what
> > you had asked for in that existing users are not broken. We are introducing
> > a new functionality in the general kernel. Why not introduce a new parameter
> > associated with that functionality.
> >
>
> You are confusing implementation with interface. That is what is so
> fundamentally wrong with that. You really, really don't want to change
> interface unless the world will end if you don't.
>
> As far as why centralize -- the main concern I have is that someone
> might try to introduce an arch-specific reboot= which is *syntactically*
> different, which is yet again really awful from a user perspective.

Yes and no. I am saying that the interface is garbage and already
specified as arch specific. You are asking me to take that garbage
interface and promote it to a general interface which will force us to
implement it in a completely crappy way.

Compare that with introducing a new interface which is concise and then
providing backwards compatibility. Add to that the fact, I don't need
to pollute the kernel with some poorly done x86 interface and leave that
cruft for others to clean up.

Thanks,
Robin

2013-04-18 02:04:59

by Robin Holt

[permalink] [raw]
Subject: Re: [PATCH -v5 5/5] Make reboot_cpuid a kernel parameter.

On Wed, Apr 17, 2013 at 08:25:39PM -0500, Robin Holt wrote:
> On Wed, Apr 17, 2013 at 05:39:33PM -0700, H. Peter Anvin wrote:
> > On 04/17/2013 05:17 PM, Robin Holt wrote:
> > >
> > > There are 4 items being parsed out of reboot= for x86:
> > > - reboot_mode w[arm] | c[old]
> > > - reboot_cpu s[mp]####
> > > - reboot_type b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]
> > > - reboot_force f[orce]
> > >
> > > This seems like a lot to push into the generic kernel just to make it
> > > appear consistent when there will be no real cross arch consistency.
> > >
> > > Contrast that with:
> > > 1) New kernel parameter (reboot_cpu) which is clear and concise, uses standard
> > > parsing methods.
> > > 2) Backwards compatibility in that a user with an existing (broken) reboot=s32
> > > on the command line will set reboot_cpu unless both were specified, in which
> > > case reboot_cpu takes precedence.
> > >
> > > What is so fundamentally wrong with that? It accomplishes exactly what
> > > you had asked for in that existing users are not broken. We are introducing
> > > a new functionality in the general kernel. Why not introduce a new parameter
> > > associated with that functionality.
> > >
> >
> > You are confusing implementation with interface. That is what is so
> > fundamentally wrong with that. You really, really don't want to change
> > interface unless the world will end if you don't.
> >
> > As far as why centralize -- the main concern I have is that someone
> > might try to introduce an arch-specific reboot= which is *syntactically*
> > different, which is yet again really awful from a user perspective.
>
> Yes and no. I am saying that the interface is garbage and already
> specified as arch specific. You are asking me to take that garbage
> interface and promote it to a general interface which will force us to
> implement it in a completely crappy way.
>
> Compare that with introducing a new interface which is concise and then
> providing backwards compatibility. Add to that the fact, I don't need
> to pollute the kernel with some poorly done x86 interface and leave that
> cruft for others to clean up.

OK. Here goes:

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4609e81..35af99e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2593,9 +2593,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Run specified binary instead of /init from the ramdisk,
used for early userspace startup. See initrd.

- reboot= [BUGS=X86-32,BUGS=ARM,BUGS=IA-64] Rebooting mode
- Format: <reboot_mode>[,<reboot_mode2>[,...]]
- See arch/*/kernel/reboot.c or arch/*/kernel/process.c
+ reboot= [KNL]
+ Format:
+ [w[arm] | c[old]] \
+ [,b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]] \
+ [,f[orce] \
+ [,] s[mp]####
+ Where reboot_mode is one of warm or cold,
+ reboot_type is one of bios, acpi, kbd, triple, efi, or pci,
+ reboot_force is either force or not specified,
+ and reboot_cpu is s[mp]#### with #### being the
+ processor to be used for rebooting.

relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 76fa1e9..f9e8bf4 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -36,22 +36,11 @@ void (*pm_power_off)(void);
EXPORT_SYMBOL(pm_power_off);

static const struct desc_ptr no_idt = {};
-static int reboot_mode;
-enum reboot_type reboot_type = BOOT_ACPI;
-int reboot_force;
+extern int reboot_mode;
+extern enum reboot_type reboot_type;
+extern int reboot_force;

-/*
- * This variable is used privately to keep track of whether or not
- * reboot_type is still set to its default value (i.e., reboot= hasn't
- * been set on the command line). This is needed so that we can
- * suppress DMI scanning for reboot quirks. Without it, it's
- * impossible to override a faulty reboot quirk without recompiling.
- */
-static int reboot_default = 1;
-
-#ifdef CONFIG_SMP
-static int reboot_cpu = -1;
-#endif
+extern int reboot_default;

/*
* This is set if we need to go through the 'emergency' path.
@@ -63,78 +52,6 @@ static int reboot_emergency;
/* This is set by the PCI code if either type 1 or type 2 PCI is detected */
bool port_cf9_safe = false;

-/*
- * reboot=b[ios] | s[mp] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old] | p[ci]
- * warm Don't set the cold reboot flag
- * cold Set the cold reboot flag
- * bios Reboot by jumping through the BIOS
- * smp Reboot by executing reset on BSP or other CPU
- * triple Force a triple fault (init)
- * kbd Use the keyboard controller. cold reset (default)
- * acpi Use the RESET_REG in the FADT
- * efi Use efi reset_system runtime service
- * pci Use the so-called "PCI reset register", CF9
- * force Avoid anything that could hang.
- */
-static int __init reboot_setup(char *str)
-{
- for (;;) {
- /*
- * Having anything passed on the command line via
- * reboot= will cause us to disable DMI checking
- * below.
- */
- reboot_default = 0;
-
- switch (*str) {
- case 'w':
- reboot_mode = 0x1234;
- break;
-
- case 'c':
- reboot_mode = 0;
- break;
-
-#ifdef CONFIG_SMP
- case 's':
- if (isdigit(*(str+1))) {
- reboot_cpu = (int) (*(str+1) - '0');
- if (isdigit(*(str+2)))
- reboot_cpu = reboot_cpu*10 + (int)(*(str+2) - '0');
- }
- /*
- * We will leave sorting out the final value
- * when we are ready to reboot, since we might not
- * have detected BSP APIC ID or smp_num_cpu
- */
- break;
-#endif /* CONFIG_SMP */
-
- case 'b':
- case 'a':
- case 'k':
- case 't':
- case 'e':
- case 'p':
- reboot_type = *str;
- break;
-
- case 'f':
- reboot_force = 1;
- break;
- }
-
- str = strchr(str, ',');
- if (str)
- str++;
- else
- break;
- }
- return 1;
-}
-
-__setup("reboot=", reboot_setup);
-

/*
* Reboot options and system auto-detection code provided by
@@ -614,26 +531,10 @@ void native_machine_shutdown(void)
{
/* Stop the cpus and apics */
#ifdef CONFIG_SMP
-
- /* The boot cpu is always logical cpu 0 */
- int reboot_cpu_id = 0;
-
- /* See if there has been given a command line override */
- if ((reboot_cpu != -1) && (reboot_cpu < nr_cpu_ids) &&
- cpu_online(reboot_cpu))
- reboot_cpu_id = reboot_cpu;
-
- /* Make certain the cpu I'm about to reboot on is online */
- if (!cpu_online(reboot_cpu_id))
- reboot_cpu_id = smp_processor_id();
-
- /* Make certain I only run on the appropriate processor */
- set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
-
/*
- * O.K Now that I'm on the appropriate processor, stop all of the
- * others. Also disable the local irq to not receive the per-cpu
- * timer interrupt which may trigger scheduler's load balance.
+ * Stop all of the others. Also disable the local irq to
+ * not receive the per-cpu timer interrupt which may trigger
+ * scheduler's load balance.
*/
local_irq_disable();
stop_other_cpus();
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 7f4658f..3448a1d 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -6,6 +6,7 @@

#define pr_fmt(fmt) "reboot: " fmt

+#include <linux/ctype.h>
#include <linux/export.h>
#include <linux/kexec.h>
#include <linux/kmod.h>
@@ -69,22 +70,107 @@ int unregister_reboot_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL(unregister_reboot_notifier);

+/*
+ * This variable is used privately to keep track of whether or not
+ * reboot_type is still set to its default value (i.e., reboot= hasn't
+ * been set on the command line). This is needed so that we can
+ * suppress DMI scanning for reboot quirks. Without it, it's
+ * impossible to override a faulty reboot quirk without recompiling.
+ */
+int reboot_default = 1;
+int reboot_mode;
+enum reboot_type reboot_type = BOOT_ACPI;
+int reboot_force;
+static int reboot_cpu;
+
+/*
+ * reboot=b[ios] | s[mp] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old] | p[ci]
+ * warm Don't set the cold reboot flag
+ * cold Set the cold reboot flag
+ * bios Reboot by jumping through the BIOS
+ * smp Reboot by executing reset on BSP or other CPU
+ * triple Force a triple fault (init)
+ * kbd Use the keyboard controller. cold reset (default)
+ * acpi Use the RESET_REG in the FADT
+ * efi Use efi reset_system runtime service
+ * pci Use the so-called "PCI reset register", CF9
+ * force Avoid anything that could hang.
+ */
+static int __init reboot_setup(char *str)
+{
+ int i;
+
+ for (;;) {
+ /*
+ * Having anything passed on the command line via
+ * reboot= will cause us to disable DMI checking
+ * below.
+ */
+ reboot_default = 0;
+
+ switch (*str) {
+ case 'w':
+ reboot_mode = 0x1234;
+ break;
+
+ case 'c':
+ reboot_mode = 0;
+ break;
+
+#ifdef CONFIG_SMP
+ case 's':
+ if (*(str + 1) == 'm' && *(str + 2) == 'p')
+ str += 2;
+
+ reboot_cpu = 0;
+ for (i = 1; isdigit(*(str + i)); i++) {
+ reboot_cpu *= 10;
+ reboot_cpu += (int)(*(str + i) - '0');
+ }
+
+ break;
+#endif /* CONFIG_SMP */
+
+ case 'b':
+ case 'a':
+ case 'k':
+ case 't':
+ case 'e':
+ case 'p':
+ reboot_type = *str;
+ break;
+
+ case 'f':
+ reboot_force = 1;
+ break;
+ }
+
+ str = strchr(str, ',');
+ if (str)
+ str++;
+ else
+ break;
+ }
+ return 1;
+}
+__setup("reboot=", reboot_setup);
+
+
static void migrate_to_reboot_cpu(void)
{
- /* The boot cpu is always logical cpu 0 */
- int reboot_cpu_id = 0;
+ int cpu = reboot_cpu;

cpu_hotplug_disable();

/* Make certain the cpu I'm about to reboot on is online */
- if (!cpu_online(reboot_cpu_id))
- reboot_cpu_id = cpumask_first(cpu_online_mask);
+ if (!cpu_online(cpu))
+ cpu = cpumask_first(cpu_online_mask);

/* Prevent races with other tasks migrating this task */
current->flags |= PF_THREAD_BOUND;

/* Make certain I only run on the appropriate processor */
- set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
+ set_cpus_allowed_ptr(current, cpumask_of(cpu));
}

/**