Hi,
There are multiple problems with sysdevs, or struct sys_device objects to
be precise, that are so annoying that some people have started to think
of removing them entirely from the kernel. To me, personally, the most
obvious issue is the way sysdevs are used for defining suspend/resume
callbacks to be executed with one CPU on-line and interrupts disabled.
Greg and Kay may tell you more about the other problems with sysdevs. :-)
Some subsystems need to carry out certain operations during suspend after
we've disabled non-boot CPUs and interrupts have been switched off on the
only on-line one. Currently, the only way to achieve that is to define
sysdev suspend/resume callbacks, but this is cumbersome and inefficient.
Namely, to do that, one has to define a sysdev class providing the callbacks
and a sysdev actually using them, which is excessively complicated. Moreover,
the sysdev suspend/resume callbacks take arguments that are not really used
by the majority of subsystems defining sysdev suspend/resume callbacks
(or even if they are used, they don't really _need_ to be used, so they
are simply unnecessary). Of course, if a sysdev is only defined to provide
suspend/resume (and maybe shutdown) callbacks, there's no real reason why
it should show up in sysfs.
For this reason, I thought it would be a good idea to provide a simpler
interface for subsystems to define "very late" suspend callbacks and
"very early" resume callbacks (and "very late" shutdown callbacks as well)
without the entire bloat related to sysdevs. The interface is introduced
by the first of the following patches, while the second patch converts some
sysdev users related to the x86 architecture to using the new interface.
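
To illustrate the intended semantics, here is a minimal userspace sketch of
the interface (a model under stated assumptions, not the kernel code from the
first patch). The hand-rolled prev/next links stand in for struct list_head,
locking is left out, and the timer_*/pic_* callbacks, trace_add() and
run_demo() are hypothetical names made up for the example. It shows that
suspend callbacks run in reverse registration order and stop at the first
error, while resume callbacks run in registration order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for struct syscore_ops (prev/next replace list_head). */
struct syscore_ops {
	struct syscore_ops *prev, *next;
	int (*suspend)(void);
	void (*resume)(void);
	void (*shutdown)(void);
};

static struct syscore_ops *ops_head, *ops_tail;

/* Append to the tail, like list_add_tail() in the patch. */
void register_syscore_ops(struct syscore_ops *ops)
{
	assert(ops != NULL);
	ops->next = NULL;
	ops->prev = ops_tail;
	if (ops_tail)
		ops_tail->next = ops;
	else
		ops_head = ops;
	ops_tail = ops;
}

/* Walk the list backwards and give up on the first failing callback. */
int syscore_suspend(void)
{
	struct syscore_ops *ops;

	for (ops = ops_tail; ops; ops = ops->prev)
		if (ops->suspend) {
			int ret = ops->suspend();
			if (ret)
				return ret;
		}
	return 0;
}

/* Walk the list forwards; resume callbacks cannot fail. */
void syscore_resume(void)
{
	struct syscore_ops *ops;

	for (ops = ops_head; ops; ops = ops->next)
		if (ops->resume)
			ops->resume();
}

/* Hypothetical subsystems that only record the order of the calls. */
static char trace[128];

static void trace_add(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

static int timer_suspend(void) { trace_add("timer_suspend;"); return 0; }
static void timer_resume(void) { trace_add("timer_resume;"); }
static int pic_suspend(void) { trace_add("pic_suspend;"); return 0; }
static void pic_resume(void) { trace_add("pic_resume;"); }

static struct syscore_ops timer_ops = {
	.suspend = timer_suspend,
	.resume = timer_resume,
};

static struct syscore_ops pic_ops = {
	.suspend = pic_suspend,
	.resume = pic_resume,
};

/* Register both sets of operations, run one suspend/resume cycle. */
const char *run_demo(void)
{
	ops_head = ops_tail = NULL;
	trace[0] = '\0';

	register_syscore_ops(&timer_ops);
	register_syscore_ops(&pic_ops);

	if (!syscore_suspend())
		syscore_resume();

	return trace;
}
```

With the timer operations registered first, run_demo() records pic_suspend,
timer_suspend, timer_resume, pic_resume, in that order. Note that there is
no class object, no device object and nothing that could show up in sysfs.
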
I believe that all sysdev users who need to define suspend/resume/shutdown
callbacks may be converted to using the interface provided by the first patch,
which in turn should allow us to convert the remaining sysdev functionality
into "normal" struct device interfaces. Still, even if that turns out to be
too complicated, the bloat reduction resulting from the second patch kind of
shows that moving at least some sysdev users to a simpler interface (like in
the first patch) is a good idea anyway.
This is a proof of concept, so the patches have not been tested. Please be
extremely careful, because they touch sensitive code, so to speak. In the
majority of cases the changes are rather straightforward, but there are some
more interesting cases as well (io_apic.c most importantly).
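
The io_apic.c case is interesting, because one sysdev per IO-APIC is replaced
with a single set of operations that loops over all of the IO-APICs. A rough
userspace sketch of that pattern, assuming made-up "units" with small register
banks in place of real IO-APICs (hw_regs, saved, units_suspend(),
units_resume() and run_units_demo() are all hypothetical names, and the struct
is cut down to the two callbacks used here):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_UNITS	3
#define REGS_PER_UNIT	4

/* Cut-down stand-in for struct syscore_ops. */
struct syscore_ops {
	int (*suspend)(void);
	void (*resume)(void);
};

/* Pretend hardware registers and the per-unit save buffers. */
static unsigned int hw_regs[NR_UNITS][REGS_PER_UNIT];
static unsigned int *saved[NR_UNITS];	/* NULL: no buffer, unit is skipped */

/* One callback saves the state of every unit that has a buffer. */
static int units_suspend(void)
{
	int unit;

	for (unit = 0; unit < NR_UNITS; unit++)
		if (saved[unit])
			memcpy(saved[unit], hw_regs[unit], sizeof(hw_regs[unit]));
	return 0;
}

/* Restore in descending order, as ioapic_resume() does in the patch. */
static void units_resume(void)
{
	int unit;

	for (unit = NR_UNITS - 1; unit >= 0; unit--)
		if (saved[unit])
			memcpy(hw_regs[unit], saved[unit], sizeof(hw_regs[unit]));
}

static struct syscore_ops units_syscore_ops = {
	.suspend = units_suspend,
	.resume = units_resume,
};

/* Returns the number of units whose registers survived the cycle. */
int run_units_demo(void)
{
	int unit, reg, restored = 0;

	for (unit = 0; unit < NR_UNITS; unit++) {
		/* Pretend that the allocation for unit 1 has failed. */
		saved[unit] = (unit == 1) ? NULL :
				calloc(REGS_PER_UNIT, sizeof(**saved));
		for (reg = 0; reg < REGS_PER_UNIT; reg++)
			hw_regs[unit][reg] = 100 * unit + reg + 1;
	}

	units_syscore_ops.suspend();
	memset(hw_regs, 0, sizeof(hw_regs));	/* state lost while suspended */
	units_syscore_ops.resume();

	for (unit = 0; unit < NR_UNITS; unit++) {
		if (hw_regs[unit][0] == 100u * unit + 1)
			restored++;
		free(saved[unit]);
		saved[unit] = NULL;
	}
	return restored;
}
```

A unit without a save buffer is silently skipped both ways, which is what the
NULL checks in suspend_ioapic() and resume_ioapic() are for, so a failing
allocation only costs suspend/resume support for that one unit.
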
Please have a look and tell me what you think.
Thanks,
Rafael
Some subsystems need to carry out suspend/resume and shutdown
operations with one CPU on-line and interrupts disabled. The only
way to register such operations is to define a sysdev class and
a sysdev specifically for this purpose, which is cumbersome and
inefficient. Moreover, the arguments taken by sysdev suspend,
resume and shutdown callbacks are practically never necessary.
For this reason, introduce a simpler interface allowing subsystems
to register operations to be executed very late during system suspend
and shutdown and very early during resume in the form of
struct syscore_ops objects.
---
drivers/base/Makefile | 2
drivers/base/syscore.c | 102 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/syscore_ops.h | 39 ++++++++++++++++
kernel/power/hibernate.c | 9 +++
kernel/power/suspend.c | 4 +
kernel/sys.c | 4 +
6 files changed, 159 insertions(+), 1 deletion(-)
Index: linux-2.6/include/linux/syscore_ops.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/syscore_ops.h
@@ -0,0 +1,39 @@
+/*
+ * syscore_ops.h - System core operations.
+ *
+ * Copyright (C) 2011 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef _LINUX_SYSCORE_OPS_H
+#define _LINUX_SYSCORE_OPS_H
+
+#include <linux/list.h>
+
+struct syscore_ops {
+ struct list_head node;
+ int (*suspend)(void);
+ void (*resume)(void);
+ void (*shutdown)(void);
+};
+
+extern void register_syscore_ops(struct syscore_ops *ops);
+extern void unregister_syscore_ops(struct syscore_ops *ops);
+extern int syscore_suspend(void);
+extern void syscore_resume(void);
+extern void syscore_shutdown(void);
+
+#endif
Index: linux-2.6/drivers/base/syscore.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/syscore.c
@@ -0,0 +1,102 @@
+/*
+ * syscore.c - Execution of system core operations.
+ *
+ * Copyright (C) 2011 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/syscore_ops.h>
+#include <linux/mutex.h>
+#include <linux/module.h>
+
+static LIST_HEAD(syscore_ops_list);
+static DEFINE_MUTEX(syscore_ops_lock);
+
+/**
+ * register_syscore_ops - Register a set of system core operations.
+ * @ops: System core operations to register.
+ */
+void register_syscore_ops(struct syscore_ops *ops)
+{
+ mutex_lock(&syscore_ops_lock);
+ list_add_tail(&ops->node, &syscore_ops_list);
+ mutex_unlock(&syscore_ops_lock);
+}
+EXPORT_SYMBOL_GPL(register_syscore_ops);
+
+/**
+ * unregister_syscore_ops - Unregister a set of system core operations.
+ * @ops: System core operations to unregister.
+ */
+void unregister_syscore_ops(struct syscore_ops *ops)
+{
+ mutex_lock(&syscore_ops_lock);
+ list_del(&ops->node);
+ mutex_unlock(&syscore_ops_lock);
+}
+EXPORT_SYMBOL_GPL(unregister_syscore_ops);
+
+/**
+ * syscore_suspend - Execute all the registered system core suspend callbacks.
+ *
+ * This function is executed with one CPU on-line and disabled interrupts.
+ */
+int syscore_suspend(void)
+{
+ struct syscore_ops *ops;
+
+ list_for_each_entry_reverse(ops, &syscore_ops_list, node)
+ if (ops->suspend) {
+ int ret = ops->suspend();
+ if (ret) {
+ pr_err("PM: System core suspend callback "
+ "%pF failed.\n", ops->suspend);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * syscore_resume - Execute all the registered system core resume callbacks.
+ *
+ * This function is executed with one CPU on-line and disabled interrupts.
+ */
+void syscore_resume(void)
+{
+ struct syscore_ops *ops;
+
+ list_for_each_entry(ops, &syscore_ops_list, node)
+ if (ops->resume)
+ ops->resume();
+}
+
+/**
+ * syscore_shutdown - Execute all the registered system core shutdown callbacks.
+ */
+void syscore_shutdown(void)
+{
+ struct syscore_ops *ops;
+
+ mutex_lock(&syscore_ops_lock);
+
+ list_for_each_entry(ops, &syscore_ops_list, node)
+ if (ops->shutdown)
+ ops->shutdown();
+
+ mutex_unlock(&syscore_ops_lock);
+}
Index: linux-2.6/kernel/power/suspend.c
===================================================================
--- linux-2.6.orig/kernel/power/suspend.c
+++ linux-2.6/kernel/power/suspend.c
@@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/suspend.h>
+#include <linux/syscore_ops.h>
#include <trace/events/power.h>
#include "power.h"
@@ -163,11 +164,14 @@ static int suspend_enter(suspend_state_t
BUG_ON(!irqs_disabled());
error = sysdev_suspend(PMSG_SUSPEND);
+ if (!error)
+ error = syscore_suspend();
if (!error) {
if (!(suspend_test(TEST_CORE) || pm_wakeup_pending())) {
error = suspend_ops->enter(state);
events_check_enabled = false;
}
+ syscore_resume();
sysdev_resume();
}
Index: linux-2.6/kernel/power/hibernate.c
===================================================================
--- linux-2.6.orig/kernel/power/hibernate.c
+++ linux-2.6/kernel/power/hibernate.c
@@ -23,6 +23,7 @@
#include <linux/cpu.h>
#include <linux/freezer.h>
#include <linux/gfp.h>
+#include <linux/syscore_ops.h>
#include <scsi/scsi_scan.h>
#include <asm/suspend.h>
@@ -272,6 +273,8 @@ static int create_image(int platform_mod
local_irq_disable();
error = sysdev_suspend(PMSG_FREEZE);
+ if (!error)
+ error = syscore_suspend();
if (error) {
printk(KERN_ERR "PM: Some system devices failed to power down, "
"aborting hibernation\n");
@@ -295,6 +298,7 @@ static int create_image(int platform_mod
}
Power_up:
+ syscore_resume();
sysdev_resume();
/* NOTE: dpm_resume_noirq() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
@@ -403,6 +407,8 @@ static int resume_target_kernel(bool pla
local_irq_disable();
error = sysdev_suspend(PMSG_QUIESCE);
+ if (!error)
+ error = syscore_suspend();
if (error)
goto Enable_irqs;
@@ -429,6 +435,7 @@ static int resume_target_kernel(bool pla
restore_processor_state();
touch_softlockup_watchdog();
+ syscore_resume();
sysdev_resume();
Enable_irqs:
@@ -516,6 +523,7 @@ int hibernation_platform_enter(void)
local_irq_disable();
sysdev_suspend(PMSG_HIBERNATE);
+ syscore_suspend();
if (pm_wakeup_pending()) {
error = -EAGAIN;
goto Power_up;
@@ -526,6 +534,7 @@ int hibernation_platform_enter(void)
while (1);
Power_up:
+ syscore_resume();
sysdev_resume();
local_irq_enable();
enable_nonboot_cpus();
Index: linux-2.6/kernel/sys.c
===================================================================
--- linux-2.6.orig/kernel/sys.c
+++ linux-2.6/kernel/sys.c
@@ -37,6 +37,7 @@
#include <linux/ptrace.h>
#include <linux/fs_struct.h>
#include <linux/gfp.h>
+#include <linux/syscore_ops.h>
#include <linux/compat.h>
#include <linux/syscalls.h>
@@ -298,6 +299,7 @@ void kernel_restart_prepare(char *cmd)
system_state = SYSTEM_RESTART;
device_shutdown();
sysdev_shutdown();
+ syscore_shutdown();
}
/**
@@ -336,6 +338,7 @@ void kernel_halt(void)
{
kernel_shutdown_prepare(SYSTEM_HALT);
sysdev_shutdown();
+ syscore_shutdown();
printk(KERN_EMERG "System halted.\n");
kmsg_dump(KMSG_DUMP_HALT);
machine_halt();
@@ -355,6 +358,7 @@ void kernel_power_off(void)
pm_power_off_prepare();
disable_nonboot_cpus();
sysdev_shutdown();
+ syscore_shutdown();
printk(KERN_EMERG "Power down.\n");
kmsg_dump(KMSG_DUMP_POWEROFF);
machine_power_off();
Index: linux-2.6/drivers/base/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/Makefile
+++ linux-2.6/drivers/base/Makefile
@@ -1,6 +1,6 @@
# Makefile for the Linux device tree
-obj-y := core.o sys.o bus.o dd.o \
+obj-y := core.o sys.o bus.o dd.o syscore.o \
driver.o class.o platform.o \
cpu.o firmware.o init.o map.o devres.o \
attribute_container.o transport_class.o
Some subsystems need to carry out suspend/resume and shutdown
operations with one CPU on-line and interrupts disabled and they
define sysdev classes and sysdevs specifically for this purpose.
This leads to unnecessarily complicated code and excessive memory
usage, so switch them to using struct syscore_ops objects for this
purpose instead.
---
arch/x86/kernel/amd_iommu_init.c | 26 ++--------
arch/x86/kernel/apic/apic.c | 29 +++--------
arch/x86/kernel/apic/io_apic.c | 97 ++++++++++++++++++---------------------
arch/x86/kernel/i8237.c | 30 ++----------
arch/x86/kernel/i8259.c | 33 ++++---------
arch/x86/kernel/pci-gart_64.c | 32 ++----------
arch/x86/oprofile/nmi_int.c | 44 ++++-------------
drivers/acpi/pci_link.c | 30 +++---------
drivers/pci/intel-iommu.c | 38 +++------------
kernel/time/timekeeping.c | 27 +++-------
10 files changed, 121 insertions(+), 265 deletions(-)
Index: linux-2.6/arch/x86/kernel/amd_iommu_init.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/amd_iommu_init.c
+++ linux-2.6/arch/x86/kernel/amd_iommu_init.c
@@ -21,7 +21,7 @@
#include <linux/acpi.h>
#include <linux/list.h>
#include <linux/slab.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/interrupt.h>
#include <linux/msi.h>
#include <asm/pci-direct.h>
@@ -1260,7 +1260,7 @@ static void disable_iommus(void)
* disable suspend until real resume implemented
*/
-static int amd_iommu_resume(struct sys_device *dev)
+static void amd_iommu_resume(void)
{
struct amd_iommu *iommu;
@@ -1276,11 +1276,9 @@ static int amd_iommu_resume(struct sys_d
*/
amd_iommu_flush_all_devices();
amd_iommu_flush_all_domains();
-
- return 0;
}
-static int amd_iommu_suspend(struct sys_device *dev, pm_message_t state)
+static int amd_iommu_suspend(void)
{
/* disable IOMMUs to go out of the way for BIOS */
disable_iommus();
@@ -1288,17 +1286,11 @@ static int amd_iommu_suspend(struct sys_
return 0;
}
-static struct sysdev_class amd_iommu_sysdev_class = {
- .name = "amd_iommu",
+static struct syscore_ops amd_iommu_syscore_ops = {
.suspend = amd_iommu_suspend,
.resume = amd_iommu_resume,
};
-static struct sys_device device_amd_iommu = {
- .id = 0,
- .cls = &amd_iommu_sysdev_class,
-};
-
/*
* This is the core init function for AMD IOMMU hardware in the system.
* This function is called from the generic x86 DMA layer initialization
@@ -1415,14 +1407,6 @@ static int __init amd_iommu_init(void)
goto free;
}
- ret = sysdev_class_register(&amd_iommu_sysdev_class);
- if (ret)
- goto free;
-
- ret = sysdev_register(&device_amd_iommu);
- if (ret)
- goto free;
-
ret = amd_iommu_init_devices();
if (ret)
goto free;
@@ -1441,6 +1425,8 @@ static int __init amd_iommu_init(void)
amd_iommu_init_notifier();
+ register_syscore_ops(&amd_iommu_syscore_ops);
+
if (iommu_pass_through)
goto out;
Index: linux-2.6/arch/x86/kernel/apic/apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/apic.c
+++ linux-2.6/arch/x86/kernel/apic/apic.c
@@ -24,7 +24,7 @@
#include <linux/ftrace.h>
#include <linux/ioport.h>
#include <linux/module.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/delay.h>
#include <linux/timex.h>
#include <linux/dmar.h>
@@ -2036,7 +2036,7 @@ static struct {
unsigned int apic_thmr;
} apic_pm_state;
-static int lapic_suspend(struct sys_device *dev, pm_message_t state)
+static int lapic_suspend(void)
{
unsigned long flags;
int maxlvt;
@@ -2074,7 +2074,7 @@ static int lapic_suspend(struct sys_devi
return 0;
}
-static int lapic_resume(struct sys_device *dev)
+static void lapic_resume(void)
{
unsigned int l, h;
unsigned long flags;
@@ -2083,7 +2083,7 @@ static int lapic_resume(struct sys_devic
struct IO_APIC_route_entry **ioapic_entries = NULL;
if (!apic_pm_state.active)
- return 0;
+ return;
local_irq_save(flags);
if (intr_remapping_enabled) {
@@ -2152,8 +2152,6 @@ static int lapic_resume(struct sys_devic
}
restore:
local_irq_restore(flags);
-
- return ret;
}
/*
@@ -2161,17 +2159,11 @@ restore:
* are needed on every CPU up until machine_halt/restart/poweroff.
*/
-static struct sysdev_class lapic_sysclass = {
- .name = "lapic",
+static struct syscore_ops lapic_syscore_ops = {
.resume = lapic_resume,
.suspend = lapic_suspend,
};
-static struct sys_device device_lapic = {
- .id = 0,
- .cls = &lapic_sysclass,
-};
-
static void __cpuinit apic_pm_activate(void)
{
apic_pm_state.active = 1;
@@ -2179,16 +2171,11 @@ static void __cpuinit apic_pm_activate(v
static int __init init_lapic_sysfs(void)
{
- int error;
-
- if (!cpu_has_apic)
- return 0;
/* XXX: remove suspend/resume procs if !apic_pm_state.active? */
+ if (cpu_has_apic)
+ register_syscore_ops(&lapic_syscore_ops);
- error = sysdev_class_register(&lapic_sysclass);
- if (!error)
- error = sysdev_register(&device_lapic);
- return error;
+ return 0;
}
/* local apic needs to resume before other devices access its registers. */
Index: linux-2.6/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c
+++ linux-2.6/arch/x86/kernel/apic/io_apic.c
@@ -30,7 +30,7 @@
#include <linux/compiler.h>
#include <linux/acpi.h>
#include <linux/module.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/msi.h>
#include <linux/htirq.h>
#include <linux/freezer.h>
@@ -2948,89 +2948,84 @@ static int __init io_apic_bug_finalize(v
late_initcall(io_apic_bug_finalize);
-struct sysfs_ioapic_data {
- struct sys_device dev;
- struct IO_APIC_route_entry entry[0];
-};
-static struct sysfs_ioapic_data * mp_ioapic_data[MAX_IO_APICS];
+static struct IO_APIC_route_entry *ioapic_saved_data[MAX_IO_APICS];
-static int ioapic_suspend(struct sys_device *dev, pm_message_t state)
+static void suspend_ioapic(int ioapic_id)
{
- struct IO_APIC_route_entry *entry;
- struct sysfs_ioapic_data *data;
+ struct IO_APIC_route_entry *saved_data = ioapic_saved_data[ioapic_id];
int i;
- data = container_of(dev, struct sysfs_ioapic_data, dev);
- entry = data->entry;
- for (i = 0; i < nr_ioapic_registers[dev->id]; i ++, entry ++ )
- *entry = ioapic_read_entry(dev->id, i);
+ if (!saved_data)
+ return;
+
+ for (i = 0; i < nr_ioapic_registers[ioapic_id]; i++)
+ saved_data[i] = ioapic_read_entry(ioapic_id, i);
+}
+
+static int ioapic_suspend(void)
+{
+ int ioapic_id;
+
+ for (ioapic_id = 0; ioapic_id < nr_ioapics; ioapic_id++)
+ suspend_ioapic(ioapic_id);
return 0;
}
-static int ioapic_resume(struct sys_device *dev)
+static void resume_ioapic(int ioapic_id)
{
- struct IO_APIC_route_entry *entry;
- struct sysfs_ioapic_data *data;
+ struct IO_APIC_route_entry *saved_data = ioapic_saved_data[ioapic_id];
unsigned long flags;
union IO_APIC_reg_00 reg_00;
int i;
- data = container_of(dev, struct sysfs_ioapic_data, dev);
- entry = data->entry;
+ if (!saved_data)
+ return;
raw_spin_lock_irqsave(&ioapic_lock, flags);
- reg_00.raw = io_apic_read(dev->id, 0);
- if (reg_00.bits.ID != mp_ioapics[dev->id].apicid) {
- reg_00.bits.ID = mp_ioapics[dev->id].apicid;
- io_apic_write(dev->id, 0, reg_00.raw);
+ reg_00.raw = io_apic_read(ioapic_id, 0);
+ if (reg_00.bits.ID != mp_ioapics[ioapic_id].apicid) {
+ reg_00.bits.ID = mp_ioapics[ioapic_id].apicid;
+ io_apic_write(ioapic_id, 0, reg_00.raw);
}
raw_spin_unlock_irqrestore(&ioapic_lock, flags);
- for (i = 0; i < nr_ioapic_registers[dev->id]; i++)
- ioapic_write_entry(dev->id, i, entry[i]);
+ for (i = 0; i < nr_ioapic_registers[ioapic_id]; i++)
+ ioapic_write_entry(ioapic_id, i, saved_data[i]);
+}
- return 0;
+static void ioapic_resume(void)
+{
+ int ioapic_id;
+
+ for (ioapic_id = nr_ioapics - 1; ioapic_id >= 0; ioapic_id--)
+ resume_ioapic(ioapic_id);
}
-static struct sysdev_class ioapic_sysdev_class = {
- .name = "ioapic",
+static struct syscore_ops ioapic_syscore_ops = {
.suspend = ioapic_suspend,
.resume = ioapic_resume,
};
-static int __init ioapic_init_sysfs(void)
+static int __init ioapic_init_ops(void)
{
- struct sys_device * dev;
- int i, size, error;
+ int i;
- error = sysdev_class_register(&ioapic_sysdev_class);
- if (error)
- return error;
+ for (i = 0; i < nr_ioapics; i++) {
+ unsigned int size;
- for (i = 0; i < nr_ioapics; i++ ) {
- size = sizeof(struct sys_device) + nr_ioapic_registers[i]
+ size = nr_ioapic_registers[i]
* sizeof(struct IO_APIC_route_entry);
- mp_ioapic_data[i] = kzalloc(size, GFP_KERNEL);
- if (!mp_ioapic_data[i]) {
- printk(KERN_ERR "Can't suspend/resume IOAPIC %d\n", i);
- continue;
- }
- dev = &mp_ioapic_data[i]->dev;
- dev->id = i;
- dev->cls = &ioapic_sysdev_class;
- error = sysdev_register(dev);
- if (error) {
- kfree(mp_ioapic_data[i]);
- mp_ioapic_data[i] = NULL;
- printk(KERN_ERR "Can't suspend/resume IOAPIC %d\n", i);
- continue;
- }
+ ioapic_saved_data[i] = kzalloc(size, GFP_KERNEL);
+ if (!ioapic_saved_data[i])
+ pr_err("IOAPIC %d: suspend/resume impossible!\n", i);
}
+ register_syscore_ops(&ioapic_syscore_ops);
+
return 0;
}
-device_initcall(ioapic_init_sysfs);
+device_initcall(ioapic_init_ops);
/*
* Dynamic irq allocate and deallocation
Index: linux-2.6/arch/x86/kernel/i8237.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/i8237.c
+++ linux-2.6/arch/x86/kernel/i8237.c
@@ -10,7 +10,7 @@
*/
#include <linux/init.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <asm/dma.h>
@@ -21,7 +21,7 @@
* in asm/dma.h.
*/
-static int i8237A_resume(struct sys_device *dev)
+static void i8237A_resume(void)
{
unsigned long flags;
int i;
@@ -41,31 +41,15 @@ static int i8237A_resume(struct sys_devi
enable_dma(4);
release_dma_lock(flags);
-
- return 0;
}
-static int i8237A_suspend(struct sys_device *dev, pm_message_t state)
-{
- return 0;
-}
-
-static struct sysdev_class i8237_sysdev_class = {
- .name = "i8237",
- .suspend = i8237A_suspend,
+static struct syscore_ops i8237_syscore_ops = {
.resume = i8237A_resume,
};
-static struct sys_device device_i8237A = {
- .id = 0,
- .cls = &i8237_sysdev_class,
-};
-
-static int __init i8237A_init_sysfs(void)
+static int __init i8237A_init_ops(void)
{
- int error = sysdev_class_register(&i8237_sysdev_class);
- if (!error)
- error = sysdev_register(&device_i8237A);
- return error;
+ register_syscore_ops(&i8237_syscore_ops);
+ return 0;
}
-device_initcall(i8237A_init_sysfs);
+device_initcall(i8237A_init_ops);
Index: linux-2.6/arch/x86/kernel/i8259.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/i8259.c
+++ linux-2.6/arch/x86/kernel/i8259.c
@@ -8,7 +8,7 @@
#include <linux/random.h>
#include <linux/init.h>
#include <linux/kernel_stat.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/bitops.h>
#include <linux/acpi.h>
#include <linux/io.h>
@@ -245,20 +245,19 @@ static void save_ELCR(char *trigger)
trigger[1] = inb(0x4d1) & 0xDE;
}
-static int i8259A_resume(struct sys_device *dev)
+static void i8259A_resume(void)
{
init_8259A(i8259A_auto_eoi);
restore_ELCR(irq_trigger);
- return 0;
}
-static int i8259A_suspend(struct sys_device *dev, pm_message_t state)
+static int i8259A_suspend(void)
{
save_ELCR(irq_trigger);
return 0;
}
-static int i8259A_shutdown(struct sys_device *dev)
+static void i8259A_shutdown(void)
{
/* Put the i8259A into a quiescent state that
* the kernel initialization code can get it
@@ -266,21 +265,14 @@ static int i8259A_shutdown(struct sys_de
*/
outb(0xff, PIC_MASTER_IMR); /* mask all of 8259A-1 */
outb(0xff, PIC_SLAVE_IMR); /* mask all of 8259A-1 */
- return 0;
}
-static struct sysdev_class i8259_sysdev_class = {
- .name = "i8259",
+static struct syscore_ops i8259_syscore_ops = {
.suspend = i8259A_suspend,
.resume = i8259A_resume,
.shutdown = i8259A_shutdown,
};
-static struct sys_device device_i8259A = {
- .id = 0,
- .cls = &i8259_sysdev_class,
-};
-
static void mask_8259A(void)
{
unsigned long flags;
@@ -399,17 +391,12 @@ struct legacy_pic default_legacy_pic = {
struct legacy_pic *legacy_pic = &default_legacy_pic;
-static int __init i8259A_init_sysfs(void)
+static int __init i8259A_init_ops(void)
{
- int error;
-
- if (legacy_pic != &default_legacy_pic)
- return 0;
+ if (legacy_pic == &default_legacy_pic)
+ register_syscore_ops(&i8259_syscore_ops);
- error = sysdev_class_register(&i8259_sysdev_class);
- if (!error)
- error = sysdev_register(&device_i8259A);
- return error;
+ return 0;
}
-device_initcall(i8259A_init_sysfs);
+device_initcall(i8259A_init_ops);
Index: linux-2.6/arch/x86/kernel/pci-gart_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-gart_64.c
+++ linux-2.6/arch/x86/kernel/pci-gart_64.c
@@ -27,7 +27,7 @@
#include <linux/kdebug.h>
#include <linux/scatterlist.h>
#include <linux/iommu-helper.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/io.h>
#include <linux/gfp.h>
#include <asm/atomic.h>
@@ -589,7 +589,7 @@ void set_up_gart_resume(u32 aper_order,
aperture_alloc = aper_alloc;
}
-static void gart_fixup_northbridges(struct sys_device *dev)
+static void gart_fixup_northbridges(void)
{
int i;
@@ -613,33 +613,20 @@ static void gart_fixup_northbridges(stru
}
}
-static int gart_resume(struct sys_device *dev)
+static void gart_resume(void)
{
pr_info("PCI-DMA: Resuming GART IOMMU\n");
- gart_fixup_northbridges(dev);
+ gart_fixup_northbridges();
enable_gart_translations();
-
- return 0;
}
-static int gart_suspend(struct sys_device *dev, pm_message_t state)
-{
- return 0;
-}
-
-static struct sysdev_class gart_sysdev_class = {
- .name = "gart",
- .suspend = gart_suspend,
+static struct syscore_ops gart_syscore_ops = {
.resume = gart_resume,
};
-static struct sys_device device_gart = {
- .cls = &gart_sysdev_class,
-};
-
/*
* Private Northbridge GATT initialization in case we cannot use the
* AGP driver for some reason.
@@ -650,7 +637,7 @@ static __init int init_amd_gatt(struct a
unsigned aper_base, new_aper_base;
struct pci_dev *dev;
void *gatt;
- int i, error;
+ int i;
pr_info("PCI-DMA: Disabling AGP.\n");
@@ -685,12 +672,7 @@ static __init int init_amd_gatt(struct a
agp_gatt_table = gatt;
- error = sysdev_class_register(&gart_sysdev_class);
- if (!error)
- error = sysdev_register(&device_gart);
- if (error)
- panic("Could not register gart_sysdev -- "
- "would corrupt data on next suspend");
+ register_syscore_ops(&gart_syscore_ops);
flush_gart();
Index: linux-2.6/arch/x86/oprofile/nmi_int.c
===================================================================
--- linux-2.6.orig/arch/x86/oprofile/nmi_int.c
+++ linux-2.6/arch/x86/oprofile/nmi_int.c
@@ -15,7 +15,7 @@
#include <linux/notifier.h>
#include <linux/smp.h>
#include <linux/oprofile.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/slab.h>
#include <linux/moduleparam.h>
#include <linux/kdebug.h>
@@ -536,7 +536,7 @@ static void nmi_shutdown(void)
#ifdef CONFIG_PM
-static int nmi_suspend(struct sys_device *dev, pm_message_t state)
+static int nmi_suspend(void)
{
/* Only one CPU left, just stop that one */
if (nmi_enabled == 1)
@@ -544,49 +544,31 @@ static int nmi_suspend(struct sys_device
return 0;
}
-static int nmi_resume(struct sys_device *dev)
+static void nmi_resume(void)
{
if (nmi_enabled == 1)
nmi_cpu_start(NULL);
- return 0;
}
-static struct sysdev_class oprofile_sysclass = {
- .name = "oprofile",
+static struct syscore_ops oprofile_syscore_ops = {
.resume = nmi_resume,
.suspend = nmi_suspend,
};
-static struct sys_device device_oprofile = {
- .id = 0,
- .cls = &oprofile_sysclass,
-};
-
-static int __init init_sysfs(void)
+static void __init init_suspend_resume(void)
{
- int error;
-
- error = sysdev_class_register(&oprofile_sysclass);
- if (error)
- return error;
-
- error = sysdev_register(&device_oprofile);
- if (error)
- sysdev_class_unregister(&oprofile_sysclass);
-
- return error;
+ register_syscore_ops(&oprofile_syscore_ops);
}
-static void exit_sysfs(void)
+static void exit_suspend_resume(void)
{
- sysdev_unregister(&device_oprofile);
- sysdev_class_unregister(&oprofile_sysclass);
+ unregister_syscore_ops(&oprofile_syscore_ops);
}
#else
-static inline int init_sysfs(void) { return 0; }
-static inline void exit_sysfs(void) { }
+static inline void init_suspend_resume(void) { }
+static inline void exit_suspend_resume(void) { }
#endif /* CONFIG_PM */
@@ -789,9 +771,7 @@ int __init op_nmi_init(struct oprofile_o
mux_init(ops);
- ret = init_sysfs();
- if (ret)
- return ret;
+ init_suspend_resume();
printk(KERN_INFO "oprofile: using NMI interrupt.\n");
return 0;
@@ -799,5 +779,5 @@ int __init op_nmi_init(struct oprofile_o
void op_nmi_exit(void)
{
- exit_sysfs();
+ exit_suspend_resume();
}
Index: linux-2.6/drivers/acpi/pci_link.c
===================================================================
--- linux-2.6.orig/drivers/acpi/pci_link.c
+++ linux-2.6/drivers/acpi/pci_link.c
@@ -29,7 +29,7 @@
* for IRQ management (e.g. start()->_SRS).
*/
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
@@ -757,14 +757,13 @@ static int acpi_pci_link_resume(struct a
return 0;
}
-static int irqrouter_resume(struct sys_device *dev)
+static void irqrouter_resume(void)
{
struct acpi_pci_link *link;
list_for_each_entry(link, &acpi_link_list, list) {
acpi_pci_link_resume(link);
}
- return 0;
}
static int acpi_pci_link_remove(struct acpi_device *device, int type)
@@ -871,32 +870,19 @@ static int __init acpi_irq_balance_set(c
__setup("acpi_irq_balance", acpi_irq_balance_set);
-/* FIXME: we will remove this interface after all drivers call pci_disable_device */
-static struct sysdev_class irqrouter_sysdev_class = {
- .name = "irqrouter",
+static struct syscore_ops irqrouter_syscore_ops = {
.resume = irqrouter_resume,
};
-static struct sys_device device_irqrouter = {
- .id = 0,
- .cls = &irqrouter_sysdev_class,
-};
-
-static int __init irqrouter_init_sysfs(void)
+static int __init irqrouter_init_ops(void)
{
- int error;
+ if (!acpi_disabled && !acpi_noirq)
+ register_syscore_ops(&irqrouter_syscore_ops);
- if (acpi_disabled || acpi_noirq)
- return 0;
-
- error = sysdev_class_register(&irqrouter_sysdev_class);
- if (!error)
- error = sysdev_register(&device_irqrouter);
-
- return error;
+ return 0;
}
-device_initcall(irqrouter_init_sysfs);
+device_initcall(irqrouter_init_ops);
static int __init acpi_pci_link_init(void)
{
Index: linux-2.6/drivers/pci/intel-iommu.c
===================================================================
--- linux-2.6.orig/drivers/pci/intel-iommu.c
+++ linux-2.6/drivers/pci/intel-iommu.c
@@ -36,7 +36,7 @@
#include <linux/iova.h>
#include <linux/iommu.h>
#include <linux/intel-iommu.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/tboot.h>
#include <linux/dmi.h>
#include <asm/cacheflush.h>
@@ -3135,7 +3135,7 @@ static void iommu_flush_all(void)
}
}
-static int iommu_suspend(struct sys_device *dev, pm_message_t state)
+static int iommu_suspend(void)
{
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu = NULL;
@@ -3175,7 +3175,7 @@ nomem:
return -ENOMEM;
}
-static int iommu_resume(struct sys_device *dev)
+static void iommu_resume(void)
{
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu = NULL;
@@ -3183,7 +3183,7 @@ static int iommu_resume(struct sys_devic
if (init_iommu_hw()) {
WARN(1, "IOMMU setup failed, DMAR can not resume!\n");
- return -EIO;
+ return;
}
for_each_active_iommu(iommu, drhd) {
@@ -3204,40 +3204,20 @@ static int iommu_resume(struct sys_devic
for_each_active_iommu(iommu, drhd)
kfree(iommu->iommu_state);
-
- return 0;
}
-static struct sysdev_class iommu_sysclass = {
- .name = "iommu",
+static struct syscore_ops iommu_syscore_ops = {
.resume = iommu_resume,
.suspend = iommu_suspend,
};
-static struct sys_device device_iommu = {
- .cls = &iommu_sysclass,
-};
-
-static int __init init_iommu_sysfs(void)
+static void __init init_iommu_pm_ops(void)
{
- int error;
-
- error = sysdev_class_register(&iommu_sysclass);
- if (error)
- return error;
-
- error = sysdev_register(&device_iommu);
- if (error)
- sysdev_class_unregister(&iommu_sysclass);
-
- return error;
+ register_syscore_ops(&iommu_syscore_ops);
}
#else
-static int __init init_iommu_sysfs(void)
-{
- return 0;
-}
+static inline void init_iommu_pm_ops(void) { }
#endif /* CONFIG_PM */
/*
@@ -3320,7 +3300,7 @@ int __init intel_iommu_init(void)
#endif
dma_ops = &intel_dma_ops;
- init_iommu_sysfs();
+ init_iommu_pm_ops();
register_iommu(&intel_iommu_ops);
Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -14,7 +14,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/sched.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/clocksource.h>
#include <linux/jiffies.h>
#include <linux/time.h>
@@ -561,13 +561,12 @@ static struct timespec timekeeping_suspe
/**
* timekeeping_resume - Resumes the generic timekeeping subsystem.
- * @dev: unused
*
* This is for the generic clocksource timekeeping.
* xtime/wall_to_monotonic/jiffies/etc are
* still managed by arch specific suspend/resume code.
*/
-static int timekeeping_resume(struct sys_device *dev)
+static void timekeeping_resume(void)
{
unsigned long flags;
struct timespec ts;
@@ -596,11 +595,9 @@ static int timekeeping_resume(struct sys
/* Resume hrtimers */
hres_timers_resume();
-
- return 0;
}
-static int timekeeping_suspend(struct sys_device *dev, pm_message_t state)
+static int timekeeping_suspend(void)
{
unsigned long flags;
@@ -618,26 +615,18 @@ static int timekeeping_suspend(struct sy
}
/* sysfs resume/suspend bits for timekeeping */
-static struct sysdev_class timekeeping_sysclass = {
- .name = "timekeeping",
+static struct syscore_ops timekeeping_syscore_ops = {
.resume = timekeeping_resume,
.suspend = timekeeping_suspend,
};
-static struct sys_device device_timer = {
- .id = 0,
- .cls = &timekeeping_sysclass,
-};
-
-static int __init timekeeping_init_device(void)
+static int __init timekeeping_init_ops(void)
{
- int error = sysdev_class_register(&timekeeping_sysclass);
- if (!error)
- error = sysdev_register(&device_timer);
- return error;
+ register_syscore_ops(&timekeeping_syscore_ops);
+ return 0;
}
-device_initcall(timekeeping_init_device);
+device_initcall(timekeeping_init_ops);
/*
* If the error is already larger, we look ahead even further
On Thu, 10 Mar 2011, Rafael J. Wysocki wrote:
> Some subsystems need to carry out suspend/resume and shutdown
> operations with one CPU on-line and interrupts disabled. The only
> way to register such operations is to define a sysdev class and
> a sysdev specifically for this purpose which is cumbersome and
> inefficient. Moreover, the arguments taken by sysdev suspend,
> resume and shutdown callbacks are practically never necessary.
>
> For this reason, introduce a simpler interface allowing subsystems
> to register operations to be executed very late during system suspend
> and shutdown and very early during resume in the form of
> struct syscore_ops objects.
...
> Index: linux-2.6/drivers/base/syscore.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/drivers/base/syscore.c
It's true that the existing sys.c file lies in drivers/base; this is
presumably because it handles a bunch of class-related registration
stuff. Now you're getting rid of all that, leaving just the
power-related operations, so doesn't it make more sense to put this
file in drivers/base/power?
> +/**
> + * syscore_suspend - Execute all the registered system core suspend callbacks.
> + *
> + * This function is executed with one CPU on-line and disabled interrupts.
> + */
> +int syscore_suspend(void)
> +{
> + struct syscore_ops *ops;
> +
> + list_for_each_entry_reverse(ops, &syscore_ops_list, node)
> + if (ops->suspend) {
> + int ret = ops->suspend();
> + if (ret) {
> + pr_err("PM: System core suspend callback "
> + "%pF failed.\n", ops->suspend);
> + return ret;
If an error occurs, you need to go back and resume all the things that
were suspended. At least, that's what the code in sysdev_suspend does.
> + }
> + }
> +
> + return 0;
> +}
Alan Stern
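For illustration only, the unwinding described above can be modelled in user space roughly as follows. This is a sketch, not the kernel code: the fixed array stands in for syscore_ops_list, and the names struct ops, note(), suspend_all() and the a/b/c callbacks are made up for the example. The middle callback fails on purpose, so only the entry suspended before it gets resumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct syscore_ops (no list node, no shutdown). */
struct ops {
	int (*suspend)(void);
	void (*resume)(void);
};

/* Call trace: lower case = suspend, upper case = resume. */
static char trace[32];
static size_t trace_len;

static void note(char c)
{
	assert(trace_len < sizeof(trace) - 1);
	trace[trace_len++] = c;
	trace[trace_len] = '\0';
}

static int a_suspend(void) { note('a'); return 0; }
static void a_resume(void) { note('A'); }
static int b_suspend(void) { note('b'); return -5; }	/* simulated failure */
static void b_resume(void) { note('B'); }
static int c_suspend(void) { note('c'); return 0; }
static void c_resume(void) { note('C'); }

/* Entries are in "registration order": a first, c last. */
static struct ops table[] = {
	{ a_suspend, a_resume },
	{ b_suspend, b_resume },
	{ c_suspend, c_resume },
};
#define N_OPS (sizeof(table) / sizeof(table[0]))

/*
 * Suspend in reverse registration order.  If a callback fails, resume
 * the entries that were already suspended (those after the failing
 * one in the table), in forward order, and return the error.
 */
static int suspend_all(void)
{
	size_t i, j;
	int ret;

	for (i = N_OPS; i-- > 0; ) {
		if (!table[i].suspend)
			continue;
		ret = table[i].suspend();
		if (ret) {
			for (j = i + 1; j < N_OPS; j++)
				if (table[j].resume)
					table[j].resume();
			return ret;
		}
	}
	return 0;
}
```

The failing entry itself is not resumed here, on the assumption that a callback returning an error has left its subsystem running.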
On Thursday, March 10, 2011, Alan Stern wrote:
> On Thu, 10 Mar 2011, Rafael J. Wysocki wrote:
>
> > Some subsystems need to carry out suspend/resume and shutdown
> > operations with one CPU on-line and interrupts disabled. The only
> > way to register such operations is to define a sysdev class and
> > a sysdev specifically for this purpose which is cumbersome and
> > inefficient. Moreover, the arguments taken by sysdev suspend,
> > resume and shutdown callbacks are practically never necessary.
> >
> > For this reason, introduce a simpler interface allowing subsystems
> > to register operations to be executed very late during system suspend
> > and shutdown and very early during resume in the form of
> > struct syscore_ops objects.
>
> ...
>
> > Index: linux-2.6/drivers/base/syscore.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/drivers/base/syscore.c
>
> It's true that the existing sys.c file lies in drivers/base; this is
> presumably because it handles a bunch of class-related registration
> stuff. Now you're getting rid of all that, leaving just the
> power-related operations, so doesn't it make more sense to put this
> file in drivers/base/power?
I didn't, because shutdown() doesn't really belong in there (well, depending).
> > +/**
> > + * syscore_suspend - Execute all the registered system core suspend callbacks.
> > + *
> > + * This function is executed with one CPU on-line and disabled interrupts.
> > + */
> > +int syscore_suspend(void)
> > +{
> > + struct syscore_ops *ops;
> > +
> > + list_for_each_entry_reverse(ops, &syscore_ops_list, node)
> > + if (ops->suspend) {
> > + int ret = ops->suspend();
> > + if (ret) {
> > + pr_err("PM: System core suspend callback "
> > + "%pF failed.\n", ops->suspend);
> > + return ret;
>
> If an error occurs, you need to go back and resume all the things that
> were suspended. At least, that's what the code in sysdev_suspend does.
Right, thanks a lot!
> > + }
> > + }
> > +
> > + return 0;
> > +}
Rafael
On Thursday, March 10, 2011, Alan Stern wrote:
> On Thu, 10 Mar 2011, Rafael J. Wysocki wrote:
>
> > Some subsystems need to carry out suspend/resume and shutdown
> > operations with one CPU on-line and interrupts disabled. The only
> > way to register such operations is to define a sysdev class and
> > a sysdev specifically for this purpose which is cumbersome and
> > inefficient. Moreover, the arguments taken by sysdev suspend,
> > resume and shutdown callbacks are practically never necessary.
> >
> > For this reason, introduce a simpler interface allowing subsystems
> > to register operations to be executed very late during system suspend
> > and shutdown and very early during resume in the form of
> > struct syscore_ops objects.
>
> ...
>
> > Index: linux-2.6/drivers/base/syscore.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/drivers/base/syscore.c
>
> It's true that the existing sys.c file lies in drivers/base; this is
> presumably because it handles a bunch of class-related registration
> stuff. Now you're getting rid of all that, leaving just the
> power-related operations, so doesn't it make more sense to put this
> file in drivers/base/power?
>
> > +/**
> > + * syscore_suspend - Execute all the registered system core suspend callbacks.
> > + *
> > + * This function is executed with one CPU on-line and disabled interrupts.
> > + */
> > +int syscore_suspend(void)
> > +{
> > + struct syscore_ops *ops;
> > +
> > + list_for_each_entry_reverse(ops, &syscore_ops_list, node)
> > + if (ops->suspend) {
> > + int ret = ops->suspend();
> > + if (ret) {
> > + pr_err("PM: System core suspend callback "
> > + "%pF failed.\n", ops->suspend);
> > + return ret;
>
> If an error occurs, you need to go back and resume all the things that
> were suspended. At least, that's what the code in sysdev_suspend does.
>
> > + }
> > + }
> > +
> > + return 0;
> > +}
Below is a new version of the patch. I've taken your comment on the failing
suspend into account, fixed the list traversal direction in syscore_shutdown()
and added some debug statements.
Thanks,
Rafael
---
Some subsystems need to carry out suspend/resume and shutdown
operations with one CPU on-line and interrupts disabled. The only
way to register such operations is to define a sysdev class and
a sysdev specifically for this purpose which is cumbersome and
inefficient. Moreover, the arguments taken by sysdev suspend,
resume and shutdown callbacks are practically never necessary.
For this reason, introduce a simpler interface allowing subsystems
to register operations to be executed very late during system suspend
and shutdown and very early during resume in the form of
struct syscore_ops objects.
---
drivers/base/Makefile | 2
drivers/base/syscore.c | 107 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/syscore_ops.h | 29 +++++++++++
kernel/power/hibernate.c | 9 +++
kernel/power/suspend.c | 4 +
kernel/sys.c | 4 +
6 files changed, 154 insertions(+), 1 deletion(-)
Index: linux-2.6/include/linux/syscore_ops.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/syscore_ops.h
@@ -0,0 +1,29 @@
+/*
+ * syscore_ops.h - System core operations.
+ *
+ * Copyright (C) 2011 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_SYSCORE_OPS_H
+#define _LINUX_SYSCORE_OPS_H
+
+#include <linux/list.h>
+
+struct syscore_ops {
+ struct list_head node;
+ int (*suspend)(void);
+ void (*resume)(void);
+ void (*shutdown)(void);
+};
+
+extern void register_syscore_ops(struct syscore_ops *ops);
+extern void unregister_syscore_ops(struct syscore_ops *ops);
+#ifdef CONFIG_PM_SLEEP
+extern int syscore_suspend(void);
+extern void syscore_resume(void);
+#endif
+extern void syscore_shutdown(void);
+
+#endif
Index: linux-2.6/drivers/base/syscore.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/syscore.c
@@ -0,0 +1,107 @@
+/*
+ * syscore.c - Execution of system core operations.
+ *
+ * Copyright (C) 2011 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/syscore_ops.h>
+#include <linux/mutex.h>
+#include <linux/module.h>
+
+static LIST_HEAD(syscore_ops_list);
+static DEFINE_MUTEX(syscore_ops_lock);
+
+/**
+ * register_syscore_ops - Register a set of system core operations.
+ * @ops: System core operations to register.
+ */
+void register_syscore_ops(struct syscore_ops *ops)
+{
+ mutex_lock(&syscore_ops_lock);
+ list_add_tail(&ops->node, &syscore_ops_list);
+ mutex_unlock(&syscore_ops_lock);
+}
+EXPORT_SYMBOL_GPL(register_syscore_ops);
+
+/**
+ * unregister_syscore_ops - Unregister a set of system core operations.
+ * @ops: System core operations to unregister.
+ */
+void unregister_syscore_ops(struct syscore_ops *ops)
+{
+ mutex_lock(&syscore_ops_lock);
+ list_del(&ops->node);
+ mutex_unlock(&syscore_ops_lock);
+}
+EXPORT_SYMBOL_GPL(unregister_syscore_ops);
+
+#ifdef CONFIG_PM_SLEEP
+/**
+ * syscore_suspend - Execute all the registered system core suspend callbacks.
+ *
+ * This function is executed with one CPU on-line and disabled interrupts.
+ */
+int syscore_suspend(void)
+{
+ struct syscore_ops *ops;
+ int ret = 0;
+
+ list_for_each_entry_reverse(ops, &syscore_ops_list, node)
+ if (ops->suspend) {
+ if (initcall_debug)
+ pr_info("PM: Calling %pF\n", ops->suspend);
+ ret = ops->suspend();
+ if (ret)
+ goto err_out;
+ }
+
+ return 0;
+
+ err_out:
+ pr_err("PM: System core suspend callback %pF failed.\n", ops->suspend);
+
+ list_for_each_entry_continue(ops, &syscore_ops_list, node)
+ if (ops->resume)
+ ops->resume();
+
+ return ret;
+}
+
+/**
+ * syscore_resume - Execute all the registered system core resume callbacks.
+ *
+ * This function is executed with one CPU on-line and disabled interrupts.
+ */
+void syscore_resume(void)
+{
+ struct syscore_ops *ops;
+
+ list_for_each_entry(ops, &syscore_ops_list, node)
+ if (ops->resume) {
+ if (initcall_debug)
+ pr_info("PM: Calling %pF\n", ops->resume);
+ ops->resume();
+ }
+}
+#endif /* CONFIG_PM_SLEEP */
+
+/**
+ * syscore_shutdown - Execute all the registered system core shutdown callbacks.
+ */
+void syscore_shutdown(void)
+{
+ struct syscore_ops *ops;
+
+ mutex_lock(&syscore_ops_lock);
+
+ list_for_each_entry_reverse(ops, &syscore_ops_list, node)
+ if (ops->shutdown) {
+ if (initcall_debug)
+ pr_info("PM: Calling %pF\n", ops->shutdown);
+ ops->shutdown();
+ }
+
+ mutex_unlock(&syscore_ops_lock);
+}
Index: linux-2.6/kernel/power/suspend.c
===================================================================
--- linux-2.6.orig/kernel/power/suspend.c
+++ linux-2.6/kernel/power/suspend.c
@@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/suspend.h>
+#include <linux/syscore_ops.h>
#include <trace/events/power.h>
#include "power.h"
@@ -163,11 +164,14 @@ static int suspend_enter(suspend_state_t
BUG_ON(!irqs_disabled());
error = sysdev_suspend(PMSG_SUSPEND);
+ if (!error)
+ error = syscore_suspend();
if (!error) {
if (!(suspend_test(TEST_CORE) || pm_wakeup_pending())) {
error = suspend_ops->enter(state);
events_check_enabled = false;
}
+ syscore_resume();
sysdev_resume();
}
Index: linux-2.6/kernel/power/hibernate.c
===================================================================
--- linux-2.6.orig/kernel/power/hibernate.c
+++ linux-2.6/kernel/power/hibernate.c
@@ -23,6 +23,7 @@
#include <linux/cpu.h>
#include <linux/freezer.h>
#include <linux/gfp.h>
+#include <linux/syscore_ops.h>
#include <scsi/scsi_scan.h>
#include <asm/suspend.h>
@@ -272,6 +273,8 @@ static int create_image(int platform_mod
local_irq_disable();
error = sysdev_suspend(PMSG_FREEZE);
+ if (!error)
+ error = syscore_suspend();
if (error) {
printk(KERN_ERR "PM: Some system devices failed to power down, "
"aborting hibernation\n");
@@ -295,6 +298,7 @@ static int create_image(int platform_mod
}
Power_up:
+ syscore_resume();
sysdev_resume();
/* NOTE: dpm_resume_noirq() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
@@ -403,6 +407,8 @@ static int resume_target_kernel(bool pla
local_irq_disable();
error = sysdev_suspend(PMSG_QUIESCE);
+ if (!error)
+ error = syscore_suspend();
if (error)
goto Enable_irqs;
@@ -429,6 +435,7 @@ static int resume_target_kernel(bool pla
restore_processor_state();
touch_softlockup_watchdog();
+ syscore_resume();
sysdev_resume();
Enable_irqs:
@@ -516,6 +523,7 @@ int hibernation_platform_enter(void)
local_irq_disable();
sysdev_suspend(PMSG_HIBERNATE);
+ syscore_suspend();
if (pm_wakeup_pending()) {
error = -EAGAIN;
goto Power_up;
@@ -526,6 +534,7 @@ int hibernation_platform_enter(void)
while (1);
Power_up:
+ syscore_resume();
sysdev_resume();
local_irq_enable();
enable_nonboot_cpus();
Index: linux-2.6/kernel/sys.c
===================================================================
--- linux-2.6.orig/kernel/sys.c
+++ linux-2.6/kernel/sys.c
@@ -37,6 +37,7 @@
#include <linux/ptrace.h>
#include <linux/fs_struct.h>
#include <linux/gfp.h>
+#include <linux/syscore_ops.h>
#include <linux/compat.h>
#include <linux/syscalls.h>
@@ -298,6 +299,7 @@ void kernel_restart_prepare(char *cmd)
system_state = SYSTEM_RESTART;
device_shutdown();
sysdev_shutdown();
+ syscore_shutdown();
}
/**
@@ -336,6 +338,7 @@ void kernel_halt(void)
{
kernel_shutdown_prepare(SYSTEM_HALT);
sysdev_shutdown();
+ syscore_shutdown();
printk(KERN_EMERG "System halted.\n");
kmsg_dump(KMSG_DUMP_HALT);
machine_halt();
@@ -355,6 +358,7 @@ void kernel_power_off(void)
pm_power_off_prepare();
disable_nonboot_cpus();
sysdev_shutdown();
+ syscore_shutdown();
printk(KERN_EMERG "Power down.\n");
kmsg_dump(KMSG_DUMP_POWEROFF);
machine_power_off();
Index: linux-2.6/drivers/base/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/Makefile
+++ linux-2.6/drivers/base/Makefile
@@ -1,6 +1,6 @@
# Makefile for the Linux device tree
-obj-y := core.o sys.o bus.o dd.o \
+obj-y := core.o sys.o bus.o dd.o syscore.o \
driver.o class.o platform.o \
cpu.o firmware.o init.o map.o devres.o \
attribute_container.o transport_class.o
On Thu, 2011-03-10 at 01:31 +0100, Rafael J. Wysocki wrote:
> There are multiple problems with sysdevs, or struct sys_device objects to
> be precise, that are so annoying that some people have started to think
> of removing them entirely from the kernel. To me, personally, the most
> obvious issue is the way sysdevs are used for defining suspend/resume
> callbacks to be executed with one CPU on-line and interrupts disabled.
> Greg and Kay may tell you more about the other problems with sysdevs. :-)
>
> Some subsystems need to carry out certain operations during suspend after
> we've disabled non-boot CPUs and interrupts have been switched off on the
> only on-line one. Currently, the only way to achieve that is to define
> sysdev suspend/resume callbacks, but this is cumbersome and inefficient.
> Namely, to do that, one has to define a sysdev class providing the callbacks
> and a sysdev actually using them, which is excessively complicated. Moreover,
> the sysdev suspend/resume callbacks take arguments that are not really used
> by the majority of subsystems defining sysdev suspend/resume callbacks
> (or even if they are used, they don't really _need_ to be used, so they
> are simply unnecessary). Of course, if a sysdev is only defined to provide
> suspend/resume (and maybe shutdown) callbacks, there's no real reason why
> it should show up in sysfs.
>
> For this reason, I thought it would be a good idea to provide a simpler
> interface for subsystems to define "very late" suspend callbacks and
> "very early" resume callbacks (and "very late" shutdown callbacks as well)
> without the entire bloat related to sysdevs. The interface is introduced
> by the first of the following patches, while the second patch converts some
> sysdev users related to the x86 architecture to using the new interface.
>
> I believe that all sysdev users who need to define suspend/resume/shutdown
> callbacks may be converted to using the interface provided by the first patch,
> which in turn should allow us to convert the remaining sysdev functionality
> into "normal" struct device interfaces. Still, even if that turns out to be
> too complicated, the bloat reduction resulting from the second patch kind of
> shows that moving at least some sysdev users to a simpler interface (like in
> the first patch) is a good idea anyway.
Do I read that right? We get rid of the entire dance of creating
sysdevs/sysdev_classes and the pointless and broken stuff in /sys?
We just dynamically maintain a list of devices/operations, which is
list-executed when needed?
These new "core" operations are not included in every device but only
global per subsystem, just like the sysdev_class did earlier?
Looks all like a nice plan to me.
Thanks,
Kay
On Thursday, March 10, 2011, Kay Sievers wrote:
> On Thu, 2011-03-10 at 01:31 +0100, Rafael J. Wysocki wrote:
> > There are multiple problems with sysdevs, or struct sys_device objects to
> > be precise, that are so annoying that some people have started to think
> > of removing them entirely from the kernel. To me, personally, the most
> > obvious issue is the way sysdevs are used for defining suspend/resume
> > callbacks to be executed with one CPU on-line and interrupts disabled.
> > Greg and Kay may tell you more about the other problems with sysdevs. :-)
> >
> > Some subsystems need to carry out certain operations during suspend after
> > we've disabled non-boot CPUs and interrupts have been switched off on the
> > only on-line one. Currently, the only way to achieve that is to define
> > sysdev suspend/resume callbacks, but this is cumbersome and inefficient.
> > Namely, to do that, one has to define a sysdev class providing the callbacks
> > and a sysdev actually using them, which is excessively complicated. Moreover,
> > the sysdev suspend/resume callbacks take arguments that are not really used
> > by the majority of subsystems defining sysdev suspend/resume callbacks
> > (or even if they are used, they don't really _need_ to be used, so they
> > are simply unnecessary). Of course, if a sysdev is only defined to provide
> > suspend/resume (and maybe shutdown) callbacks, there's no real reason why
> > it should show up in sysfs.
> >
> > For this reason, I thought it would be a good idea to provide a simpler
> > interface for subsystems to define "very late" suspend callbacks and
> > "very early" resume callbacks (and "very late" shutdown callbacks as well)
> > without the entire bloat related to sysdevs. The interface is introduced
> > by the first of the following patches, while the second patch converts some
> > sysdev users related to the x86 architecture to using the new interface.
> >
> > I believe that all sysdev users who need to define suspend/resume/shutdown
> > callbacks may be converted to using the interface provided by the first patch,
> > which in turn should allow us to convert the remaining sysdev functionality
> > into "normal" struct device interfaces. Still, even if that turns out to be
> > too complicated, the bloat reduction resulting from the second patch kind of
> > shows that moving at least some sysdev users to a simpler interface (like in
> > the first patch) is a good idea anyway.
>
> Do I read that right? We get rid of the entire dance of creating
> sysdevs/sysdev_classes and the pointless and broken stuff in /sys?
That's the plan at least.
> We just dynamically maintain a list of devices/operations, which is
> list-executed when needed?
>
> These new "core" operations are not included in every device but only
> global per subsystem, just like the sysdev_class did earlier?
Yup.
> Looks all like a nice plan to me.
Good. :-)
Thanks,
Rafael
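To make the "list of operations, executed when needed" idea concrete, here is a minimal user-space model. It is a sketch under assumptions, not the patch itself: the prev/next pointers replace struct list_head, locking and the shutdown callback are left out, and the "timer" and "iommu" subsystems, record() and demo_cycle() are hypothetical. What it shows is the ordering: the last-registered entry is suspended first, and resume runs in registration order.

```c
#include <assert.h>
#include <stddef.h>

/* Model of struct syscore_ops; prev/next stand in for the list node. */
struct syscore_ops {
	struct syscore_ops *prev, *next;
	int (*suspend)(void);
	void (*resume)(void);
};

static struct syscore_ops *head, *tail;

/* Order in which callbacks ran, for inspection. */
static int order[8];
static int n_calls;

static void record(int id)
{
	assert(n_calls < 8);
	order[n_calls++] = id;
}

/* Append to the tail, as list_add_tail() does in the patch. */
static void register_syscore_ops(struct syscore_ops *ops)
{
	assert(ops != NULL);
	ops->next = NULL;
	ops->prev = tail;
	if (tail)
		tail->next = ops;
	else
		head = ops;
	tail = ops;
}

/* Walk tail to head; on failure resume what was already suspended. */
static int syscore_suspend(void)
{
	struct syscore_ops *ops;

	for (ops = tail; ops; ops = ops->prev) {
		int ret = ops->suspend ? ops->suspend() : 0;

		if (ret) {
			for (ops = ops->next; ops; ops = ops->next)
				if (ops->resume)
					ops->resume();
			return ret;
		}
	}
	return 0;
}

/* Walk head to tail. */
static void syscore_resume(void)
{
	struct syscore_ops *ops;

	for (ops = head; ops; ops = ops->next)
		if (ops->resume)
			ops->resume();
}

/* Two hypothetical subsystems, one ops object each. */
static int timer_suspend(void) { record(1); return 0; }
static void timer_resume(void) { record(2); }
static int iommu_suspend(void) { record(3); return 0; }
static void iommu_resume(void) { record(4); }

static struct syscore_ops timer_ops = {
	.suspend = timer_suspend,
	.resume = timer_resume,
};
static struct syscore_ops iommu_ops = {
	.suspend = iommu_suspend,
	.resume = iommu_resume,
};

/* Register both, then run one suspend/resume cycle. */
static int demo_cycle(void)
{
	int ret;

	register_syscore_ops(&timer_ops);
	register_syscore_ops(&iommu_ops);
	ret = syscore_suspend();
	if (!ret)
		syscore_resume();
	return ret;
}
```

Note that nothing here is per-device: each subsystem contributes a single ops object, and no sysfs entry is involved.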
On Thu, Mar 10, 2011 at 01:34:02AM +0100, Rafael J. Wysocki wrote:
> Some subsystems need to carry out suspend/resume and shutdown
> operations with one CPU on-line and interrupts disabled and they
> define sysdev classes and sysdevs specifically for this purpose.
> This leads to unnecessarily complicated code and excessive memory
> usage, so switch them to using struct syscore_ops objects for this
> purpose instead.
Heavily-Acked-by: Greg Kroah-Hartman <[email protected]>
:)
Do you want to resend this with a signed-off-by and your first one so I
can apply it to my tree, or do you want to take it through yours?
thanks again for doing this.
greg k-h
On Thu, Mar 10, 2011 at 12:30:45PM +0100, Rafael J. Wysocki wrote:
> On Thursday, March 10, 2011, Alan Stern wrote:
> > On Thu, 10 Mar 2011, Rafael J. Wysocki wrote:
> >
> > > Some subsystems need to carry out suspend/resume and shutdown
> > > operations with one CPU on-line and interrupts disabled. The only
> > > way to register such operations is to define a sysdev class and
> > > a sysdev specifically for this purpose which is cumbersome and
> > > inefficient. Moreover, the arguments taken by sysdev suspend,
> > > resume and shutdown callbacks are practically never necessary.
> > >
> > > For this reason, introduce a simpler interface allowing subsystems
> > > to register operations to be executed very late during system suspend
> > > and shutdown and very early during resume in the form of
> > > struct syscore_ops objects.
> >
> > ...
> >
> > > Index: linux-2.6/drivers/base/syscore.c
> > > ===================================================================
> > > --- /dev/null
> > > +++ linux-2.6/drivers/base/syscore.c
> >
> > It's true that the existing sys.c file lies in drivers/base; this is
> > presumably because it handles a bunch of class-related registration
> > stuff. Now you're getting rid of all that, leaving just the
> > power-related operations, so doesn't it make more sense to put this
> > file in drivers/base/power?
> >
> > > +/**
> > > + * syscore_suspend - Execute all the registered system core suspend callbacks.
> > > + *
> > > + * This function is executed with one CPU on-line and disabled interrupts.
> > > + */
> > > +int syscore_suspend(void)
> > > +{
> > > + struct syscore_ops *ops;
> > > +
> > > + list_for_each_entry_reverse(ops, &syscore_ops_list, node)
> > > + if (ops->suspend) {
> > > + int ret = ops->suspend();
> > > + if (ret) {
> > > + pr_err("PM: System core suspend callback "
> > > + "%pF failed.\n", ops->suspend);
> > > + return ret;
> >
> > If an error occurs, you need to go back and resume all the things that
> > were suspended. At least, that's what the code in sysdev_suspend does.
> >
> > > + }
> > > + }
> > > +
> > > + return 0;
> > > +}
>
> Below is a new version of the patch. I've taken your comment on the failing
> suspend into account, fixed the list traversal direction in syscore_shutdown()
> and added some debug statements.
>
> Thanks,
> Rafael
>
> ---
> Some subsystems need to carry out suspend/resume and shutdown
> operations with one CPU on-line and interrupts disabled. The only
> way to register such operations is to define a sysdev class and
> a sysdev specifically for this purpose which is cumbersome and
> inefficient. Moreover, the arguments taken by sysdev suspend,
> resume and shutdown callbacks are practically never necessary.
>
> For this reason, introduce a simpler interface allowing subsystems
> to register operations to be executed very late during system suspend
> and shutdown and very early during resume in the form of
> struct syscore_ops objects.
>
> ---
> drivers/base/Makefile | 2
> drivers/base/syscore.c | 107 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/syscore_ops.h | 29 +++++++++++
> kernel/power/hibernate.c | 9 +++
> kernel/power/suspend.c | 4 +
> kernel/sys.c | 4 +
> 6 files changed, 154 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/include/linux/syscore_ops.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6/include/linux/syscore_ops.h
> @@ -0,0 +1,29 @@
> +/*
> + * syscore_ops.h - System core operations.
> + *
> + * Copyright (C) 2011 Rafael J. Wysocki <[email protected]>, Novell Inc.
> + *
> + * This file is released under the GPLv2.
> + */
> +
> +#ifndef _LINUX_SYSCORE_OPS_H
> +#define _LINUX_SYSCORE_OPS_H
> +
> +#include <linux/list.h>
> +
> +struct syscore_ops {
> + struct list_head node;
> + int (*suspend)(void);
> + void (*resume)(void);
> + void (*shutdown)(void);
> +};
> +
> +extern void register_syscore_ops(struct syscore_ops *ops);
> +extern void unregister_syscore_ops(struct syscore_ops *ops);
> +#ifdef CONFIG_PM_SLEEP
> +extern int syscore_suspend(void);
> +extern void syscore_resume(void);
> +#endif
Minor nit, provide inline functions for these when CONFIG_PM_SLEEP is
not defined so the code still builds?
Other than that, this looks great to me, thanks for doing this. Do you
want me to take it through my tree, or yours?
thanks,
greg k-h
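The stub pattern being asked for is the usual one for config-dependent kernel headers. A rough sketch, assuming the stubs should report success and do nothing (the patch as posted does not include them, and enter_sleep() is a hypothetical caller added only to show the effect):

```c
#include <assert.h>

/* CONFIG_PM_SLEEP is deliberately left undefined so the stubs are used. */
#ifdef CONFIG_PM_SLEEP
extern int syscore_suspend(void);
extern void syscore_resume(void);
#else
/* Assumed stub behaviour: nothing to suspend, so report success. */
static inline int syscore_suspend(void) { return 0; }
static inline void syscore_resume(void) { }
#endif

/* Hypothetical caller that builds with or without CONFIG_PM_SLEEP. */
static int enter_sleep(void)
{
	int error = syscore_suspend();

	/* Kernel convention: zero on success, negative errno on failure. */
	assert(error <= 0);
	if (!error)
		syscore_resume();
	return error;
}
```

The trade-off is that callers no longer need their own #ifdef, at the cost of hiding the fact that the calls are no-ops in such builds.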
On Friday, March 11, 2011, Greg KH wrote:
> On Thu, Mar 10, 2011 at 12:30:45PM +0100, Rafael J. Wysocki wrote:
> > On Thursday, March 10, 2011, Alan Stern wrote:
> > > On Thu, 10 Mar 2011, Rafael J. Wysocki wrote:
> > >
> > > > Some subsystems need to carry out suspend/resume and shutdown
> > > > operations with one CPU on-line and interrupts disabled. The only
> > > > way to register such operations is to define a sysdev class and
> > > > a sysdev specifically for this purpose which is cumbersome and
> > > > inefficient. Moreover, the arguments taken by sysdev suspend,
> > > > resume and shutdown callbacks are practically never necessary.
> > > >
> > > > For this reason, introduce a simpler interface allowing subsystems
> > > > to register operations to be executed very late during system suspend
> > > > and shutdown and very early during resume in the form of
> > > > struct syscore_ops objects.
> > >
> > > ...
> > >
> > > > Index: linux-2.6/drivers/base/syscore.c
> > > > ===================================================================
> > > > --- /dev/null
> > > > +++ linux-2.6/drivers/base/syscore.c
> > >
> > > It's true that the existing sys.c file lies in drivers/base; this is
> > > presumably because it handles a bunch of class-related registration
> > > stuff. Now you're getting rid of all that, leaving just the
> > > power-related operations, so doesn't it make more sense to put this
> > > file in drivers/base/power?
> > >
> > > > +/**
> > > > + * syscore_suspend - Execute all the registered system core suspend callbacks.
> > > > + *
> > > > + * This function is executed with one CPU on-line and disabled interrupts.
> > > > + */
> > > > +int syscore_suspend(void)
> > > > +{
> > > > + struct syscore_ops *ops;
> > > > +
> > > > + list_for_each_entry_reverse(ops, &syscore_ops_list, node)
> > > > + if (ops->suspend) {
> > > > + int ret = ops->suspend();
> > > > + if (ret) {
> > > > + pr_err("PM: System core suspend callback "
> > > > + "%pF failed.\n", ops->suspend);
> > > > + return ret;
> > >
> > > If an error occurs, you need to go back and resume all the things that
> > > were suspended. At least, that's what the code in sysdev_suspend does.
> > >
> > > > + }
> > > > + }
> > > > +
> > > > + return 0;
> > > > +}
> >
> > Below is a new version of the patch. I've taken your comment on the failing
> > suspend into account, fixed the list traversal direction in syscore_shutdown()
> > and added some debug statements.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > Some subsystems need to carry out suspend/resume and shutdown
> > operations with one CPU on-line and interrupts disabled. The only
> > way to register such operations is to define a sysdev class and
> > a sysdev specifically for this purpose which is cumbersome and
> > inefficient. Moreover, the arguments taken by sysdev suspend,
> > resume and shutdown callbacks are practically never necessary.
> >
> > For this reason, introduce a simpler interface allowing subsystems
> > to register operations to be executed very late during system suspend
> > and shutdown and very early during resume in the form of
> > struct syscore_ops objects.
> >
> > ---
> > drivers/base/Makefile | 2
> > drivers/base/syscore.c | 107 ++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/syscore_ops.h | 29 +++++++++++
> > kernel/power/hibernate.c | 9 +++
> > kernel/power/suspend.c | 4 +
> > kernel/sys.c | 4 +
> > 6 files changed, 154 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/include/linux/syscore_ops.h
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/include/linux/syscore_ops.h
> > @@ -0,0 +1,29 @@
> > +/*
> > + * syscore_ops.h - System core operations.
> > + *
> > + * Copyright (C) 2011 Rafael J. Wysocki <[email protected]>, Novell Inc.
> > + *
> > + * This file is released under the GPLv2.
> > + */
> > +
> > +#ifndef _LINUX_SYSCORE_OPS_H
> > +#define _LINUX_SYSCORE_OPS_H
> > +
> > +#include <linux/list.h>
> > +
> > +struct syscore_ops {
> > + struct list_head node;
> > + int (*suspend)(void);
> > + void (*resume)(void);
> > + void (*shutdown)(void);
> > +};
> > +
> > +extern void register_syscore_ops(struct syscore_ops *ops);
> > +extern void unregister_syscore_ops(struct syscore_ops *ops);
> > +#ifdef CONFIG_PM_SLEEP
> > +extern int syscore_suspend(void);
> > +extern void syscore_resume(void);
> > +#endif
>
> Minor nit, provide inline functions for these when CONFIG_PM_SLEEP is
> not defined so the code still builds?
The code using them depends on CONFIG_PM_SLEEP and they are nobody else's
business. :-)
I could avoid using the #ifdef here, but I thought I'd make it clear that
these things were only available when CONFIG_PM_SLEEP was set.
> Other than that, this looks great to me, thanks for doing this.
No problem. :-)
> Do you want me to take it through my tree, or yours?
I can handle it if you give me an ack.
Do you think I should push [1/2] alone for 2.6.39 or wait for the patches
converting subsystems to use this stuff to be ready? I think it'll take
some time to prepare them, especially for things in the ARM tree that use
sysdevs in some interesting ways ...
Thanks,
Rafael
On Fri, Mar 11, 2011 at 09:13:13PM +0100, Rafael J. Wysocki wrote:
> On Friday, March 11, 2011, Greg KH wrote:
> > On Thu, Mar 10, 2011 at 12:30:45PM +0100, Rafael J. Wysocki wrote:
> > > On Thursday, March 10, 2011, Alan Stern wrote:
> > > > On Thu, 10 Mar 2011, Rafael J. Wysocki wrote:
> > > >
> > > > > Some subsystems need to carry out suspend/resume and shutdown
> > > > > operations with one CPU on-line and interrupts disabled. The only
> > > > > way to register such operations is to define a sysdev class and
> > > > > a sysdev specifically for this purpose which is cumbersome and
> > > > > inefficient. Moreover, the arguments taken by sysdev suspend,
> > > > > resume and shutdown callbacks are practically never necessary.
> > > > >
> > > > > For this reason, introduce a simpler interface allowing subsystems
> > > > > to register operations to be executed very late during system suspend
> > > > > and shutdown and very early during resume in the form of
> > > > > > struct syscore_ops objects.
> > > >
> > > > ...
> > > >
> > > > > Index: linux-2.6/drivers/base/syscore.c
> > > > > ===================================================================
> > > > > --- /dev/null
> > > > > +++ linux-2.6/drivers/base/syscore.c
> > > >
> > > > It's true that the existing sys.c file lies in drivers/base; this is
> > > > presumably because it handles a bunch of class-related registration
> > > > stuff. Now you're getting rid of all that, leaving just the
> > > > power-related operations, so doesn't it make more sense to put this
> > > > file in drivers/base/power?
> > > >
> > > > > +/**
> > > > > + * syscore_suspend - Execute all the registered system core suspend callbacks.
> > > > > + *
> > > > > + * This function is executed with one CPU on-line and disabled interrupts.
> > > > > + */
> > > > > +int syscore_suspend(void)
> > > > > +{
> > > > > + struct syscore_ops *ops;
> > > > > +
> > > > > + list_for_each_entry_reverse(ops, &syscore_ops_list, node)
> > > > > + if (ops->suspend) {
> > > > > + int ret = ops->suspend();
> > > > > + if (ret) {
> > > > > + pr_err("PM: System core suspend callback "
> > > > > + "%pF failed.\n", ops->suspend);
> > > > > + return ret;
> > > >
> > > > If an error occurs, you need to go back and resume all the things that
> > > > were suspended. At least, that's what the code in sysdev_suspend does.
> > > >
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + return 0;
> > > > > +}
> > >
> > > Below is a new version of the patch. I've taken your comment on the failing
> > > > suspend into account, fixed the list traversal direction in syscore_shutdown()
> > > and added some debug statements.
> > >
> > > Thanks,
> > > Rafael
> > >
> > > ---
> > > Some subsystems need to carry out suspend/resume and shutdown
> > > operations with one CPU on-line and interrupts disabled. The only
> > > way to register such operations is to define a sysdev class and
> > > a sysdev specifically for this purpose which is cumbersome and
> > > inefficient. Moreover, the arguments taken by sysdev suspend,
> > > resume and shutdown callbacks are practically never necessary.
> > >
> > > For this reason, introduce a simpler interface allowing subsystems
> > > to register operations to be executed very late during system suspend
> > > and shutdown and very early during resume in the form of
> > > > struct syscore_ops objects.
> > >
> > > ---
> > > drivers/base/Makefile | 2
> > > drivers/base/syscore.c | 107 ++++++++++++++++++++++++++++++++++++++++++++
> > > include/linux/syscore_ops.h | 29 +++++++++++
> > > kernel/power/hibernate.c | 9 +++
> > > kernel/power/suspend.c | 4 +
> > > kernel/sys.c | 4 +
> > > 6 files changed, 154 insertions(+), 1 deletion(-)
> > >
> > > Index: linux-2.6/include/linux/syscore_ops.h
> > > ===================================================================
> > > --- /dev/null
> > > +++ linux-2.6/include/linux/syscore_ops.h
> > > @@ -0,0 +1,29 @@
> > > +/*
> > > + * syscore_ops.h - System core operations.
> > > + *
> > > + * Copyright (C) 2011 Rafael J. Wysocki <[email protected]>, Novell Inc.
> > > + *
> > > + * This file is released under the GPLv2.
> > > + */
> > > +
> > > +#ifndef _LINUX_SYSCORE_OPS_H
> > > +#define _LINUX_SYSCORE_OPS_H
> > > +
> > > +#include <linux/list.h>
> > > +
> > > +struct syscore_ops {
> > > + struct list_head node;
> > > + int (*suspend)(void);
> > > + void (*resume)(void);
> > > + void (*shutdown)(void);
> > > +};
> > > +
> > > +extern void register_syscore_ops(struct syscore_ops *ops);
> > > +extern void unregister_syscore_ops(struct syscore_ops *ops);
> > > +#ifdef CONFIG_PM_SLEEP
> > > +extern int syscore_suspend(void);
> > > +extern void syscore_resume(void);
> > > +#endif
> >
> > Minor nit, provide inline functions for these when CONFIG_PM_SLEEP is
> > not defined so the code still builds?
>
> The code using them depends on CONFIG_PM_SLEEP and they are nobody else's
> business. :-)
Ah, ok.
> I could avoid using the #ifdef here, but I thought I'd make it clear that
> these things were only available when CONFIG_PM_SLEEP was set.
That's fine.
> > Other than that, this looks great to me, thanks for doing this.
>
> No problem. :-)
>
> > Do you want me to take it through my tree, or yours?
>
> I can handle it if you give me an ack.
Acked-by: Greg Kroah-Hartman <[email protected]>
> Do you think I should push [1/2] alone for 2.6.39 or wait for the patches
> converting subsystems to use this stuff to be ready? I think it'll take
> some time to prepare them, especially for things in the ARM tree that use
> sysdevs in some interesting ways ...
Send it for .39, and then start converting everyone over to using it.
It's easier once the code is in place to handle the different trees,
that way you don't have to worry about ordering issues.
thanks,
greg k-h
On Friday, March 11, 2011, Greg KH wrote:
> On Thu, Mar 10, 2011 at 01:34:02AM +0100, Rafael J. Wysocki wrote:
> > Some subsystems need to carry out suspend/resume and shutdown
> > operations with one CPU on-line and interrupts disabled and they
> > define sysdev classes and sysdevs specifically for this purpose.
> > This leads to unnecessarily complicated code and excessive memory
> > usage, so switch them to using struct syscore_ops objects for this
> > purpose instead.
>
> Heavily-Acked-by: Greg Kroah-Hartman <[email protected]>
>
> :)
Heh, thanks!
> Do you want to resend this with a signed-off-by and your first one so I
> can apply it to my tree, or do you want to take it through yours?
Well, I'm going to resend with sign-offs anyway. Besides, I'm not sure if
I should split [2/2] into a few smaller patches. At least the stuff outside
of arch/x86 should be done in separate patches IMHO.
Apart from this, there are other architectures using sysdevs for defining
"very late" and "very early" PM callbacks, ARM in particular (that one is going
to be fun to untangle).
I thought about two different possible ways forward:
(1) Push [1/2] and the patches converting things that x86 depends on first,
followed perhaps by a patch introducing something like
CONFIG_ARCH_NO_SYSDEV_OPS that would simply disable
sysdev_{suspend|resume|shutdown}() (x86 would set it). The other arches
might then be converted over time.
(2) Prepare patches converting everything that can be converted in the tree
and push them all in one shot.
The advantage of (1) is that we can start making changes RSN and the
advantage of (2) seems to be that we may avoid some potential suspend/resume
ordering issues on non-x86 architectures that may arise in principle if some
subsystems are converted to using struct syscore_ops while the others are
not (syscore_suspend() is executed after sysdev_suspend(), so if we move
something from the latter to the former, it may end up being executed after
things that it used to be executed before).
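To make the ordering concern concrete, here is a small userspace model (not
kernel code). The two callbacks, clock_suspend() and bus_suspend(), are
hypothetical; the only thing taken from the patch is that syscore_suspend()
runs after sysdev_suspend().

```c
/*
 * Userspace model of the ordering concern; not kernel code.
 * clock_suspend() and bus_suspend() are hypothetical callbacks.
 */
#include <string.h>

static char order[64];

/* Record the name of each callback as it runs. */
static void note(const char *name)
{
	strcat(order, name);
	strcat(order, " ");
}

static void clock_suspend(void) { note("clock"); }
static void bus_suspend(void)   { note("bus"); }

/* Before conversion: both are sysdev callbacks, "clock" runs first. */
static void suspend_before_conversion(void)
{
	order[0] = '\0';
	/* sysdev_suspend() phase */
	clock_suspend();
	bus_suspend();
	/* syscore_suspend() phase: nothing registered yet */
}

/* After converting only "clock": it now runs after "bus". */
static void suspend_after_partial_conversion(void)
{
	order[0] = '\0';
	/* sysdev_suspend() phase */
	bus_suspend();
	/* syscore_suspend() phase */
	clock_suspend();
}
```

If "bus" happened to depend on "clock" being suspended first, the partial
conversion would break that assumption, which is the risk with (1).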
Please let me know what your opinion is.
> thanks again for doing this.
No big deal really. :-)
Thanks,
Rafael
On Friday, March 11, 2011, Greg KH wrote:
> On Fri, Mar 11, 2011 at 09:13:13PM +0100, Rafael J. Wysocki wrote:
> > On Friday, March 11, 2011, Greg KH wrote:
> > > On Thu, Mar 10, 2011 at 12:30:45PM +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, March 10, 2011, Alan Stern wrote:
> > > > > On Thu, 10 Mar 2011, Rafael J. Wysocki wrote:
> > > > >
> > > > > > Some subsystems need to carry out suspend/resume and shutdown
> > > > > > operations with one CPU on-line and interrupts disabled. The only
> > > > > > way to register such operations is to define a sysdev class and
> > > > > > a sysdev specifically for this purpose which is cumbersome and
> > > > > > inefficient. Moreover, the arguments taken by sysdev suspend,
> > > > > > resume and shutdown callbacks are practically never necessary.
> > > > > >
> > > > > > For this reason, introduce a simpler interface allowing subsystems
> > > > > > to register operations to be executed very late during system suspend
> > > > > > and shutdown and very early during resume in the form of
> > > > > > > struct syscore_ops objects.
> > > > >
> > > > > ...
> > > > >
> > > > > > Index: linux-2.6/drivers/base/syscore.c
> > > > > > ===================================================================
> > > > > > --- /dev/null
> > > > > > +++ linux-2.6/drivers/base/syscore.c
> > > > >
> > > > > It's true that the existing sys.c file lies in drivers/base; this is
> > > > > presumably because it handles a bunch of class-related registration
> > > > > stuff. Now you're getting rid of all that, leaving just the
> > > > > power-related operations, so doesn't it make more sense to put this
> > > > > file in drivers/base/power?
> > > > >
> > > > > > +/**
> > > > > > + * syscore_suspend - Execute all the registered system core suspend callbacks.
> > > > > > + *
> > > > > > + * This function is executed with one CPU on-line and disabled interrupts.
> > > > > > + */
> > > > > > +int syscore_suspend(void)
> > > > > > +{
> > > > > > + struct syscore_ops *ops;
> > > > > > +
> > > > > > + list_for_each_entry_reverse(ops, &syscore_ops_list, node)
> > > > > > + if (ops->suspend) {
> > > > > > + int ret = ops->suspend();
> > > > > > + if (ret) {
> > > > > > + pr_err("PM: System core suspend callback "
> > > > > > + "%pF failed.\n", ops->suspend);
> > > > > > + return ret;
> > > > >
> > > > > If an error occurs, you need to go back and resume all the things that
> > > > > were suspended. At least, that's what the code in sysdev_suspend does.
> > > > >
> > > > > > + }
> > > > > > + }
> > > > > > +
> > > > > > + return 0;
> > > > > > +}
> > > >
> > > > Below is a new version of the patch. I've taken your comment on the failing
> > > > > suspend into account, fixed the list traversal direction in syscore_shutdown()
> > > > and added some debug statements.
> > > >
> > > > Thanks,
> > > > Rafael
> > > >
> > > > ---
> > > > Some subsystems need to carry out suspend/resume and shutdown
> > > > operations with one CPU on-line and interrupts disabled. The only
> > > > way to register such operations is to define a sysdev class and
> > > > a sysdev specifically for this purpose which is cumbersome and
> > > > inefficient. Moreover, the arguments taken by sysdev suspend,
> > > > resume and shutdown callbacks are practically never necessary.
> > > >
> > > > For this reason, introduce a simpler interface allowing subsystems
> > > > to register operations to be executed very late during system suspend
> > > > and shutdown and very early during resume in the form of
> > > > > struct syscore_ops objects.
> > > >
> > > > ---
> > > > drivers/base/Makefile | 2
> > > > drivers/base/syscore.c | 107 ++++++++++++++++++++++++++++++++++++++++++++
> > > > include/linux/syscore_ops.h | 29 +++++++++++
> > > > kernel/power/hibernate.c | 9 +++
> > > > kernel/power/suspend.c | 4 +
> > > > kernel/sys.c | 4 +
> > > > 6 files changed, 154 insertions(+), 1 deletion(-)
> > > >
> > > > Index: linux-2.6/include/linux/syscore_ops.h
> > > > ===================================================================
> > > > --- /dev/null
> > > > +++ linux-2.6/include/linux/syscore_ops.h
> > > > @@ -0,0 +1,29 @@
> > > > +/*
> > > > + * syscore_ops.h - System core operations.
> > > > + *
> > > > + * Copyright (C) 2011 Rafael J. Wysocki <[email protected]>, Novell Inc.
> > > > + *
> > > > + * This file is released under the GPLv2.
> > > > + */
> > > > +
> > > > +#ifndef _LINUX_SYSCORE_OPS_H
> > > > +#define _LINUX_SYSCORE_OPS_H
> > > > +
> > > > +#include <linux/list.h>
> > > > +
> > > > +struct syscore_ops {
> > > > + struct list_head node;
> > > > + int (*suspend)(void);
> > > > + void (*resume)(void);
> > > > + void (*shutdown)(void);
> > > > +};
> > > > +
> > > > +extern void register_syscore_ops(struct syscore_ops *ops);
> > > > +extern void unregister_syscore_ops(struct syscore_ops *ops);
> > > > +#ifdef CONFIG_PM_SLEEP
> > > > +extern int syscore_suspend(void);
> > > > +extern void syscore_resume(void);
> > > > +#endif
> > >
> > > Minor nit, provide inline functions for these when CONFIG_PM_SLEEP is
> > > not defined so the code still builds?
> >
> > The code using them depends on CONFIG_PM_SLEEP and they are nobody else's
> > business. :-)
>
> Ah, ok.
>
> > I could avoid using the #ifdef here, but I thought I'd make it clear that
> > these things were only available when CONFIG_PM_SLEEP was set.
>
> That's fine.
>
> > > Other than that, this looks great to me, thanks for doing this.
> >
> > No problem. :-)
> >
> > > Do you want me to take it through my tree, or yours?
> >
> > I can handle it if you give me an ack.
>
> Acked-by: Greg Kroah-Hartman <[email protected]>
>
> > Do you think I should push [1/2] alone for 2.6.39 or wait for the patches
> > converting subsystems to use this stuff to be ready? I think it'll take
> > some time to prepare them, especially for things in the ARM tree that use
> > sysdevs in some interesting ways ...
>
> Send it for .39, and then start converting everyone over to using it.
> It's easier once the code is in place to handle the different trees,
> that way you don't have to worry about ordering issues.
OK, I will.
Thanks,
Rafael
On Fri, Mar 11, 2011 at 09:29:24PM +0100, Rafael J. Wysocki wrote:
> I thought about two different possible ways forward:
>
> (1) Push [1/2] and the patches converting things that x86 depends on first,
> followed perhaps by a patch introducing something like
> CONFIG_ARCH_NO_SYSDEV_OPS that would simply disable
> sysdev_{suspend|resume|shutdown}() (x86 would set it). The other arches
> might then be converted over time.
>
> (2) Prepare patches converting everything that can be converted in the tree
> and push them all in one shot.
>
> The advantage of (1) is that we can start making changes RSN and the
> advantage of (2) seems to be that we may avoid some potential suspend/resume
> ordering issues on non-x86 architectures that may arise in principle if some
> subsystems are converted to using struct syscore_ops while the others are
> not (syscore_suspend() is executed after sysdev_suspend(), so if we move
> something from the latter to the former, it may end up being executed after
> things that it was executed before previously).
>
> Please let me know what your opinion is.
Hm, I would prefer (1) as that lets us get this moving sooner, and "flag
days" are never good to have. If there are problems that arise because
of it, as you have noted, it will be simple just to convert the parts
that were using the "old" methods to the new ones to fix the issue,
right?
thanks,
greg k-h
On Friday, March 11, 2011, Greg KH wrote:
> On Fri, Mar 11, 2011 at 09:29:24PM +0100, Rafael J. Wysocki wrote:
> > I thought about two different possible ways forward:
> >
> > (1) Push [1/2] and the patches converting things that x86 depends on first,
> > followed perhaps by a patch introducing something like
> > CONFIG_ARCH_NO_SYSDEV_OPS that would simply disable
> > sysdev_{suspend|resume|shutdown}() (x86 would set it). The other arches
> > might then be converted over time.
> >
> > (2) Prepare patches converting everything that can be converted in the tree
> > and push them all in one shot.
> >
> > The advantage of (1) is that we can start making changes RSN and the
> > advantage of (2) seems to be that we may avoid some potential suspend/resume
> > ordering issues on non-x86 architectures that may arise in principle if some
> > subsystems are converted to using struct syscore_ops while the others are
> > not (syscore_suspend() is executed after sysdev_suspend(), so if we move
> > something from the latter to the former, it may end up being executed after
> > things that it was executed before previously).
> >
> > Please let me know what your opinion is.
>
> Hm, I would prefer (1) as that lets us get this moving sooner, and "flag
> days" are never good to have. If there are problems that arise because
> of it, as you have noted, it will be simple just to convert the parts
> that were using the "old" methods to the new ones to fix the issue,
> right?
Yes, I agree.
Thanks,
Rafael
From: Rafael J. Wysocki <[email protected]>
The timekeeping subsystem uses a sysdev class and a sysdev for
executing timekeeping_suspend() after interrupts have been turned off
on the boot CPU (during system suspend) and for executing
timekeeping_resume() before turning on interrupts on the boot CPU
(during system resume). However, since both of these functions
ignore their arguments, the entire mechanism may be replaced with a
struct syscore_ops object which is simpler.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
kernel/time/timekeeping.c | 27 ++++++++-------------------
1 file changed, 8 insertions(+), 19 deletions(-)
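
Note: the shape of the conversion can be sketched in a userspace model as
below. This is not kernel code: struct syscore_ops is reduced to its callback
pointers, register_syscore_ops() is a stand-in that stores a single pointer,
and all of the "example_" names are hypothetical.

```c
/*
 * Userspace model, not kernel code.  struct syscore_ops is reduced to the
 * callback pointers and register_syscore_ops() is a stand-in that keeps one
 * pointer.  The "example_" names are hypothetical.
 */
#include <stddef.h>

struct syscore_ops {
	int (*suspend)(void);
	void (*resume)(void);
	void (*shutdown)(void);
};

static struct syscore_ops *registered_ops;

static void register_syscore_ops(struct syscore_ops *ops)
{
	registered_ops = ops;
}

static int example_suspended;

/* No struct sys_device or pm_message_t argument is needed. */
static int example_suspend(void)
{
	example_suspended = 1;
	return 0;
}

/* Resume has no way to report failure here, so it returns void. */
static void example_resume(void)
{
	example_suspended = 0;
}

/* One object replaces the sysdev class plus the sysdev instance. */
static struct syscore_ops example_syscore_ops = {
	.suspend = example_suspend,
	.resume = example_resume,
};

/* Registration returns void, so the init function just returns 0. */
static int example_init_ops(void)
{
	register_syscore_ops(&example_syscore_ops);
	return 0;
}
```

Callbacks that are not needed (shutdown in this model) are simply left NULL.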
Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -14,7 +14,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/sched.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/clocksource.h>
#include <linux/jiffies.h>
#include <linux/time.h>
@@ -561,13 +561,12 @@ static struct timespec timekeeping_suspe
/**
* timekeeping_resume - Resumes the generic timekeeping subsystem.
- * @dev: unused
*
* This is for the generic clocksource timekeeping.
* xtime/wall_to_monotonic/jiffies/etc are
* still managed by arch specific suspend/resume code.
*/
-static int timekeeping_resume(struct sys_device *dev)
+static void timekeeping_resume(void)
{
unsigned long flags;
struct timespec ts;
@@ -596,11 +595,9 @@ static int timekeeping_resume(struct sys
/* Resume hrtimers */
hres_timers_resume();
-
- return 0;
}
-static int timekeeping_suspend(struct sys_device *dev, pm_message_t state)
+static int timekeeping_suspend(void)
{
unsigned long flags;
@@ -618,26 +615,18 @@ static int timekeeping_suspend(struct sy
}
/* sysfs resume/suspend bits for timekeeping */
-static struct sysdev_class timekeeping_sysclass = {
- .name = "timekeeping",
+static struct syscore_ops timekeeping_syscore_ops = {
.resume = timekeeping_resume,
.suspend = timekeeping_suspend,
};
-static struct sys_device device_timer = {
- .id = 0,
- .cls = &timekeeping_sysclass,
-};
-
-static int __init timekeeping_init_device(void)
+static int __init timekeeping_init_ops(void)
{
- int error = sysdev_class_register(&timekeeping_sysclass);
- if (!error)
- error = sysdev_register(&device_timer);
- return error;
+ register_syscore_ops(&timekeeping_syscore_ops);
+ return 0;
}
-device_initcall(timekeeping_init_device);
+device_initcall(timekeeping_init_ops);
/*
* If the error is already larger, we look ahead even further
From: Rafael J. Wysocki <[email protected]>
Some subsystems need to carry out suspend/resume and shutdown
operations with one CPU on-line and interrupts disabled. The only
way to register such operations is to define a sysdev class and
a sysdev specifically for this purpose which is cumbersome and
inefficient. Moreover, the arguments taken by sysdev suspend,
resume and shutdown callbacks are practically never necessary.
For this reason, introduce a simpler interface allowing subsystems
to register operations to be executed very late during system suspend
and shutdown and very early during resume in the form of
struct syscore_ops objects.
Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
---
drivers/base/Makefile | 2
drivers/base/syscore.c | 117 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/syscore_ops.h | 29 ++++++++++
kernel/power/hibernate.c | 9 +++
kernel/power/suspend.c | 4 +
kernel/sys.c | 4 +
6 files changed, 164 insertions(+), 1 deletion(-)
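
Note: the intended traversal order and the error path can be modelled in
userspace as below. This is a sketch under simplifying assumptions, not the
kernel implementation: an array stands in for the linked list, there is no
locking, and the a_*, b_* and c_* callbacks are hypothetical.

```c
/*
 * Userspace model of the traversal order; not the kernel implementation.
 * An array stands in for the kernel's linked list, and the callbacks
 * a_*, b_*, c_* are hypothetical.
 */
#include <stddef.h>
#include <string.h>

struct syscore_ops {
	int (*suspend)(void);
	void (*resume)(void);
};

#define MAX_OPS 8
static struct syscore_ops *ops_list[MAX_OPS];
static size_t nr_ops;

/* Like list_add_tail(): new entries go to the end. */
static void register_syscore_ops(struct syscore_ops *ops)
{
	if (nr_ops < MAX_OPS)
		ops_list[nr_ops++] = ops;
}

/* Resume in registration order. */
static void syscore_resume(void)
{
	for (size_t i = 0; i < nr_ops; i++)
		if (ops_list[i]->resume)
			ops_list[i]->resume();
}

/* Suspend in reverse registration order; undo on failure. */
static int syscore_suspend(void)
{
	size_t i = nr_ops;

	while (i-- > 0) {
		int ret;

		if (!ops_list[i]->suspend)
			continue;
		ret = ops_list[i]->suspend();
		if (ret) {
			/* Entries after i have been suspended already. */
			for (size_t j = i + 1; j < nr_ops; j++)
				if (ops_list[j]->resume)
					ops_list[j]->resume();
			return ret;
		}
	}
	return 0;
}

static char trace[64];
static int b_fails;

static int a_suspend(void) { strcat(trace, "a- "); return 0; }
static void a_resume(void) { strcat(trace, "a+ "); }
static int b_suspend(void) { strcat(trace, "b- "); return b_fails ? -1 : 0; }
static void b_resume(void) { strcat(trace, "b+ "); }
static int c_suspend(void) { strcat(trace, "c- "); return 0; }
static void c_resume(void) { strcat(trace, "c+ "); }

static struct syscore_ops a_ops = { .suspend = a_suspend, .resume = a_resume };
static struct syscore_ops b_ops = { .suspend = b_suspend, .resume = b_resume };
static struct syscore_ops c_ops = { .suspend = c_suspend, .resume = c_resume };
```

In this model the resume callback of the entry whose suspend failed is not
called; only the entries that had been suspended successfully are resumed.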
Index: linux-2.6/include/linux/syscore_ops.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/syscore_ops.h
@@ -0,0 +1,29 @@
+/*
+ * syscore_ops.h - System core operations.
+ *
+ * Copyright (C) 2011 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_SYSCORE_OPS_H
+#define _LINUX_SYSCORE_OPS_H
+
+#include <linux/list.h>
+
+struct syscore_ops {
+ struct list_head node;
+ int (*suspend)(void);
+ void (*resume)(void);
+ void (*shutdown)(void);
+};
+
+extern void register_syscore_ops(struct syscore_ops *ops);
+extern void unregister_syscore_ops(struct syscore_ops *ops);
+#ifdef CONFIG_PM_SLEEP
+extern int syscore_suspend(void);
+extern void syscore_resume(void);
+#endif
+extern void syscore_shutdown(void);
+
+#endif
Index: linux-2.6/drivers/base/syscore.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/syscore.c
@@ -0,0 +1,117 @@
+/*
+ * syscore.c - Execution of system core operations.
+ *
+ * Copyright (C) 2011 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/syscore_ops.h>
+#include <linux/mutex.h>
+#include <linux/module.h>
+
+static LIST_HEAD(syscore_ops_list);
+static DEFINE_MUTEX(syscore_ops_lock);
+
+/**
+ * register_syscore_ops - Register a set of system core operations.
+ * @ops: System core operations to register.
+ */
+void register_syscore_ops(struct syscore_ops *ops)
+{
+ mutex_lock(&syscore_ops_lock);
+ list_add_tail(&ops->node, &syscore_ops_list);
+ mutex_unlock(&syscore_ops_lock);
+}
+EXPORT_SYMBOL_GPL(register_syscore_ops);
+
+/**
+ * unregister_syscore_ops - Unregister a set of system core operations.
+ * @ops: System core operations to unregister.
+ */
+void unregister_syscore_ops(struct syscore_ops *ops)
+{
+ mutex_lock(&syscore_ops_lock);
+ list_del(&ops->node);
+ mutex_unlock(&syscore_ops_lock);
+}
+EXPORT_SYMBOL_GPL(unregister_syscore_ops);
+
+#ifdef CONFIG_PM_SLEEP
+/**
+ * syscore_suspend - Execute all the registered system core suspend callbacks.
+ *
+ * This function is executed with one CPU on-line and disabled interrupts.
+ */
+int syscore_suspend(void)
+{
+ struct syscore_ops *ops;
+ int ret = 0;
+
+ WARN_ONCE(!irqs_disabled(),
+ "Interrupts enabled before system core suspend.\n");
+
+ list_for_each_entry_reverse(ops, &syscore_ops_list, node)
+ if (ops->suspend) {
+ if (initcall_debug)
+ pr_info("PM: Calling %pF\n", ops->suspend);
+ ret = ops->suspend();
+ if (ret)
+ goto err_out;
+ WARN_ONCE(!irqs_disabled(),
+ "Interrupts enabled after %pF\n", ops->suspend);
+ }
+
+ return 0;
+
+ err_out:
+ pr_err("PM: System core suspend callback %pF failed.\n", ops->suspend);
+
+ list_for_each_entry_continue(ops, &syscore_ops_list, node)
+ if (ops->resume)
+ ops->resume();
+
+ return ret;
+}
+
+/**
+ * syscore_resume - Execute all the registered system core resume callbacks.
+ *
+ * This function is executed with one CPU on-line and disabled interrupts.
+ */
+void syscore_resume(void)
+{
+ struct syscore_ops *ops;
+
+ WARN_ONCE(!irqs_disabled(),
+ "Interrupts enabled before system core resume.\n");
+
+ list_for_each_entry(ops, &syscore_ops_list, node)
+ if (ops->resume) {
+ if (initcall_debug)
+ pr_info("PM: Calling %pF\n", ops->resume);
+ ops->resume();
+ WARN_ONCE(!irqs_disabled(),
+ "Interrupts enabled after %pF\n", ops->resume);
+ }
+}
+#endif /* CONFIG_PM_SLEEP */
+
+/**
+ * syscore_shutdown - Execute all the registered system core shutdown callbacks.
+ */
+void syscore_shutdown(void)
+{
+ struct syscore_ops *ops;
+
+ mutex_lock(&syscore_ops_lock);
+
+ list_for_each_entry_reverse(ops, &syscore_ops_list, node)
+ if (ops->shutdown) {
+ if (initcall_debug)
+ pr_info("PM: Calling %pF\n", ops->shutdown);
+ ops->shutdown();
+ }
+
+ mutex_unlock(&syscore_ops_lock);
+}
Index: linux-2.6/kernel/power/suspend.c
===================================================================
--- linux-2.6.orig/kernel/power/suspend.c
+++ linux-2.6/kernel/power/suspend.c
@@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/suspend.h>
+#include <linux/syscore_ops.h>
#include <trace/events/power.h>
#include "power.h"
@@ -163,11 +164,14 @@ static int suspend_enter(suspend_state_t
BUG_ON(!irqs_disabled());
error = sysdev_suspend(PMSG_SUSPEND);
+ if (!error)
+ error = syscore_suspend();
if (!error) {
if (!(suspend_test(TEST_CORE) || pm_wakeup_pending())) {
error = suspend_ops->enter(state);
events_check_enabled = false;
}
+ syscore_resume();
sysdev_resume();
}
Index: linux-2.6/kernel/power/hibernate.c
===================================================================
--- linux-2.6.orig/kernel/power/hibernate.c
+++ linux-2.6/kernel/power/hibernate.c
@@ -23,6 +23,7 @@
#include <linux/cpu.h>
#include <linux/freezer.h>
#include <linux/gfp.h>
+#include <linux/syscore_ops.h>
#include <scsi/scsi_scan.h>
#include <asm/suspend.h>
@@ -272,6 +273,8 @@ static int create_image(int platform_mod
local_irq_disable();
error = sysdev_suspend(PMSG_FREEZE);
+ if (!error)
+ error = syscore_suspend();
if (error) {
printk(KERN_ERR "PM: Some system devices failed to power down, "
"aborting hibernation\n");
@@ -295,6 +298,7 @@ static int create_image(int platform_mod
}
Power_up:
+ syscore_resume();
sysdev_resume();
/* NOTE: dpm_resume_noirq() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
@@ -403,6 +407,8 @@ static int resume_target_kernel(bool pla
local_irq_disable();
error = sysdev_suspend(PMSG_QUIESCE);
+ if (!error)
+ error = syscore_suspend();
if (error)
goto Enable_irqs;
@@ -429,6 +435,7 @@ static int resume_target_kernel(bool pla
restore_processor_state();
touch_softlockup_watchdog();
+ syscore_resume();
sysdev_resume();
Enable_irqs:
@@ -516,6 +523,7 @@ int hibernation_platform_enter(void)
local_irq_disable();
sysdev_suspend(PMSG_HIBERNATE);
+ syscore_suspend();
if (pm_wakeup_pending()) {
error = -EAGAIN;
goto Power_up;
@@ -526,6 +534,7 @@ int hibernation_platform_enter(void)
while (1);
Power_up:
+ syscore_resume();
sysdev_resume();
local_irq_enable();
enable_nonboot_cpus();
Index: linux-2.6/kernel/sys.c
===================================================================
--- linux-2.6.orig/kernel/sys.c
+++ linux-2.6/kernel/sys.c
@@ -37,6 +37,7 @@
#include <linux/ptrace.h>
#include <linux/fs_struct.h>
#include <linux/gfp.h>
+#include <linux/syscore_ops.h>
#include <linux/compat.h>
#include <linux/syscalls.h>
@@ -298,6 +299,7 @@ void kernel_restart_prepare(char *cmd)
system_state = SYSTEM_RESTART;
device_shutdown();
sysdev_shutdown();
+ syscore_shutdown();
}
/**
@@ -336,6 +338,7 @@ void kernel_halt(void)
{
kernel_shutdown_prepare(SYSTEM_HALT);
sysdev_shutdown();
+ syscore_shutdown();
printk(KERN_EMERG "System halted.\n");
kmsg_dump(KMSG_DUMP_HALT);
machine_halt();
@@ -355,6 +358,7 @@ void kernel_power_off(void)
pm_power_off_prepare();
disable_nonboot_cpus();
sysdev_shutdown();
+ syscore_shutdown();
printk(KERN_EMERG "Power down.\n");
kmsg_dump(KMSG_DUMP_POWEROFF);
machine_power_off();
Index: linux-2.6/drivers/base/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/Makefile
+++ linux-2.6/drivers/base/Makefile
@@ -1,6 +1,6 @@
# Makefile for the Linux device tree
-obj-y := core.o sys.o bus.o dd.o \
+obj-y := core.o sys.o bus.o dd.o syscore.o \
driver.o class.o platform.o \
cpu.o firmware.o init.o map.o devres.o \
attribute_container.o transport_class.o
From: Rafael J. Wysocki <[email protected]>
Some subsystems in the x86 tree need to carry out suspend/resume and
shutdown operations with one CPU on-line and interrupts disabled and
they define sysdev classes and sysdevs or sysdev drivers for this
purpose. This leads to unnecessarily complicated code and excessive
memory usage, so switch them to using struct syscore_ops objects for
this purpose instead.
Generally, there are three categories of subsystems that use
sysdevs for implementing PM operations: (1) subsystems whose
suspend/resume callbacks ignore their arguments entirely (the
majority), (2) subsystems whose suspend/resume callbacks use their
struct sys_device argument, but don't really need to do that,
because they can be implemented differently in an arguably simpler
way (io_apic.c), and (3) subsystems whose suspend/resume callbacks
use their struct sys_device argument, but the value of that argument
is always the same and could be ignored (microcode_core.c). In all
of these cases the subsystems in question may be readily converted to
using struct syscore_ops objects for power management and shutdown.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/kernel/amd_iommu_init.c | 26 ++--------
arch/x86/kernel/apic/apic.c | 29 +++--------
arch/x86/kernel/apic/io_apic.c | 97 ++++++++++++++++++---------------------
arch/x86/kernel/cpu/mcheck/mce.c | 21 ++++----
arch/x86/kernel/cpu/mtrr/main.c | 10 ++--
arch/x86/kernel/i8237.c | 30 ++----------
arch/x86/kernel/i8259.c | 33 ++++---------
arch/x86/kernel/microcode_core.c | 34 +++++--------
arch/x86/kernel/pci-gart_64.c | 32 ++----------
arch/x86/oprofile/nmi_int.c | 44 ++++-------------
10 files changed, 127 insertions(+), 229 deletions(-)
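
Note: for category (2), one possible way to drop the struct sys_device
argument is to let a single callback walk all of the instances itself. The
userspace model below only illustrates that idea; the names are hypothetical
and it is not the io_apic.c code.

```c
/*
 * Userspace model with hypothetical names; not the io_apic.c code.
 * A per-device callback would receive one device and save its state.
 * With argument-less callbacks, one callback can walk every instance.
 */
#define NR_CHIPS 3

struct chip_state {
	unsigned int live_reg;	/* stands in for a hardware register */
	unsigned int saved_reg;	/* copy kept across suspend */
};

static struct chip_state chips[NR_CHIPS];

static void save_one_chip(int id)
{
	chips[id].saved_reg = chips[id].live_reg;
}

static void restore_one_chip(int id)
{
	chips[id].live_reg = chips[id].saved_reg;
}

/* syscore-style suspend: no device argument, loop over every chip. */
static int chips_suspend(void)
{
	int id;

	for (id = 0; id < NR_CHIPS; id++)
		save_one_chip(id);
	return 0;
}

/* syscore-style resume: restore every chip in the same order. */
static void chips_resume(void)
{
	int id;

	for (id = 0; id < NR_CHIPS; id++)
		restore_one_chip(id);
}
```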
Index: linux-2.6/arch/x86/kernel/amd_iommu_init.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/amd_iommu_init.c
+++ linux-2.6/arch/x86/kernel/amd_iommu_init.c
@@ -21,7 +21,7 @@
#include <linux/acpi.h>
#include <linux/list.h>
#include <linux/slab.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/interrupt.h>
#include <linux/msi.h>
#include <asm/pci-direct.h>
@@ -1260,7 +1260,7 @@ static void disable_iommus(void)
* disable suspend until real resume implemented
*/
-static int amd_iommu_resume(struct sys_device *dev)
+static void amd_iommu_resume(void)
{
struct amd_iommu *iommu;
@@ -1276,11 +1276,9 @@ static int amd_iommu_resume(struct sys_d
*/
amd_iommu_flush_all_devices();
amd_iommu_flush_all_domains();
-
- return 0;
}
-static int amd_iommu_suspend(struct sys_device *dev, pm_message_t state)
+static int amd_iommu_suspend(void)
{
/* disable IOMMUs to go out of the way for BIOS */
disable_iommus();
@@ -1288,17 +1286,11 @@ static int amd_iommu_suspend(struct sys_
return 0;
}
-static struct sysdev_class amd_iommu_sysdev_class = {
- .name = "amd_iommu",
+static struct syscore_ops amd_iommu_syscore_ops = {
.suspend = amd_iommu_suspend,
.resume = amd_iommu_resume,
};
-static struct sys_device device_amd_iommu = {
- .id = 0,
- .cls = &amd_iommu_sysdev_class,
-};
-
/*
* This is the core init function for AMD IOMMU hardware in the system.
* This function is called from the generic x86 DMA layer initialization
@@ -1415,14 +1407,6 @@ static int __init amd_iommu_init(void)
goto free;
}
- ret = sysdev_class_register(&amd_iommu_sysdev_class);
- if (ret)
- goto free;
-
- ret = sysdev_register(&device_amd_iommu);
- if (ret)
- goto free;
-
ret = amd_iommu_init_devices();
if (ret)
goto free;
@@ -1441,6 +1425,8 @@ static int __init amd_iommu_init(void)
amd_iommu_init_notifier();
+ register_syscore_ops(&amd_iommu_syscore_ops);
+
if (iommu_pass_through)
goto out;
Index: linux-2.6/arch/x86/kernel/apic/apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/apic.c
+++ linux-2.6/arch/x86/kernel/apic/apic.c
@@ -24,7 +24,7 @@
#include <linux/ftrace.h>
#include <linux/ioport.h>
#include <linux/module.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/delay.h>
#include <linux/timex.h>
#include <linux/dmar.h>
@@ -2036,7 +2036,7 @@ static struct {
unsigned int apic_thmr;
} apic_pm_state;
-static int lapic_suspend(struct sys_device *dev, pm_message_t state)
+static int lapic_suspend(void)
{
unsigned long flags;
int maxlvt;
@@ -2074,7 +2074,7 @@ static int lapic_suspend(struct sys_devi
return 0;
}
-static int lapic_resume(struct sys_device *dev)
+static void lapic_resume(void)
{
unsigned int l, h;
unsigned long flags;
@@ -2083,7 +2083,7 @@ static int lapic_resume(struct sys_devic
struct IO_APIC_route_entry **ioapic_entries = NULL;
if (!apic_pm_state.active)
- return 0;
+ return;
local_irq_save(flags);
if (intr_remapping_enabled) {
@@ -2152,8 +2152,6 @@ static int lapic_resume(struct sys_devic
}
restore:
local_irq_restore(flags);
-
- return ret;
}
/*
@@ -2161,17 +2159,11 @@ restore:
* are needed on every CPU up until machine_halt/restart/poweroff.
*/
-static struct sysdev_class lapic_sysclass = {
- .name = "lapic",
+static struct syscore_ops lapic_syscore_ops = {
.resume = lapic_resume,
.suspend = lapic_suspend,
};
-static struct sys_device device_lapic = {
- .id = 0,
- .cls = &lapic_sysclass,
-};
-
static void __cpuinit apic_pm_activate(void)
{
apic_pm_state.active = 1;
@@ -2179,16 +2171,11 @@ static void __cpuinit apic_pm_activate(v
static int __init init_lapic_sysfs(void)
{
- int error;
-
- if (!cpu_has_apic)
- return 0;
/* XXX: remove suspend/resume procs if !apic_pm_state.active? */
+ if (cpu_has_apic)
+ register_syscore_ops(&lapic_syscore_ops);
- error = sysdev_class_register(&lapic_sysclass);
- if (!error)
- error = sysdev_register(&device_lapic);
- return error;
+ return 0;
}
/* local apic needs to resume before other devices access its registers. */
Index: linux-2.6/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c
+++ linux-2.6/arch/x86/kernel/apic/io_apic.c
@@ -30,7 +30,7 @@
#include <linux/compiler.h>
#include <linux/acpi.h>
#include <linux/module.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/msi.h>
#include <linux/htirq.h>
#include <linux/freezer.h>
@@ -2948,89 +2948,84 @@ static int __init io_apic_bug_finalize(v
late_initcall(io_apic_bug_finalize);
-struct sysfs_ioapic_data {
- struct sys_device dev;
- struct IO_APIC_route_entry entry[0];
-};
-static struct sysfs_ioapic_data * mp_ioapic_data[MAX_IO_APICS];
+static struct IO_APIC_route_entry *ioapic_saved_data[MAX_IO_APICS];
-static int ioapic_suspend(struct sys_device *dev, pm_message_t state)
+static void suspend_ioapic(int ioapic_id)
{
- struct IO_APIC_route_entry *entry;
- struct sysfs_ioapic_data *data;
+ struct IO_APIC_route_entry *saved_data = ioapic_saved_data[ioapic_id];
int i;
- data = container_of(dev, struct sysfs_ioapic_data, dev);
- entry = data->entry;
- for (i = 0; i < nr_ioapic_registers[dev->id]; i ++, entry ++ )
- *entry = ioapic_read_entry(dev->id, i);
+ if (!saved_data)
+ return;
+
+ for (i = 0; i < nr_ioapic_registers[ioapic_id]; i++)
+ saved_data[i] = ioapic_read_entry(ioapic_id, i);
+}
+
+static int ioapic_suspend(void)
+{
+ int ioapic_id;
+
+ for (ioapic_id = 0; ioapic_id < nr_ioapics; ioapic_id++)
+ suspend_ioapic(ioapic_id);
return 0;
}
-static int ioapic_resume(struct sys_device *dev)
+static void resume_ioapic(int ioapic_id)
{
- struct IO_APIC_route_entry *entry;
- struct sysfs_ioapic_data *data;
+ struct IO_APIC_route_entry *saved_data = ioapic_saved_data[ioapic_id];
unsigned long flags;
union IO_APIC_reg_00 reg_00;
int i;
- data = container_of(dev, struct sysfs_ioapic_data, dev);
- entry = data->entry;
+ if (!saved_data)
+ return;
raw_spin_lock_irqsave(&ioapic_lock, flags);
- reg_00.raw = io_apic_read(dev->id, 0);
- if (reg_00.bits.ID != mp_ioapics[dev->id].apicid) {
- reg_00.bits.ID = mp_ioapics[dev->id].apicid;
- io_apic_write(dev->id, 0, reg_00.raw);
+ reg_00.raw = io_apic_read(ioapic_id, 0);
+ if (reg_00.bits.ID != mp_ioapics[ioapic_id].apicid) {
+ reg_00.bits.ID = mp_ioapics[ioapic_id].apicid;
+ io_apic_write(ioapic_id, 0, reg_00.raw);
}
raw_spin_unlock_irqrestore(&ioapic_lock, flags);
- for (i = 0; i < nr_ioapic_registers[dev->id]; i++)
- ioapic_write_entry(dev->id, i, entry[i]);
+ for (i = 0; i < nr_ioapic_registers[ioapic_id]; i++)
+ ioapic_write_entry(ioapic_id, i, saved_data[i]);
+}
- return 0;
+static void ioapic_resume(void)
+{
+ int ioapic_id;
+
+ for (ioapic_id = nr_ioapics - 1; ioapic_id >= 0; ioapic_id--)
+ resume_ioapic(ioapic_id);
}
-static struct sysdev_class ioapic_sysdev_class = {
- .name = "ioapic",
+static struct syscore_ops ioapic_syscore_ops = {
.suspend = ioapic_suspend,
.resume = ioapic_resume,
};
-static int __init ioapic_init_sysfs(void)
+static int __init ioapic_init_ops(void)
{
- struct sys_device * dev;
- int i, size, error;
+ int i;
- error = sysdev_class_register(&ioapic_sysdev_class);
- if (error)
- return error;
+ for (i = 0; i < nr_ioapics; i++) {
+ unsigned int size;
- for (i = 0; i < nr_ioapics; i++ ) {
- size = sizeof(struct sys_device) + nr_ioapic_registers[i]
+ size = nr_ioapic_registers[i]
* sizeof(struct IO_APIC_route_entry);
- mp_ioapic_data[i] = kzalloc(size, GFP_KERNEL);
- if (!mp_ioapic_data[i]) {
- printk(KERN_ERR "Can't suspend/resume IOAPIC %d\n", i);
- continue;
- }
- dev = &mp_ioapic_data[i]->dev;
- dev->id = i;
- dev->cls = &ioapic_sysdev_class;
- error = sysdev_register(dev);
- if (error) {
- kfree(mp_ioapic_data[i]);
- mp_ioapic_data[i] = NULL;
- printk(KERN_ERR "Can't suspend/resume IOAPIC %d\n", i);
- continue;
- }
+ ioapic_saved_data[i] = kzalloc(size, GFP_KERNEL);
+ if (!ioapic_saved_data[i])
+ pr_err("IOAPIC %d: suspend/resume impossible!\n", i);
}
+ register_syscore_ops(&ioapic_syscore_ops);
+
return 0;
}
-device_initcall(ioapic_init_sysfs);
+device_initcall(ioapic_init_ops);
/*
* Dynamic irq allocate and deallocation
Index: linux-2.6/arch/x86/kernel/i8237.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/i8237.c
+++ linux-2.6/arch/x86/kernel/i8237.c
@@ -10,7 +10,7 @@
*/
#include <linux/init.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <asm/dma.h>
@@ -21,7 +21,7 @@
* in asm/dma.h.
*/
-static int i8237A_resume(struct sys_device *dev)
+static void i8237A_resume(void)
{
unsigned long flags;
int i;
@@ -41,31 +41,15 @@ static int i8237A_resume(struct sys_devi
enable_dma(4);
release_dma_lock(flags);
-
- return 0;
}
-static int i8237A_suspend(struct sys_device *dev, pm_message_t state)
-{
- return 0;
-}
-
-static struct sysdev_class i8237_sysdev_class = {
- .name = "i8237",
- .suspend = i8237A_suspend,
+static struct syscore_ops i8237_syscore_ops = {
.resume = i8237A_resume,
};
-static struct sys_device device_i8237A = {
- .id = 0,
- .cls = &i8237_sysdev_class,
-};
-
-static int __init i8237A_init_sysfs(void)
+static int __init i8237A_init_ops(void)
{
- int error = sysdev_class_register(&i8237_sysdev_class);
- if (!error)
- error = sysdev_register(&device_i8237A);
- return error;
+ register_syscore_ops(&i8237_syscore_ops);
+ return 0;
}
-device_initcall(i8237A_init_sysfs);
+device_initcall(i8237A_init_ops);
Index: linux-2.6/arch/x86/kernel/i8259.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/i8259.c
+++ linux-2.6/arch/x86/kernel/i8259.c
@@ -8,7 +8,7 @@
#include <linux/random.h>
#include <linux/init.h>
#include <linux/kernel_stat.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/bitops.h>
#include <linux/acpi.h>
#include <linux/io.h>
@@ -245,20 +245,19 @@ static void save_ELCR(char *trigger)
trigger[1] = inb(0x4d1) & 0xDE;
}
-static int i8259A_resume(struct sys_device *dev)
+static void i8259A_resume(void)
{
init_8259A(i8259A_auto_eoi);
restore_ELCR(irq_trigger);
- return 0;
}
-static int i8259A_suspend(struct sys_device *dev, pm_message_t state)
+static int i8259A_suspend(void)
{
save_ELCR(irq_trigger);
return 0;
}
-static int i8259A_shutdown(struct sys_device *dev)
+static void i8259A_shutdown(void)
{
/* Put the i8259A into a quiescent state that
* the kernel initialization code can get it
@@ -266,21 +265,14 @@ static int i8259A_shutdown(struct sys_de
*/
outb(0xff, PIC_MASTER_IMR); /* mask all of 8259A-1 */
outb(0xff, PIC_SLAVE_IMR); /* mask all of 8259A-1 */
- return 0;
}
-static struct sysdev_class i8259_sysdev_class = {
- .name = "i8259",
+static struct syscore_ops i8259_syscore_ops = {
.suspend = i8259A_suspend,
.resume = i8259A_resume,
.shutdown = i8259A_shutdown,
};
-static struct sys_device device_i8259A = {
- .id = 0,
- .cls = &i8259_sysdev_class,
-};
-
static void mask_8259A(void)
{
unsigned long flags;
@@ -399,17 +391,12 @@ struct legacy_pic default_legacy_pic = {
struct legacy_pic *legacy_pic = &default_legacy_pic;
-static int __init i8259A_init_sysfs(void)
+static int __init i8259A_init_ops(void)
{
- int error;
-
- if (legacy_pic != &default_legacy_pic)
- return 0;
+ if (legacy_pic == &default_legacy_pic)
+ register_syscore_ops(&i8259_syscore_ops);
- error = sysdev_class_register(&i8259_sysdev_class);
- if (!error)
- error = sysdev_register(&device_i8259A);
- return error;
+ return 0;
}
-device_initcall(i8259A_init_sysfs);
+device_initcall(i8259A_init_ops);
Index: linux-2.6/arch/x86/kernel/pci-gart_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-gart_64.c
+++ linux-2.6/arch/x86/kernel/pci-gart_64.c
@@ -27,7 +27,7 @@
#include <linux/kdebug.h>
#include <linux/scatterlist.h>
#include <linux/iommu-helper.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/io.h>
#include <linux/gfp.h>
#include <asm/atomic.h>
@@ -589,7 +589,7 @@ void set_up_gart_resume(u32 aper_order,
aperture_alloc = aper_alloc;
}
-static void gart_fixup_northbridges(struct sys_device *dev)
+static void gart_fixup_northbridges(void)
{
int i;
@@ -613,33 +613,20 @@ static void gart_fixup_northbridges(stru
}
}
-static int gart_resume(struct sys_device *dev)
+static void gart_resume(void)
{
pr_info("PCI-DMA: Resuming GART IOMMU\n");
- gart_fixup_northbridges(dev);
+ gart_fixup_northbridges();
enable_gart_translations();
-
- return 0;
}
-static int gart_suspend(struct sys_device *dev, pm_message_t state)
-{
- return 0;
-}
-
-static struct sysdev_class gart_sysdev_class = {
- .name = "gart",
- .suspend = gart_suspend,
+static struct syscore_ops gart_syscore_ops = {
.resume = gart_resume,
};
-static struct sys_device device_gart = {
- .cls = &gart_sysdev_class,
-};
-
/*
* Private Northbridge GATT initialization in case we cannot use the
* AGP driver for some reason.
@@ -650,7 +637,7 @@ static __init int init_amd_gatt(struct a
unsigned aper_base, new_aper_base;
struct pci_dev *dev;
void *gatt;
- int i, error;
+ int i;
pr_info("PCI-DMA: Disabling AGP.\n");
@@ -685,12 +672,7 @@ static __init int init_amd_gatt(struct a
agp_gatt_table = gatt;
- error = sysdev_class_register(&gart_sysdev_class);
- if (!error)
- error = sysdev_register(&device_gart);
- if (error)
- panic("Could not register gart_sysdev -- "
- "would corrupt data on next suspend");
+ register_syscore_ops(&gart_syscore_ops);
flush_gart();
Index: linux-2.6/arch/x86/oprofile/nmi_int.c
===================================================================
--- linux-2.6.orig/arch/x86/oprofile/nmi_int.c
+++ linux-2.6/arch/x86/oprofile/nmi_int.c
@@ -15,7 +15,7 @@
#include <linux/notifier.h>
#include <linux/smp.h>
#include <linux/oprofile.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/slab.h>
#include <linux/moduleparam.h>
#include <linux/kdebug.h>
@@ -536,7 +536,7 @@ static void nmi_shutdown(void)
#ifdef CONFIG_PM
-static int nmi_suspend(struct sys_device *dev, pm_message_t state)
+static int nmi_suspend(void)
{
/* Only one CPU left, just stop that one */
if (nmi_enabled == 1)
@@ -544,49 +544,31 @@ static int nmi_suspend(struct sys_device
return 0;
}
-static int nmi_resume(struct sys_device *dev)
+static void nmi_resume(void)
{
if (nmi_enabled == 1)
nmi_cpu_start(NULL);
- return 0;
}
-static struct sysdev_class oprofile_sysclass = {
- .name = "oprofile",
+static struct syscore_ops oprofile_syscore_ops = {
.resume = nmi_resume,
.suspend = nmi_suspend,
};
-static struct sys_device device_oprofile = {
- .id = 0,
- .cls = &oprofile_sysclass,
-};
-
-static int __init init_sysfs(void)
+static void __init init_suspend_resume(void)
{
- int error;
-
- error = sysdev_class_register(&oprofile_sysclass);
- if (error)
- return error;
-
- error = sysdev_register(&device_oprofile);
- if (error)
- sysdev_class_unregister(&oprofile_sysclass);
-
- return error;
+ register_syscore_ops(&oprofile_syscore_ops);
}
-static void exit_sysfs(void)
+static void exit_suspend_resume(void)
{
- sysdev_unregister(&device_oprofile);
- sysdev_class_unregister(&oprofile_sysclass);
+ unregister_syscore_ops(&oprofile_syscore_ops);
}
#else
-static inline int init_sysfs(void) { return 0; }
-static inline void exit_sysfs(void) { }
+static inline void init_suspend_resume(void) { }
+static inline void exit_suspend_resume(void) { }
#endif /* CONFIG_PM */
@@ -789,9 +771,7 @@ int __init op_nmi_init(struct oprofile_o
mux_init(ops);
- ret = init_sysfs();
- if (ret)
- return ret;
+ init_suspend_resume();
printk(KERN_INFO "oprofile: using NMI interrupt.\n");
return 0;
@@ -799,5 +779,5 @@ int __init op_nmi_init(struct oprofile_o
void op_nmi_exit(void)
{
- exit_sysfs();
+ exit_suspend_resume();
}
Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce.c
+++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce.c
@@ -21,6 +21,7 @@
#include <linux/percpu.h>
#include <linux/string.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/delay.h>
#include <linux/ctype.h>
#include <linux/sched.h>
@@ -1749,14 +1750,14 @@ static int mce_disable_error_reporting(v
return 0;
}
-static int mce_suspend(struct sys_device *dev, pm_message_t state)
+static int mce_suspend(void)
{
return mce_disable_error_reporting();
}
-static int mce_shutdown(struct sys_device *dev)
+static void mce_shutdown(void)
{
- return mce_disable_error_reporting();
+ mce_disable_error_reporting();
}
/*
@@ -1764,14 +1765,18 @@ static int mce_shutdown(struct sys_devic
* Only one CPU is active at this time, the others get re-added later using
* CPU hotplug:
*/
-static int mce_resume(struct sys_device *dev)
+static void mce_resume(void)
{
__mcheck_cpu_init_generic();
__mcheck_cpu_init_vendor(__this_cpu_ptr(&cpu_info));
-
- return 0;
}
+static struct syscore_ops mce_syscore_ops = {
+ .suspend = mce_suspend,
+ .shutdown = mce_shutdown,
+ .resume = mce_resume,
+};
+
static void mce_cpu_restart(void *data)
{
del_timer_sync(&__get_cpu_var(mce_timer));
@@ -1808,9 +1813,6 @@ static void mce_enable_ce(void *all)
}
static struct sysdev_class mce_sysclass = {
- .suspend = mce_suspend,
- .shutdown = mce_shutdown,
- .resume = mce_resume,
.name = "machinecheck",
};
@@ -2139,6 +2141,7 @@ static __init int mcheck_init_device(voi
return err;
}
+ register_syscore_ops(&mce_syscore_ops);
register_hotcpu_notifier(&mce_cpu_notifier);
misc_register(&mce_log_device);
Index: linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
@@ -45,6 +45,7 @@
#include <linux/cpu.h>
#include <linux/pci.h>
#include <linux/smp.h>
+#include <linux/syscore_ops.h>
#include <asm/processor.h>
#include <asm/e820.h>
@@ -630,7 +631,7 @@ struct mtrr_value {
static struct mtrr_value mtrr_value[MTRR_MAX_VAR_RANGES];
-static int mtrr_save(struct sys_device *sysdev, pm_message_t state)
+static int mtrr_save(void)
{
int i;
@@ -642,7 +643,7 @@ static int mtrr_save(struct sys_device *
return 0;
}
-static int mtrr_restore(struct sys_device *sysdev)
+static void mtrr_restore(void)
{
int i;
@@ -653,12 +654,11 @@ static int mtrr_restore(struct sys_devic
mtrr_value[i].ltype);
}
}
- return 0;
}
-static struct sysdev_driver mtrr_sysdev_driver = {
+static struct syscore_ops mtrr_syscore_ops = {
.suspend = mtrr_save,
.resume = mtrr_restore,
};
@@ -839,7 +839,7 @@ static int __init mtrr_init_finialize(vo
* TBD: is there any system with such CPU which supports
* suspend/resume? If no, we should remove the code.
*/
- sysdev_driver_register(&cpu_sysdev_class, &mtrr_sysdev_driver);
+ register_syscore_ops(&mtrr_syscore_ops);
return 0;
}
Index: linux-2.6/arch/x86/kernel/microcode_core.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/microcode_core.c
+++ linux-2.6/arch/x86/kernel/microcode_core.c
@@ -82,6 +82,7 @@
#include <linux/cpu.h>
#include <linux/fs.h>
#include <linux/mm.h>
+#include <linux/syscore_ops.h>
#include <asm/microcode.h>
#include <asm/processor.h>
@@ -436,33 +437,25 @@ static int mc_sysdev_remove(struct sys_d
return 0;
}
-static int mc_sysdev_resume(struct sys_device *dev)
+static struct sysdev_driver mc_sysdev_driver = {
+ .add = mc_sysdev_add,
+ .remove = mc_sysdev_remove,
+};
+
+/**
+ * mc_bp_resume - Update boot CPU microcode during resume.
+ */
+static void mc_bp_resume(void)
{
- int cpu = dev->id;
+ int cpu = smp_processor_id();
struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
- if (!cpu_online(cpu))
- return 0;
-
- /*
- * All non-bootup cpus are still disabled,
- * so only CPU 0 will apply ucode here.
- *
- * Moreover, there can be no concurrent
- * updates from any other places at this point.
- */
- WARN_ON(cpu != 0);
-
if (uci->valid && uci->mc)
microcode_ops->apply_microcode(cpu);
-
- return 0;
}
-static struct sysdev_driver mc_sysdev_driver = {
- .add = mc_sysdev_add,
- .remove = mc_sysdev_remove,
- .resume = mc_sysdev_resume,
+static struct syscore_ops mc_syscore_ops = {
+ .resume = mc_bp_resume,
};
static __cpuinit int
@@ -540,6 +533,7 @@ static int __init microcode_init(void)
if (error)
return error;
+ register_syscore_ops(&mc_syscore_ops);
register_hotcpu_notifier(&mc_cpu_notifier);
pr_info("Microcode Update Driver: v" MICROCODE_VERSION
From: Rafael J. Wysocki <[email protected]>
Introduce an option allowing architectures where sysdev operations
used during system suspend, resume and shutdown have been completely
replaced with struct syscore_ops operations to avoid building sysdev
code that will never be used. Making callbacks in struct sysdev_class
and struct sysdev_driver depend on ARCH_NO_SYSDEV_OPS allows us to
verify that all of the references have actually been removed from the
code the given architecture depends on.
Make x86 set ARCH_NO_SYSDEV_OPS.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
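The stub pattern described in the changelog above can be tried outside
the kernel. What follows is only a user-space sketch: the #define stands
in for the Kconfig symbol, pm_message_t is reduced to a one-field struct,
and enter_sleep_model() is a hypothetical caller that does not exist in
the kernel tree.

```c
#include <stdio.h>

/*
 * Stand-in for the Kconfig symbol (an assumption of this sketch).  Remove
 * the define to model an architecture that still uses sysdev callbacks; the
 * prototypes below would then need real definitions at link time.
 */
#define CONFIG_ARCH_NO_SYSDEV_OPS 1

/* Simplified stand-in for the kernel's pm_message_t. */
typedef struct { int event; } pm_message_t;

#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/* The real implementations would be built from drivers/base/sys.c. */
int sysdev_suspend(pm_message_t state);
int sysdev_resume(void);
#else
/* Empty stubs: callers compile unchanged and the calls cost nothing. */
static inline int sysdev_suspend(pm_message_t state) { (void)state; return 0; }
static inline int sysdev_resume(void) { return 0; }
#endif

/* Hypothetical caller; it looks the same in both configurations. */
int enter_sleep_model(void)
{
	pm_message_t state = { .event = 2 };
	int error;

	error = sysdev_suspend(state);
	if (error)
		return error;

	/* The platform would enter the sleep state here. */

	return sysdev_resume();
}
```

The second half of the idea cannot be shown in a runnable form: with the
callback members compiled out of the structures, any leftover
".suspend = ..." initializer in architecture code fails to build, which
is the verification the changelog refers to.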
arch/x86/Kconfig | 3 +++
drivers/base/sys.c | 3 ++-
include/linux/device.h | 4 ++++
include/linux/pm.h | 10 ++++++++--
include/linux/sysdev.h | 7 +++++--
5 files changed, 22 insertions(+), 5 deletions(-)
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -529,13 +529,19 @@ struct dev_power_domain {
*/
#ifdef CONFIG_PM_SLEEP
-extern void device_pm_lock(void);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
+extern int sysdev_suspend(pm_message_t state);
extern int sysdev_resume(void);
+#else
+static inline int sysdev_suspend(pm_message_t state) { return 0; }
+static inline int sysdev_resume(void) { return 0; }
+#endif
+
+extern void device_pm_lock(void);
extern void dpm_resume_noirq(pm_message_t state);
extern void dpm_resume_end(pm_message_t state);
extern void device_pm_unlock(void);
-extern int sysdev_suspend(pm_message_t state);
extern int dpm_suspend_noirq(pm_message_t state);
extern int dpm_suspend_start(pm_message_t state);
Index: linux-2.6/include/linux/sysdev.h
===================================================================
--- linux-2.6.orig/include/linux/sysdev.h
+++ linux-2.6/include/linux/sysdev.h
@@ -33,12 +33,13 @@ struct sysdev_class {
const char *name;
struct list_head drivers;
struct sysdev_class_attribute **attrs;
-
+ struct kset kset;
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/* Default operations for these types of devices */
int (*shutdown)(struct sys_device *);
int (*suspend)(struct sys_device *, pm_message_t state);
int (*resume)(struct sys_device *);
- struct kset kset;
+#endif
};
struct sysdev_class_attribute {
@@ -76,9 +77,11 @@ struct sysdev_driver {
struct list_head entry;
int (*add)(struct sys_device *);
int (*remove)(struct sys_device *);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
int (*shutdown)(struct sys_device *);
int (*suspend)(struct sys_device *, pm_message_t state);
int (*resume)(struct sys_device *);
+#endif
};
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -635,8 +635,12 @@ static inline int devtmpfs_mount(const c
/* drivers/base/power/shutdown.c */
extern void device_shutdown(void);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/* drivers/base/sys.c */
extern void sysdev_shutdown(void);
+#else
+static inline void sysdev_shutdown(void) { }
+#endif
/* debugging and troubleshooting/diagnostic helpers. */
extern const char *dev_driver_string(const struct device *dev);
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -184,6 +184,9 @@ config ARCH_HIBERNATION_POSSIBLE
config ARCH_SUSPEND_POSSIBLE
def_bool y
+config ARCH_NO_SYSDEV_OPS
+ def_bool y
+
config ZONE_DMA32
bool
default X86_64
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -302,7 +302,7 @@ void sysdev_unregister(struct sys_device
}
-
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/**
* sysdev_shutdown - Shut down all system devices.
*
@@ -497,6 +497,7 @@ int sysdev_resume(void)
return 0;
}
EXPORT_SYMBOL_GPL(sysdev_resume);
+#endif /* CONFIG_ARCH_NO_SYSDEV_OPS */
int __init system_bus_init(void)
{
From: Rafael J. Wysocki <[email protected]>
The cpufreq subsystem uses sysdev suspend and resume for
executing cpufreq_suspend() and cpufreq_resume(), respectively,
during system suspend, after interrupts have been switched off on the
boot CPU, and during system resume, while interrupts are still off on
the boot CPU. In both cases the other CPUs are off-line at the
relevant point (either they have been switched off via CPU hotplug
during suspend, or they haven't been switched on yet during resume).
For this reason, although it may seem that cpufreq_suspend() and
cpufreq_resume() are executed for all CPUs in the system, they are
in fact only called for the boot CPU, which is quite confusing.
To remove the confusion and to prepare for eliminating sysdev
suspend and resume operations from the kernel entirely, convert
cpufreq to using a struct syscore_ops object for the boot CPU
suspend and resume and rename the callbacks so that their names
reflect their purpose. In addition, put some explanatory remarks
into their kerneldoc comments.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
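To illustrate the point made in the changelog, here is a small
user-space model, not kernel code. It compares a per-CPU callback that
bails out for off-line CPUs with a single callback run on the current
CPU. All names ending in _model, and the choice of four CPUs with CPU 0
as the boot CPU, are assumptions made for the sketch.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4

static bool cpu_online_model[NR_CPUS_MODEL];
static int suspend_work_done[NR_CPUS_MODEL];

/* State late in suspend: only the boot CPU (CPU 0 here) is still on-line. */
static void enter_late_suspend_state(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		cpu_online_model[cpu] = (cpu == 0);
		suspend_work_done[cpu] = 0;
	}
}

/* Number of CPUs for which the suspend work was actually carried out. */
static int cpus_handled(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (suspend_work_done[cpu])
			n++;
	return n;
}

/* Old style: invoked once per CPU sysdev, returns early if it is off-line. */
static int old_style_suspend(int cpu)
{
	if (!cpu_online_model[cpu])
		return 0;
	suspend_work_done[cpu]++;
	return 0;
}

/* Stand-in for smp_processor_id(); the boot CPU is the only one running. */
static int smp_processor_id_model(void)
{
	return 0;
}

/* New style: a single call, acting on the CPU it runs on. */
static int bp_suspend_model(void)
{
	suspend_work_done[smp_processor_id_model()]++;
	return 0;
}

int run_old_style(void)
{
	enter_late_suspend_state();
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		old_style_suspend(cpu);
	return cpus_handled();
}

int run_new_style(void)
{
	enter_late_suspend_state();
	bp_suspend_model();
	return cpus_handled();
}
```

Both variants end up doing the work for exactly one CPU; the new one
simply says so in its name and drops the loop and the on-line check.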
drivers/cpufreq/cpufreq.c | 43 +++++++++++++++++++++++--------------------
1 file changed, 23 insertions(+), 20 deletions(-)
Index: linux-2.6/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6/drivers/cpufreq/cpufreq.c
@@ -28,6 +28,7 @@
#include <linux/cpu.h>
#include <linux/completion.h>
#include <linux/mutex.h>
+#include <linux/syscore_ops.h>
#include <trace/events/power.h>
@@ -1340,23 +1341,27 @@ out:
}
EXPORT_SYMBOL(cpufreq_get);
+static struct sysdev_driver cpufreq_sysdev_driver = {
+ .add = cpufreq_add_dev,
+ .remove = cpufreq_remove_dev,
+};
+
/**
- * cpufreq_suspend - let the low level driver prepare for suspend
+ * cpufreq_bp_suspend - Prepare the boot CPU for system suspend.
+ *
+ * This function is only executed for the boot processor. The other CPUs
+ * have been put offline by means of CPU hotplug.
*/
-
-static int cpufreq_suspend(struct sys_device *sysdev, pm_message_t pmsg)
+static int cpufreq_bp_suspend(void)
{
int ret = 0;
- int cpu = sysdev->id;
+ int cpu = smp_processor_id();
struct cpufreq_policy *cpu_policy;
dprintk("suspending cpu %u\n", cpu);
- if (!cpu_online(cpu))
- return 0;
-
/* we may be lax here as interrupts are off. Nonetheless
* we need to grab the correct cpu policy, as to check
* whether we really run on this CPU.
@@ -1383,7 +1388,7 @@ out:
}
/**
- * cpufreq_resume - restore proper CPU frequency handling after resume
+ * cpufreq_bp_resume - Restore proper frequency handling of the boot CPU.
*
* 1.) resume CPUfreq hardware support (cpufreq_driver->resume())
* 2.) schedule call cpufreq_update_policy() ASAP as interrupts are
@@ -1391,19 +1396,19 @@ out:
* what we believe it to be. This is a bit later than when it
* should be, but nonethteless it's better than calling
* cpufreq_driver->get() here which might re-enable interrupts...
+ *
+ * This function is only executed for the boot CPU. The other CPUs have not
+ * been turned on yet.
*/
-static int cpufreq_resume(struct sys_device *sysdev)
+static void cpufreq_bp_resume(void)
{
int ret = 0;
- int cpu = sysdev->id;
+ int cpu = smp_processor_id();
struct cpufreq_policy *cpu_policy;
dprintk("resuming cpu %u\n", cpu);
- if (!cpu_online(cpu))
- return 0;
-
/* we may be lax here as interrupts are off. Nonetheless
* we need to grab the correct cpu policy, as to check
* whether we really run on this CPU.
@@ -1411,7 +1416,7 @@ static int cpufreq_resume(struct sys_dev
cpu_policy = cpufreq_cpu_get(cpu);
if (!cpu_policy)
- return -EINVAL;
+ return;
/* only handle each CPU group once */
if (unlikely(cpu_policy->cpu != cpu))
@@ -1430,14 +1435,11 @@ static int cpufreq_resume(struct sys_dev
fail:
cpufreq_cpu_put(cpu_policy);
- return ret;
}
-static struct sysdev_driver cpufreq_sysdev_driver = {
- .add = cpufreq_add_dev,
- .remove = cpufreq_remove_dev,
- .suspend = cpufreq_suspend,
- .resume = cpufreq_resume,
+static struct syscore_ops cpufreq_syscore_ops = {
+ .suspend = cpufreq_bp_suspend,
+ .resume = cpufreq_bp_resume,
};
@@ -2002,6 +2004,7 @@ static int __init cpufreq_core_init(void
cpufreq_global_kobject = kobject_create_and_add("cpufreq",
&cpu_sysdev_class.kset.kobj);
BUG_ON(!cpufreq_global_kobject);
+ register_syscore_ops(&cpufreq_syscore_ops);
return 0;
}
From: Rafael J. Wysocki <[email protected]>
The Intel IOMMU subsystem uses a sysdev class and a sysdev for
executing iommu_suspend() after interrupts have been turned off
on the boot CPU (during system suspend) and for executing
iommu_resume() before turning on interrupts on the boot CPU
(during system resume). However, since both of these functions
ignore their arguments, the entire mechanism may be replaced with a
struct syscore_ops object which is simpler.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
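For readers who have not looked at the first patch of the series, the
following user-space sketch models how a list of argument-less
operations like the one used here might be driven. It is an
approximation only: the array-based list, the _model names and the
ordering (suspend in reverse registration order, resume in registration
order, already-suspended entries resumed on failure) are assumptions of
the sketch, not a copy of the kernel implementation.

```c
#include <stddef.h>

#define MAX_OPS_MODEL 8

/* Simplified stand-in for struct syscore_ops; either callback may be NULL. */
struct syscore_ops_model {
	int (*suspend)(void);
	void (*resume)(void);
};

static struct syscore_ops_model *ops_list[MAX_OPS_MODEL];
static size_t nr_ops;

static int register_ops_model(struct syscore_ops_model *ops)
{
	if (nr_ops == MAX_OPS_MODEL)
		return -1;
	ops_list[nr_ops++] = ops;
	return 0;
}

/* Run the resume callbacks of entries first..last, in registration order. */
static void resume_from(size_t first)
{
	for (size_t i = first; i < nr_ops; i++)
		if (ops_list[i]->resume)
			ops_list[i]->resume();
}

/* Suspend in reverse order; on failure undo what has been suspended so far. */
static int suspend_all_model(void)
{
	for (size_t i = nr_ops; i-- > 0; ) {
		int ret = ops_list[i]->suspend ? ops_list[i]->suspend() : 0;

		if (ret) {
			resume_from(i + 1);
			return ret;
		}
	}
	return 0;
}

static void resume_all_model(void)
{
	resume_from(0);
}

/* Demo callbacks that record the order in which they were called. */
static char trace[16];
static size_t trace_len;

static void note(char c)
{
	if (trace_len < sizeof(trace) - 1)
		trace[trace_len++] = c;
}

static int iommu_like_suspend(void) { note('s'); return 0; }
static void iommu_like_resume(void) { note('r'); }
static void resume_only_cb(void) { note('R'); }

static struct syscore_ops_model iommu_like_ops = {
	.suspend = iommu_like_suspend,
	.resume = iommu_like_resume,
};

/* An object providing only a resume callback is fine in this model. */
static struct syscore_ops_model resume_only_ops = {
	.resume = resume_only_cb,
};

/* Register both objects, run one suspend/resume cycle, return the trace. */
const char *demo_cycle(void)
{
	nr_ops = 0;
	trace_len = 0;

	register_ops_model(&iommu_like_ops);
	register_ops_model(&resume_only_ops);

	if (suspend_all_model() == 0)
		resume_all_model();

	trace[trace_len] = '\0';
	return trace;
}
```

Since the callbacks take no arguments, there is no device object to
allocate or register, which is where most of the removed lines in the
patch below come from.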
drivers/pci/intel-iommu.c | 38 +++++++++-----------------------------
1 file changed, 9 insertions(+), 29 deletions(-)
Index: linux-2.6/drivers/pci/intel-iommu.c
===================================================================
--- linux-2.6.orig/drivers/pci/intel-iommu.c
+++ linux-2.6/drivers/pci/intel-iommu.c
@@ -36,7 +36,7 @@
#include <linux/iova.h>
#include <linux/iommu.h>
#include <linux/intel-iommu.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/tboot.h>
#include <linux/dmi.h>
#include <asm/cacheflush.h>
@@ -3135,7 +3135,7 @@ static void iommu_flush_all(void)
}
}
-static int iommu_suspend(struct sys_device *dev, pm_message_t state)
+static int iommu_suspend(void)
{
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu = NULL;
@@ -3175,7 +3175,7 @@ nomem:
return -ENOMEM;
}
-static int iommu_resume(struct sys_device *dev)
+static void iommu_resume(void)
{
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu = NULL;
@@ -3183,7 +3183,7 @@ static int iommu_resume(struct sys_devic
if (init_iommu_hw()) {
WARN(1, "IOMMU setup failed, DMAR can not resume!\n");
- return -EIO;
+ return;
}
for_each_active_iommu(iommu, drhd) {
@@ -3204,40 +3204,20 @@ static int iommu_resume(struct sys_devic
for_each_active_iommu(iommu, drhd)
kfree(iommu->iommu_state);
-
- return 0;
}
-static struct sysdev_class iommu_sysclass = {
- .name = "iommu",
+static struct syscore_ops iommu_syscore_ops = {
.resume = iommu_resume,
.suspend = iommu_suspend,
};
-static struct sys_device device_iommu = {
- .cls = &iommu_sysclass,
-};
-
-static int __init init_iommu_sysfs(void)
+static void __init init_iommu_pm_ops(void)
{
- int error;
-
- error = sysdev_class_register(&iommu_sysclass);
- if (error)
- return error;
-
- error = sysdev_register(&device_iommu);
- if (error)
- sysdev_class_unregister(&iommu_sysclass);
-
- return error;
+ register_syscore_ops(&iommu_syscore_ops);
}
#else
-static int __init init_iommu_sysfs(void)
-{
- return 0;
-}
+static inline void init_iommu_pm_ops(void) { }
#endif /* CONFIG_PM */
/*
@@ -3320,7 +3300,7 @@ int __init intel_iommu_init(void)
#endif
dma_ops = &intel_dma_ops;
- init_iommu_sysfs();
+ init_iommu_pm_ops();
register_iommu(&intel_iommu_ops);
From: Rafael J. Wysocki <[email protected]>
KVM uses a sysdev class and a sysdev for executing kvm_suspend()
after interrupts have been turned off on the boot CPU (during system
suspend) and for executing kvm_resume() before turning on interrupts
on the boot CPU (during system resume). However, since both of these
functions ignore their arguments, the entire mechanism may be
replaced with a struct syscore_ops object which is simpler.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
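The pattern used by the KVM callbacks (always registered, but acting
only while the facility is in use, and unregistered again on exit) can
be sketched in user space as follows. This is a toy model: the
single-slot registry and every name ending in _model or starting with
kvm_like_ are made up for the illustration and are not kernel
interfaces.

```c
#include <stddef.h>

/* Simplified stand-in for struct syscore_ops. */
struct syscore_ops_model {
	int (*suspend)(void);
	void (*resume)(void);
};

/* A single registration slot is enough for this sketch. */
static struct syscore_ops_model *registered;

static void register_ops_model(struct syscore_ops_model *ops)
{
	registered = ops;
}

static void unregister_ops_model(struct syscore_ops_model *ops)
{
	if (registered == ops)
		registered = NULL;
}

static int usage_count_model;	/* plays the role of kvm_usage_count */
static int hw_enabled_model;	/* 1 while the "hardware" is enabled */
int hw_enabled_while_asleep;	/* sampled between suspend and resume */

/* The callbacks do nothing unless somebody is using the facility. */
static int kvm_like_suspend(void)
{
	if (usage_count_model)
		hw_enabled_model = 0;
	return 0;
}

static void kvm_like_resume(void)
{
	if (usage_count_model)
		hw_enabled_model = 1;
}

static struct syscore_ops_model kvm_like_ops = {
	.suspend = kvm_like_suspend,
	.resume = kvm_like_resume,
};

/* One init / suspend / resume / exit cycle; returns the final hw state. */
int sleep_cycle_model(int users)
{
	usage_count_model = users;
	hw_enabled_model = users ? 1 : 0;

	register_ops_model(&kvm_like_ops);		/* module init */

	if (registered && registered->suspend)
		registered->suspend();
	hw_enabled_while_asleep = hw_enabled_model;
	if (registered && registered->resume)
		registered->resume();

	unregister_ops_model(&kvm_like_ops);		/* module exit */

	return hw_enabled_model;
}
```

Registration cannot fail in this model, so an init path built this way
needs no error label for it, and the exit path needs only the one
unregister call.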
virt/kvm/kvm_main.c | 34 ++++++++--------------------------
1 file changed, 8 insertions(+), 26 deletions(-)
Index: linux-2.6/virt/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/virt/kvm/kvm_main.c
+++ linux-2.6/virt/kvm/kvm_main.c
@@ -30,7 +30,7 @@
#include <linux/debugfs.h>
#include <linux/highmem.h>
#include <linux/file.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/cpu.h>
#include <linux/sched.h>
#include <linux/cpumask.h>
@@ -2392,33 +2392,26 @@ static void kvm_exit_debug(void)
debugfs_remove(kvm_debugfs_dir);
}
-static int kvm_suspend(struct sys_device *dev, pm_message_t state)
+static int kvm_suspend(void)
{
if (kvm_usage_count)
hardware_disable_nolock(NULL);
return 0;
}
-static int kvm_resume(struct sys_device *dev)
+static void kvm_resume(void)
{
if (kvm_usage_count) {
WARN_ON(spin_is_locked(&kvm_lock));
hardware_enable_nolock(NULL);
}
- return 0;
}
-static struct sysdev_class kvm_sysdev_class = {
- .name = "kvm",
+static struct syscore_ops kvm_syscore_ops = {
.suspend = kvm_suspend,
.resume = kvm_resume,
};
-static struct sys_device kvm_sysdev = {
- .id = 0,
- .cls = &kvm_sysdev_class,
-};
-
struct page *bad_page;
pfn_t bad_pfn;
@@ -2502,14 +2495,6 @@ int kvm_init(void *opaque, unsigned vcpu
goto out_free_2;
register_reboot_notifier(&kvm_reboot_notifier);
- r = sysdev_class_register(&kvm_sysdev_class);
- if (r)
- goto out_free_3;
-
- r = sysdev_register(&kvm_sysdev);
- if (r)
- goto out_free_4;
-
/* A kmem cache lets us meet the alignment requirements of fx_save. */
if (!vcpu_align)
vcpu_align = __alignof__(struct kvm_vcpu);
@@ -2517,7 +2502,7 @@ int kvm_init(void *opaque, unsigned vcpu
0, NULL);
if (!kvm_vcpu_cache) {
r = -ENOMEM;
- goto out_free_5;
+ goto out_free_3;
}
r = kvm_async_pf_init();
@@ -2534,6 +2519,8 @@ int kvm_init(void *opaque, unsigned vcpu
goto out_unreg;
}
+ register_syscore_ops(&kvm_syscore_ops);
+
kvm_preempt_ops.sched_in = kvm_sched_in;
kvm_preempt_ops.sched_out = kvm_sched_out;
@@ -2545,10 +2532,6 @@ out_unreg:
kvm_async_pf_deinit();
out_free:
kmem_cache_destroy(kvm_vcpu_cache);
-out_free_5:
- sysdev_unregister(&kvm_sysdev);
-out_free_4:
- sysdev_class_unregister(&kvm_sysdev_class);
out_free_3:
unregister_reboot_notifier(&kvm_reboot_notifier);
unregister_cpu_notifier(&kvm_cpu_notifier);
@@ -2576,8 +2559,7 @@ void kvm_exit(void)
misc_deregister(&kvm_dev);
kmem_cache_destroy(kvm_vcpu_cache);
kvm_async_pf_deinit();
- sysdev_unregister(&kvm_sysdev);
- sysdev_class_unregister(&kvm_sysdev_class);
+ unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
unregister_cpu_notifier(&kvm_cpu_notifier);
on_each_cpu(hardware_disable_nolock, NULL, 1);
From: Rafael J. Wysocki <[email protected]>
ACPI uses a sysdev class and a sysdev for executing
irqrouter_resume() before turning on interrupts on the boot CPU.
However, since irqrouter_resume() ignores its argument, the entire
mechanism may be replaced with a struct syscore_ops object which
is considerably simpler.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/acpi/pci_link.c | 30 ++++++++----------------------
1 file changed, 8 insertions(+), 22 deletions(-)
Index: linux-2.6/drivers/acpi/pci_link.c
===================================================================
--- linux-2.6.orig/drivers/acpi/pci_link.c
+++ linux-2.6/drivers/acpi/pci_link.c
@@ -29,7 +29,7 @@
* for IRQ management (e.g. start()->_SRS).
*/
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
@@ -757,14 +757,13 @@ static int acpi_pci_link_resume(struct a
return 0;
}
-static int irqrouter_resume(struct sys_device *dev)
+static void irqrouter_resume(void)
{
struct acpi_pci_link *link;
list_for_each_entry(link, &acpi_link_list, list) {
acpi_pci_link_resume(link);
}
- return 0;
}
static int acpi_pci_link_remove(struct acpi_device *device, int type)
@@ -871,32 +870,19 @@ static int __init acpi_irq_balance_set(c
__setup("acpi_irq_balance", acpi_irq_balance_set);
-/* FIXME: we will remove this interface after all drivers call pci_disable_device */
-static struct sysdev_class irqrouter_sysdev_class = {
- .name = "irqrouter",
+static struct syscore_ops irqrouter_syscore_ops = {
.resume = irqrouter_resume,
};
-static struct sys_device device_irqrouter = {
- .id = 0,
- .cls = &irqrouter_sysdev_class,
-};
-
-static int __init irqrouter_init_sysfs(void)
+static int __init irqrouter_init_ops(void)
{
- int error;
+ if (!acpi_disabled && !acpi_noirq)
+ register_syscore_ops(&irqrouter_syscore_ops);
- if (acpi_disabled || acpi_noirq)
- return 0;
-
- error = sysdev_class_register(&irqrouter_sysdev_class);
- if (!error)
- error = sysdev_register(&device_irqrouter);
-
- return error;
+ return 0;
}
-device_initcall(irqrouter_init_sysfs);
+device_initcall(irqrouter_init_ops);
static int __init acpi_pci_link_init(void)
{
Hi,
On Thursday, March 10, 2011, Rafael J. Wysocki wrote:
> There are multiple problems with sysdevs, or struct sys_device objects to
> be precise, that are so annoying that some people have started to think
> of removing them entirely from the kernel. To me, personally, the most
> obvious issue is the way sysdevs are used for defining suspend/resume
> callbacks to be executed with one CPU on-line and interrupts disabled.
> Greg and Kay may tell you more about the other problems with sysdevs. :-)
>
> Some subsystems need to carry out certain operations during suspend after
> we've disabled non-boot CPUs and interrupts have been switched off on the
> only on-line one. Currently, the only way to achieve that is to define
> sysdev suspend/resume callbacks, but this is cumbersome and inefficient.
> Namely, to do that, one has to define a sysdev class providing the callbacks
> and a sysdev actually using them, which is excessively complicated. Moreover,
> the sysdev suspend/resume callbacks take arguments that are not really used
> by the majority of subsystems defining sysdev suspend/resume callbacks
> (or even if they are used, they don't really _need_ to be used, so they
> are simply unnecessary). Of course, if a sysdev is only defined to provide
> suspend/resume (and maybe shutdown) callbacks, there's no real reason why
> it should show up in sysfs.
>
> For this reason, I thought it would be a good idea to provide a simpler
> interface for subsystems to define "very late" suspend callbacks and
> "very early" resume callbacks (and "very late" shutdown callbacks as well)
> without the entire bloat related to sysdevs. The interface is introduced
> by the first of the following patches, while the second patch converts some
> sysdev users related to the x86 architecture to using the new interface.
>
> I believe that all sysdev users who need to define suspend/resume/shutdown
> callbacks may be converted to using the interface provided by the first patch,
> which in turn should allow us to convert the remaining sysdev functionality
> into "normal" struct device interfaces. Still, even if that turns out to be
> too complicated, the bloat reduction resulting from the second patch kind of
> shows that moving at least some sysdev users to a simpler interface (like in
> the first patch) is a good idea anyway.
>
> This is a proof of concept, so the patches have not been tested. Please be
> extremely careful, because they touch sensitive code, so to speak. In the
> majority of cases the changes are rather straightforward, but there are some
> more interesting cases as well (io_apic.c most importantly).
Since Greg likes the idea and there haven't been any objections so far, here's
the official submission. The patches have been tested on HP nx6325 and
Toshiba Portege R500.
Patch [1/8] is regarded as 2.6.38 material, following Greg's advice. The
other patches in the set are regarded as 2.6.39 material. The last one
obviously depends on all of the previous ones.
[1/8] - Introduce struct syscore_ops for registering operations to be run on
one CPU during suspend/resume/shutdown.
[2/8] - Convert sysdev users in arch/x86 to using struct syscore_ops.
[3/8] - Make ACPI use struct syscore_ops for irqrouter_resume().
[4/8] - Make timekeeping use struct syscore_ops for suspend/resume.
[5/8] - Make Intel IOMMU use struct syscore_ops for suspend/resume.
[6/8] - Make KVM use struct syscore_ops for suspend/resume.
[7/8] - Make cpufreq use struct syscore_ops for boot CPU suspend/resume.
[8/8] - Introduce config switch allowing architectures to skip sysdev
suspend/resume/shutdown code.
If there are no objections, I'd like to push these patches through the suspend
tree.
Thanks,
Rafael
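For readers following the series, the interface introduced by [1/8] can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: the callback signatures match the ones used by the converted code in this thread, but the `_model` names, the fixed-size table and the callback ordering (suspend in reverse registration order, resume in registration order, roll back on a failed suspend) are illustrative and are not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct syscore_ops. */
struct syscore_ops_model {
	int (*suspend)(void);	/* may fail, hence the int return */
	void (*resume)(void);	/* not allowed to fail */
	void (*shutdown)(void);
};

#define MAX_OPS 16

static struct syscore_ops_model *ops_table[MAX_OPS];
static size_t ops_count;

/* Registration cannot fail in this model, so callers need no error path. */
static void model_register(struct syscore_ops_model *ops)
{
	assert(ops != NULL && ops_count < MAX_OPS);
	ops_table[ops_count++] = ops;
}

/*
 * Assumed ordering: suspend walks the table backwards.  If a callback
 * fails, everything suspended so far is resumed and the error returned.
 */
static int model_suspend(void)
{
	size_t i = ops_count;

	while (i-- > 0) {
		int ret;

		if (!ops_table[i]->suspend)
			continue;
		ret = ops_table[i]->suspend();
		if (ret) {
			size_t j;

			for (j = i + 1; j < ops_count; j++)
				if (ops_table[j]->resume)
					ops_table[j]->resume();
			return ret;
		}
	}
	return 0;
}

/* Resume walks the table forwards, skipping missing callbacks. */
static void model_resume(void)
{
	size_t i;

	for (i = 0; i < ops_count; i++)
		if (ops_table[i]->resume)
			ops_table[i]->resume();
}

/* Two demo users: one that always succeeds, one that always fails. */
static int demo_suspended;

static int demo_suspend(void)
{
	demo_suspended = 1;
	return 0;
}

static void demo_resume(void)
{
	demo_suspended = 0;
}

static int failing_suspend(void)
{
	return -1;
}

static struct syscore_ops_model demo_ops = {
	.suspend = demo_suspend,
	.resume = demo_resume,
};

static struct syscore_ops_model failing_ops = {
	.suspend = failing_suspend,
};
```

The point of the model is that a user only fills in one structure and makes one registration call; there is no class, no device object and no sysfs entry involved.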
On Saturday, March 12, 2011, Rafael J. Wysocki wrote:
> Hi,
>
> Since Greg likes the idea and there haven't been any objections so far, here's
> the official submission. The patches have been tested on HP nx6325 and
> Toshiba Portege R500.
>
> Patch [1/8] is regarded as 2.6.38 material, following Greg's advice. The
> other patches in the set are regarded as 2.6.39 material. The last one
> obviously depends on all of the previous ones.
>
> [1/8] - Introduce struct syscore_ops for registering operations to be run on
> one CPU during suspend/resume/shutdown.
>
> [2/8] - Convert sysdev users in arch/x86 to using struct syscore_ops.
>
> [3/8] - Make ACPI use struct syscore_ops for irqrouter_resume().
>
> [4/8] - Make timekeeping use struct syscore_ops for suspend/resume.
>
> [5/8] - Make Intel IOMMU use struct syscore_ops for suspend/resume.
>
> [6/8] - Make KVM use struct syscore_ops for suspend/resume.
>
> [7/8] - Make cpufreq use struct syscore_ops for boot CPU suspend/resume.
>
> [8/8] - Introduce config switch allowing architectures to skip sysdev
> suspend/resume/shutdown code.
>
> If there are no objections, I'd like to push these patches through the suspend
> tree.
A little followup with two ARM-related patches.
[9/10] - Make sh drivers use struct syscore_ops for suspend/resume (instead of
sysdevs).
[10/10] - Use struct syscore_ops for suspend/resume (instead of sysdevs) in
core ARM code.
Thanks,
Rafael
From: Rafael J. Wysocki <[email protected]>
Convert arch/arm/kernel/time.c and arch/arm/kernel/leds.c to using
struct syscore_ops for power management instead of sysdev classes
and sysdevs.
This simplifies the code in arch/arm/kernel/time.c quite a bit and is
necessary for removing sysdevs from the kernel entirely in future.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/arm/include/asm/mach/time.h | 1 -
arch/arm/kernel/leds.c | 28 ++++++++++++++++------------
arch/arm/kernel/time.c | 33 +++++++++++----------------------
3 files changed, 27 insertions(+), 35 deletions(-)
Index: linux-2.6/arch/arm/kernel/time.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/time.c
+++ linux-2.6/arch/arm/kernel/time.c
@@ -21,7 +21,7 @@
#include <linux/timex.h>
#include <linux/errno.h>
#include <linux/profile.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/timer.h>
#include <linux/irq.h>
@@ -117,48 +117,37 @@ void timer_tick(void)
#endif
#if defined(CONFIG_PM) && !defined(CONFIG_GENERIC_CLOCKEVENTS)
-static int timer_suspend(struct sys_device *dev, pm_message_t state)
+static int timer_suspend(void)
{
- struct sys_timer *timer = container_of(dev, struct sys_timer, dev);
-
- if (timer->suspend != NULL)
+ if (system_timer->suspend)
timer->suspend();
return 0;
}
-static int timer_resume(struct sys_device *dev)
+static void timer_resume(void)
{
- struct sys_timer *timer = container_of(dev, struct sys_timer, dev);
-
- if (timer->resume != NULL)
- timer->resume();
-
- return 0;
+ if (system_timer->resume)
+ system_timer->resume();
}
#else
#define timer_suspend NULL
#define timer_resume NULL
#endif
-static struct sysdev_class timer_sysclass = {
- .name = "timer",
+static struct syscore_ops timer_syscore_ops = {
.suspend = timer_suspend,
.resume = timer_resume,
};
-static int __init timer_init_sysfs(void)
+static int __init timer_init_syscore_ops(void)
{
- int ret = sysdev_class_register(&timer_sysclass);
- if (ret == 0) {
- system_timer->dev.cls = &timer_sysclass;
- ret = sysdev_register(&system_timer->dev);
- }
+ register_syscore_ops(&timer_syscore_ops);
- return ret;
+ return 0;
}
-device_initcall(timer_init_sysfs);
+device_initcall(timer_init_syscore_ops);
void __init time_init(void)
{
Index: linux-2.6/arch/arm/include/asm/mach/time.h
===================================================================
--- linux-2.6.orig/arch/arm/include/asm/mach/time.h
+++ linux-2.6/arch/arm/include/asm/mach/time.h
@@ -34,7 +34,6 @@
* timer interrupt which may be pending.
*/
struct sys_timer {
- struct sys_device dev;
void (*init)(void);
void (*suspend)(void);
void (*resume)(void);
Index: linux-2.6/arch/arm/kernel/leds.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/leds.c
+++ linux-2.6/arch/arm/kernel/leds.c
@@ -10,6 +10,7 @@
#include <linux/module.h>
#include <linux/init.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <asm/leds.h>
@@ -69,36 +70,37 @@ static ssize_t leds_store(struct sys_dev
static SYSDEV_ATTR(event, 0200, NULL, leds_store);
-static int leds_suspend(struct sys_device *dev, pm_message_t state)
+static struct sysdev_class leds_sysclass = {
+ .name = "leds",
+};
+
+static struct sys_device leds_device = {
+ .id = 0,
+ .cls = &leds_sysclass,
+};
+
+static int leds_suspend(void)
{
leds_event(led_stop);
return 0;
}
-static int leds_resume(struct sys_device *dev)
+static void leds_resume(void)
{
leds_event(led_start);
- return 0;
}
-static int leds_shutdown(struct sys_device *dev)
+static void leds_shutdown(void)
{
leds_event(led_halted);
- return 0;
}
-static struct sysdev_class leds_sysclass = {
- .name = "leds",
+static struct syscore_ops leds_syscore_ops = {
.shutdown = leds_shutdown,
.suspend = leds_suspend,
.resume = leds_resume,
};
-static struct sys_device leds_device = {
- .id = 0,
- .cls = &leds_sysclass,
-};
-
static int __init leds_init(void)
{
int ret;
@@ -107,6 +109,8 @@ static int __init leds_init(void)
ret = sysdev_register(&leds_device);
if (ret == 0)
ret = sysdev_create_file(&leds_device, &attr_event);
+ if (ret == 0)
+ register_syscore_ops(&leds_syscore_ops);
return ret;
}
From: Rafael J. Wysocki <[email protected]>
Convert the SuperH clocks framework and shared interrupt handling
code to using struct syscore_ops instead of sysdev classes and
sysdevs for power management.
This reduces the code size significantly and simplifies it. The
optimizations causing things not to be restored after creating a
hibernation image are removed, but they might lead to undesirable
effects during resume from hibernation (e.g. the clocks would be left
as the boot kernel set them, which might not match the way the
hibernated kernel had seen them before hibernation).
This also is necessary for removing sysdevs from the kernel entirely
in the future.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/sh/clk/core.c | 68 +++++++--------------------
drivers/sh/intc/core.c | 108 ++++++++++++++------------------------------
drivers/sh/intc/internals.h | 2
3 files changed, 53 insertions(+), 125 deletions(-)
Index: linux-2.6/drivers/sh/clk/core.c
===================================================================
--- linux-2.6.orig/drivers/sh/clk/core.c
+++ linux-2.6/drivers/sh/clk/core.c
@@ -21,7 +21,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/list.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/seq_file.h>
#include <linux/err.h>
#include <linux/io.h>
@@ -630,68 +630,36 @@ long clk_round_parent(struct clk *clk, u
EXPORT_SYMBOL_GPL(clk_round_parent);
#ifdef CONFIG_PM
-static int clks_sysdev_suspend(struct sys_device *dev, pm_message_t state)
+static void clks_core_resume(void)
{
- static pm_message_t prev_state;
struct clk *clkp;
- switch (state.event) {
- case PM_EVENT_ON:
- /* Resumeing from hibernation */
- if (prev_state.event != PM_EVENT_FREEZE)
- break;
-
- list_for_each_entry(clkp, &clock_list, node) {
- if (likely(clkp->ops)) {
- unsigned long rate = clkp->rate;
-
- if (likely(clkp->ops->set_parent))
- clkp->ops->set_parent(clkp,
- clkp->parent);
- if (likely(clkp->ops->set_rate))
- clkp->ops->set_rate(clkp, rate);
- else if (likely(clkp->ops->recalc))
- clkp->rate = clkp->ops->recalc(clkp);
- }
+ list_for_each_entry(clkp, &clock_list, node) {
+ if (likely(clkp->ops)) {
+ unsigned long rate = clkp->rate;
+
+ if (likely(clkp->ops->set_parent))
+ clkp->ops->set_parent(clkp,
+ clkp->parent);
+ if (likely(clkp->ops->set_rate))
+ clkp->ops->set_rate(clkp, rate);
+ else if (likely(clkp->ops->recalc))
+ clkp->rate = clkp->ops->recalc(clkp);
}
- break;
- case PM_EVENT_FREEZE:
- break;
- case PM_EVENT_SUSPEND:
- break;
}
-
- prev_state = state;
- return 0;
-}
-
-static int clks_sysdev_resume(struct sys_device *dev)
-{
- return clks_sysdev_suspend(dev, PMSG_ON);
}
-static struct sysdev_class clks_sysdev_class = {
- .name = "clks",
-};
-
-static struct sysdev_driver clks_sysdev_driver = {
- .suspend = clks_sysdev_suspend,
- .resume = clks_sysdev_resume,
-};
-
-static struct sys_device clks_sysdev_dev = {
- .cls = &clks_sysdev_class,
+static struct syscore_ops clks_syscore_ops = {
+ .resume = clks_core_resume,
};
-static int __init clk_sysdev_init(void)
+static int __init clk_syscore_init(void)
{
- sysdev_class_register(&clks_sysdev_class);
- sysdev_driver_register(&clks_sysdev_class, &clks_sysdev_driver);
- sysdev_register(&clks_sysdev_dev);
+ register_syscore_ops(&clks_syscore_ops);
return 0;
}
-subsys_initcall(clk_sysdev_init);
+subsys_initcall(clk_syscore_init);
#endif
/*
Index: linux-2.6/drivers/sh/intc/core.c
===================================================================
--- linux-2.6.orig/drivers/sh/intc/core.c
+++ linux-2.6/drivers/sh/intc/core.c
@@ -24,7 +24,7 @@
#include <linux/slab.h>
#include <linux/interrupt.h>
#include <linux/sh_intc.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/radix-tree.h>
@@ -376,108 +376,70 @@ err0:
return -ENOMEM;
}
-static ssize_t
-show_intc_name(struct sys_device *dev, struct sysdev_attribute *attr, char *buf)
+static int intc_suspend(void)
{
struct intc_desc_int *d;
- d = container_of(dev, struct intc_desc_int, sysdev);
+ list_for_each_entry(d, &intc_list, list) {
+ int irq;
- return sprintf(buf, "%s\n", d->chip.name);
-}
+ /* enable wakeup irqs belonging to this intc controller */
+ for_each_active_irq(irq) {
+ struct irq_data *data;
+ struct irq_desc *desc;
+ struct irq_chip *chip;
+
+ data = irq_get_irq_data(irq);
+ chip = irq_data_get_irq_chip(data);
+ if (chip != &d->chip)
+ continue;
+ desc = irq_to_desc(irq);
+ if ((desc->status & IRQ_WAKEUP))
+ chip->irq_enable(data);
+ }
+ }
-static SYSDEV_ATTR(name, S_IRUGO, show_intc_name, NULL);
+ return 0;
+}
-static int intc_suspend(struct sys_device *dev, pm_message_t state)
+static void intc_resume(void)
{
struct intc_desc_int *d;
- struct irq_data *data;
- struct irq_desc *desc;
- struct irq_chip *chip;
- int irq;
-
- /* get intc controller associated with this sysdev */
- d = container_of(dev, struct intc_desc_int, sysdev);
-
- switch (state.event) {
- case PM_EVENT_ON:
- if (d->state.event != PM_EVENT_FREEZE)
- break;
+
+ list_for_each_entry(d, &intc_list, list) {
+ int irq;
for_each_active_irq(irq) {
- desc = irq_to_desc(irq);
+ struct irq_data *data;
+ struct irq_desc *desc;
+ struct irq_chip *chip;
+
data = irq_get_irq_data(irq);
chip = irq_data_get_irq_chip(data);
-
/*
* This will catch the redirect and VIRQ cases
* due to the dummy_irq_chip being inserted.
*/
if (chip != &d->chip)
continue;
+ desc = irq_to_desc(irq);
if (desc->status & IRQ_DISABLED)
chip->irq_disable(data);
else
chip->irq_enable(data);
}
- break;
- case PM_EVENT_FREEZE:
- /* nothing has to be done */
- break;
- case PM_EVENT_SUSPEND:
- /* enable wakeup irqs belonging to this intc controller */
- for_each_active_irq(irq) {
- desc = irq_to_desc(irq);
- data = irq_get_irq_data(irq);
- chip = irq_data_get_irq_chip(data);
-
- if (chip != &d->chip)
- continue;
- if ((desc->status & IRQ_WAKEUP))
- chip->irq_enable(data);
- }
- break;
}
-
- d->state = state;
-
- return 0;
}
-static int intc_resume(struct sys_device *dev)
-{
- return intc_suspend(dev, PMSG_ON);
-}
-
-struct sysdev_class intc_sysdev_class = {
- .name = "intc",
+struct syscore_ops intc_syscore_ops = {
.suspend = intc_suspend,
.resume = intc_resume,
};
-/* register this intc as sysdev to allow suspend/resume */
-static int __init register_intc_sysdevs(void)
+static int __init intc_syscore_init(void)
{
- struct intc_desc_int *d;
- int error;
+ register_syscore_ops(&intc_syscore_ops);
- error = sysdev_class_register(&intc_sysdev_class);
- if (!error) {
- list_for_each_entry(d, &intc_list, list) {
- d->sysdev.id = d->index;
- d->sysdev.cls = &intc_sysdev_class;
- error = sysdev_register(&d->sysdev);
- if (error == 0)
- error = sysdev_create_file(&d->sysdev,
- &attr_name);
- if (error)
- break;
- }
- }
-
- if (error)
- pr_err("sysdev registration error\n");
-
- return error;
+ return 0;
}
-device_initcall(register_intc_sysdevs);
+device_initcall(intc_syscore_init);
Index: linux-2.6/drivers/sh/intc/internals.h
===================================================================
--- linux-2.6.orig/drivers/sh/intc/internals.h
+++ linux-2.6/drivers/sh/intc/internals.h
@@ -51,9 +51,7 @@ struct intc_subgroup_entry {
struct intc_desc_int {
struct list_head list;
- struct sys_device sysdev;
struct radix_tree_root tree;
- pm_message_t state;
raw_spinlock_t lock;
unsigned int index;
unsigned long *reg;
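As a reading aid for the clk part of the patch above: the old code restored the clocks only when PM_EVENT_ON followed PM_EVENT_FREEZE, while the new resume callback runs the same restore loop on every resume. A minimal userspace sketch of that loop follows. All `_model` names, the `hw_rate` and `hw_parent` fields and the two demo clock types are hypothetical stand-ins for the hardware, not part of the SuperH clock framework.

```c
#include <assert.h>
#include <stddef.h>

struct clk_model;

/* Hypothetical subset of the per-clock operations used by the loop. */
struct clk_ops_model {
	void (*set_parent)(struct clk_model *clk, struct clk_model *parent);
	void (*set_rate)(struct clk_model *clk, unsigned long rate);
	unsigned long (*recalc)(struct clk_model *clk);
};

struct clk_model {
	const struct clk_ops_model *ops;
	struct clk_model *parent;	/* parent the kernel expects */
	unsigned long rate;		/* rate the kernel expects */
	unsigned long hw_rate;		/* stand-in for a rate register */
	struct clk_model *hw_parent;	/* stand-in for a mux register */
};

/* A programmable clock writes the expected state back to "hardware". */
static void prog_set_parent(struct clk_model *clk, struct clk_model *parent)
{
	clk->hw_parent = parent;
}

static void prog_set_rate(struct clk_model *clk, unsigned long rate)
{
	clk->hw_rate = rate;
}

/* A fixed clock can only report what the "hardware" currently provides. */
static unsigned long fixed_recalc(struct clk_model *clk)
{
	return clk->hw_rate;
}

static const struct clk_ops_model prog_ops = {
	.set_parent = prog_set_parent,
	.set_rate = prog_set_rate,
};

static const struct clk_ops_model fixed_ops = {
	.recalc = fixed_recalc,
};

/*
 * Same decision order as in the patch: reprogram the parent, then
 * reprogram the rate if possible, otherwise re-read the rate.
 */
static void clks_restore_model(struct clk_model **clocks, size_t n)
{
	size_t i;

	assert(clocks != NULL);
	for (i = 0; i < n; i++) {
		struct clk_model *clk = clocks[i];

		if (!clk->ops)
			continue;
		if (clk->ops->set_parent)
			clk->ops->set_parent(clk, clk->parent);
		if (clk->ops->set_rate)
			clk->ops->set_rate(clk, clk->rate);
		else if (clk->ops->recalc)
			clk->rate = clk->ops->recalc(clk);
	}
}
```

In this model a clock left at a different rate by the boot kernel is forced back to the rate the resumed kernel expects, which is the effect the changelog refers to.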
On Sat, 12 Mar 2011, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
> ===================================================================
> --- linux-2.6.orig/arch/x86/Kconfig
> +++ linux-2.6/arch/x86/Kconfig
> @@ -184,6 +184,9 @@ config ARCH_HIBERNATION_POSSIBLE
> config ARCH_SUSPEND_POSSIBLE
> def_bool y
>
> +config ARCH_NO_SYSDEV_OPS
> + def_bool y
> +
Can we please put that in drivers/base/Kconfig and let the arch
Kconfig select it?
Thanks,
tglx
On Sat, 12 Mar 2011, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The timekeeping subsystem uses a sysdev class and a sysdev for
> executing timekeeping_suspend() after interrupts have been turned off
> on the boot CPU (during system suspend) and for executing
> timekeeping_resume() before turning on interrupts on the boot CPU
> (during system resume). However, since both of these functions
> ignore their arguments, the entire mechanism may be replaced with a
> struct syscore_ops object which is simpler.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
On Sat, 12 Mar 2011, Rafael J. Wysocki wrote:
> -static int lapic_resume(struct sys_device *dev)
> +static void lapic_resume(void)
> {
> unsigned int l, h;
> unsigned long flags;
> @@ -2083,7 +2083,7 @@ static int lapic_resume(struct sys_devic
> struct IO_APIC_route_entry **ioapic_entries = NULL;
>
> if (!apic_pm_state.active)
> - return 0;
> + return;
>
> local_irq_save(flags);
That wants the following on top:
@@ -2079,8 +2079,7 @@ static void lapic_resume(void)
{
unsigned int l, h;
unsigned long flags;
- int maxlvt;
- int ret = 0;
+ int maxlvt, ret;
struct IO_APIC_route_entry **ioapic_entries = NULL;
if (!apic_pm_state.active)
@@ -2091,7 +2090,6 @@ static void lapic_resume(void)
ioapic_entries = alloc_ioapic_entries();
if (!ioapic_entries) {
WARN(1, "Alloc ioapic_entries in lapic resume failed.");
- ret = -ENOMEM;
goto restore;
}
Otherwise, Reviewed-by: Thomas Gleixner <[email protected]>
Thanks,
tglx
On Sunday, March 13, 2011, Thomas Gleixner wrote:
> On Sat, 12 Mar 2011, Rafael J. Wysocki wrote:
> > -static int lapic_resume(struct sys_device *dev)
> > +static void lapic_resume(void)
> > {
> > unsigned int l, h;
> > unsigned long flags;
> > @@ -2083,7 +2083,7 @@ static int lapic_resume(struct sys_devic
> > struct IO_APIC_route_entry **ioapic_entries = NULL;
> >
> > if (!apic_pm_state.active)
> > - return 0;
> > + return;
> >
> > local_irq_save(flags);
>
> That wants the following on top:
>
> @@ -2079,8 +2079,7 @@ static void lapic_resume(void)
> {
> unsigned int l, h;
> unsigned long flags;
> - int maxlvt;
> - int ret = 0;
> + int maxlvt, ret;
> struct IO_APIC_route_entry **ioapic_entries = NULL;
>
> if (!apic_pm_state.active)
> @@ -2091,7 +2090,6 @@ static void lapic_resume(void)
> ioapic_entries = alloc_ioapic_entries();
> if (!ioapic_entries) {
> WARN(1, "Alloc ioapic_entries in lapic resume failed.");
> - ret = -ENOMEM;
> goto restore;
> }
Right, I'll fold it into the final version of the patch.
> Otherwise, Reviewed-by: Thomas Gleixner <[email protected]>
Thanks!
Rafael
On Sunday, March 13, 2011, Thomas Gleixner wrote:
> On Sat, 12 Mar 2011, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > The timekeeping subsystem uses a sysdev class and a sysdev for
> > executing timekeeping_suspend() after interrupts have been turned off
> > on the boot CPU (during system suspend) and for executing
> > timekeeping_resume() before turning on interrupts on the boot CPU
> > (during system resume). However, since both of these functions
> > ignore their arguments, the entire mechanism may be replaced with a
> > struct syscore_ops object which is simpler.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> Reviewed-by: Thomas Gleixner <[email protected]>
Thanks!
Rafael
On Sunday, March 13, 2011, Thomas Gleixner wrote:
> On Sat, 12 Mar 2011, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/Kconfig
> > +++ linux-2.6/arch/x86/Kconfig
> > @@ -184,6 +184,9 @@ config ARCH_HIBERNATION_POSSIBLE
> > config ARCH_SUSPEND_POSSIBLE
> > def_bool y
> >
> > +config ARCH_NO_SYSDEV_OPS
> > + def_bool y
> > +
>
> Can we please put that in drivers/base/Kconfig and let the arch
> Kconfig select it?
Sure. Updated patch follows.
Thanks,
Rafael
---
From: Rafael J. Wysocki <[email protected]>
Subject: Introduce ARCH_NO_SYSDEV_OPS config option (v2)
Introduce Kconfig option allowing architectures where sysdev
operations used during system suspend, resume and shutdown have been
completely replaced with struct syscore_ops operations to avoid
building sysdev code that will never be used.
Make callbacks in struct sys_device and struct sysdev_driver depend
on ARCH_NO_SYSDEV_OPS to allow us to verify if all of the references
have been actually removed from the code the given architecture
depends on.
Make x86 select ARCH_NO_SYSDEV_OPS.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/Kconfig | 1 +
drivers/base/Kconfig | 6 ++++++
drivers/base/sys.c | 3 ++-
include/linux/device.h | 4 ++++
include/linux/pm.h | 10 ++++++++--
include/linux/sysdev.h | 7 +++++--
6 files changed, 26 insertions(+), 5 deletions(-)
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -529,13 +529,19 @@ struct dev_power_domain {
*/
#ifdef CONFIG_PM_SLEEP
-extern void device_pm_lock(void);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
+extern int sysdev_suspend(pm_message_t state);
extern int sysdev_resume(void);
+#else
+static inline int sysdev_suspend(pm_message_t state) { return 0; }
+static inline int sysdev_resume(void) { return 0; }
+#endif
+
+extern void device_pm_lock(void);
extern void dpm_resume_noirq(pm_message_t state);
extern void dpm_resume_end(pm_message_t state);
extern void device_pm_unlock(void);
-extern int sysdev_suspend(pm_message_t state);
extern int dpm_suspend_noirq(pm_message_t state);
extern int dpm_suspend_start(pm_message_t state);
Index: linux-2.6/include/linux/sysdev.h
===================================================================
--- linux-2.6.orig/include/linux/sysdev.h
+++ linux-2.6/include/linux/sysdev.h
@@ -33,12 +33,13 @@ struct sysdev_class {
const char *name;
struct list_head drivers;
struct sysdev_class_attribute **attrs;
-
+ struct kset kset;
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/* Default operations for these types of devices */
int (*shutdown)(struct sys_device *);
int (*suspend)(struct sys_device *, pm_message_t state);
int (*resume)(struct sys_device *);
- struct kset kset;
+#endif
};
struct sysdev_class_attribute {
@@ -76,9 +77,11 @@ struct sysdev_driver {
struct list_head entry;
int (*add)(struct sys_device *);
int (*remove)(struct sys_device *);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
int (*shutdown)(struct sys_device *);
int (*suspend)(struct sys_device *, pm_message_t state);
int (*resume)(struct sys_device *);
+#endif
};
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -635,8 +635,12 @@ static inline int devtmpfs_mount(const c
/* drivers/base/power/shutdown.c */
extern void device_shutdown(void);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/* drivers/base/sys.c */
extern void sysdev_shutdown(void);
+#else
+static inline void sysdev_shutdown(void) { }
+#endif
/* debugging and troubleshooting/diagnostic helpers. */
extern const char *dev_driver_string(const struct device *dev);
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -67,6 +67,7 @@ config X86
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
select USE_GENERIC_SMP_HELPERS if SMP
+ select ARCH_NO_SYSDEV_OPS
config INSTRUCTION_DECODER
def_bool (KPROBES || PERF_EVENTS)
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -302,7 +302,7 @@ void sysdev_unregister(struct sys_device
}
-
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/**
* sysdev_shutdown - Shut down all system devices.
*
@@ -497,6 +497,7 @@ int sysdev_resume(void)
return 0;
}
EXPORT_SYMBOL_GPL(sysdev_resume);
+#endif /* CONFIG_ARCH_NO_SYSDEV_OPS */
int __init system_bus_init(void)
{
Index: linux-2.6/drivers/base/Kconfig
===================================================================
--- linux-2.6.orig/drivers/base/Kconfig
+++ linux-2.6/drivers/base/Kconfig
@@ -168,4 +168,10 @@ config SYS_HYPERVISOR
bool
default n
+config ARCH_NO_SYSDEV_OPS
+ bool
+ ---help---
+ To be set by architectures that don't use sysdev or sysdev driver
+ power management (suspend/resume) and shutdown operations.
+
endmenu
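The header changes in the patch above follow the usual pattern of replacing a function with an empty static inline when a config option compiles its implementation out, so that callers need no #ifdef of their own. A small standalone sketch of that pattern, assuming made-up `_MODEL`/`_model` names in place of the real option, type and functions:

```c
#include <assert.h>

/* Hypothetical stand-in for pm_message_t. */
struct pm_message_model {
	int event;
};

/* Stand-in for an architecture selecting the option in its Kconfig. */
#define CONFIG_ARCH_NO_SYSDEV_OPS_MODEL 1

#ifndef CONFIG_ARCH_NO_SYSDEV_OPS_MODEL
/* Real implementations would be built elsewhere and declared here. */
int sysdev_suspend_model(struct pm_message_model state);
int sysdev_resume_model(void);
#else
/* Option set: callers still compile, but the calls do nothing. */
static inline int sysdev_suspend_model(struct pm_message_model state)
{
	(void)state;
	return 0;
}

static inline int sysdev_resume_model(void)
{
	return 0;
}
#endif

static int platform_entered;

/* A caller written once, with no knowledge of the config option. */
static int suspend_enter_model(struct pm_message_model state)
{
	int error = sysdev_suspend_model(state);

	if (!error) {
		platform_entered++;	/* the sleep state would be entered here */
		sysdev_resume_model();
	}
	return error;
}
```

Removing the callback members from the structures under the same option has a different purpose, as the changelog says: any leftover user then fails to build, which makes unconverted code easy to find.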
On 3/13/2011 6:04 AM, Rafael J. Wysocki wrote:
> #if defined(CONFIG_PM)&& !defined(CONFIG_GENERIC_CLOCKEVENTS)
> -static int timer_suspend(struct sys_device *dev, pm_message_t state)
> +static int timer_suspend(void)
> {
> - struct sys_timer *timer = container_of(dev, struct sys_timer, dev);
> -
> - if (timer->suspend != NULL)
> + if (system_timer->suspend)
> timer->suspend();
>
Shouldn't this be system_timer->suspend() ?
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
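To make the point above concrete: once the local `timer` variable is gone, both the NULL check and the call have to go through the global pointer. A small userspace sketch, assuming hypothetical `_model` names in place of struct sys_timer and system_timer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sys_timer without the sysdev member. */
struct sys_timer_model {
	void (*init)(void);
	void (*suspend)(void);
	void (*resume)(void);
};

/* Stand-in for the global system_timer pointer. */
static struct sys_timer_model *system_timer_model;

/* Check and call through the same global; no local variable is left. */
static int timer_suspend_model(void)
{
	assert(system_timer_model != NULL);
	if (system_timer_model->suspend)
		system_timer_model->suspend();
	return 0;
}

static void timer_resume_model(void)
{
	assert(system_timer_model != NULL);
	if (system_timer_model->resume)
		system_timer_model->resume();
}

/* Demo timers: one with both callbacks, one with none. */
static int suspend_calls;
static int resume_calls;

static void demo_timer_suspend(void)
{
	suspend_calls++;
}

static void demo_timer_resume(void)
{
	resume_calls++;
}

static struct sys_timer_model demo_timer = {
	.suspend = demo_timer_suspend,
	.resume = demo_timer_resume,
};

static struct sys_timer_model bare_timer;
```

With the original hunk the equivalent of `timer_suspend_model()` would reference an undeclared identifier, so the problem only shows up in configurations where that code is actually built.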
On Monday, March 14, 2011, Stephen Boyd wrote:
> On 3/13/2011 6:04 AM, Rafael J. Wysocki wrote:
> > #if defined(CONFIG_PM)&& !defined(CONFIG_GENERIC_CLOCKEVENTS)
> > -static int timer_suspend(struct sys_device *dev, pm_message_t state)
> > +static int timer_suspend(void)
> > {
> > - struct sys_timer *timer = container_of(dev, struct sys_timer, dev);
> > -
> > - if (timer->suspend != NULL)
> > + if (system_timer->suspend)
> > timer->suspend();
> >
>
> Shouldn't this be system_timer->suspend() ?
Indeed, thanks a lot!
I must have forgotten to test it with CONFIG_GENERIC_CLOCKEVENTS unset.
Updated patch follows.
---
From: Rafael J. Wysocki <[email protected]>
Subject: ARM: Use struct syscore_ops instead of sysdevs for PM in timer and leds
Convert arch/arm/kernel/time.c and arch/arm/kernel/leds.c to using
struct syscore_ops for power management instead of sysdev classes
and sysdevs.
This simplifies the code in arch/arm/kernel/time.c quite a bit and is
necessary for removing sysdevs from the kernel entirely in future.
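For anyone following the fix discussed earlier in this thread, the converted callbacks can be modelled in userspace roughly as shown here. This is only a sketch under stated assumptions: struct sys_timer is reduced to the two hooks that matter, and fake_suspend(), fake_resume(), fake_timer and the two counters are hypothetical stand-ins for a platform timer driver, not code from the patch.

```c
#include <stddef.h>

/* Reduced userspace stand-in for the ARM struct sys_timer. */
struct sys_timer {
	void (*suspend)(void);
	void (*resume)(void);
};

/* Hypothetical platform hooks that only count how often they run. */
static int suspend_calls, resume_calls;
static void fake_suspend(void) { suspend_calls++; }
static void fake_resume(void) { resume_calls++; }

static struct sys_timer fake_timer = {
	.suspend = fake_suspend,
	.resume = fake_resume,
};

/* The single global timer that the converted callbacks refer to. */
static struct sys_timer *system_timer = &fake_timer;

/* No sysdev argument: the callback reaches the timer via the global. */
static int timer_suspend(void)
{
	if (system_timer->suspend)
		system_timer->suspend();
	return 0;
}

/* The resume callback returns void, so it has no status to report. */
static void timer_resume(void)
{
	if (system_timer->resume)
		system_timer->resume();
}
```

Both the NULL check and the call go through system_timer, which is the point of the correction: there is no local "timer" variable left once container_of() is gone.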
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/arm/include/asm/mach/time.h | 1 -
arch/arm/kernel/leds.c | 28 ++++++++++++++++------------
arch/arm/kernel/time.c | 35 ++++++++++++-----------------------
3 files changed, 28 insertions(+), 36 deletions(-)
Index: linux-2.6/arch/arm/kernel/time.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/time.c
+++ linux-2.6/arch/arm/kernel/time.c
@@ -21,7 +21,7 @@
#include <linux/timex.h>
#include <linux/errno.h>
#include <linux/profile.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/timer.h>
#include <linux/irq.h>
@@ -117,48 +117,37 @@ void timer_tick(void)
#endif
#if defined(CONFIG_PM) && !defined(CONFIG_GENERIC_CLOCKEVENTS)
-static int timer_suspend(struct sys_device *dev, pm_message_t state)
+static int timer_suspend(void)
{
- struct sys_timer *timer = container_of(dev, struct sys_timer, dev);
-
- if (timer->suspend != NULL)
- timer->suspend();
+ if (system_timer->suspend)
+ system_timer->suspend();
return 0;
}
-static int timer_resume(struct sys_device *dev)
+static void timer_resume(void)
{
- struct sys_timer *timer = container_of(dev, struct sys_timer, dev);
-
- if (timer->resume != NULL)
- timer->resume();
-
- return 0;
+ if (system_timer->resume)
+ system_timer->resume();
}
#else
#define timer_suspend NULL
#define timer_resume NULL
#endif
-static struct sysdev_class timer_sysclass = {
- .name = "timer",
+static struct syscore_ops timer_syscore_ops = {
.suspend = timer_suspend,
.resume = timer_resume,
};
-static int __init timer_init_sysfs(void)
+static int __init timer_init_syscore_ops(void)
{
- int ret = sysdev_class_register(&timer_sysclass);
- if (ret == 0) {
- system_timer->dev.cls = &timer_sysclass;
- ret = sysdev_register(&system_timer->dev);
- }
+ register_syscore_ops(&timer_syscore_ops);
- return ret;
+ return 0;
}
-device_initcall(timer_init_sysfs);
+device_initcall(timer_init_syscore_ops);
void __init time_init(void)
{
Index: linux-2.6/arch/arm/include/asm/mach/time.h
===================================================================
--- linux-2.6.orig/arch/arm/include/asm/mach/time.h
+++ linux-2.6/arch/arm/include/asm/mach/time.h
@@ -34,7 +34,6 @@
* timer interrupt which may be pending.
*/
struct sys_timer {
- struct sys_device dev;
void (*init)(void);
void (*suspend)(void);
void (*resume)(void);
Index: linux-2.6/arch/arm/kernel/leds.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/leds.c
+++ linux-2.6/arch/arm/kernel/leds.c
@@ -10,6 +10,7 @@
#include <linux/module.h>
#include <linux/init.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <asm/leds.h>
@@ -69,36 +70,37 @@ static ssize_t leds_store(struct sys_dev
static SYSDEV_ATTR(event, 0200, NULL, leds_store);
-static int leds_suspend(struct sys_device *dev, pm_message_t state)
+static struct sysdev_class leds_sysclass = {
+ .name = "leds",
+};
+
+static struct sys_device leds_device = {
+ .id = 0,
+ .cls = &leds_sysclass,
+};
+
+static int leds_suspend(void)
{
leds_event(led_stop);
return 0;
}
-static int leds_resume(struct sys_device *dev)
+static void leds_resume(void)
{
leds_event(led_start);
- return 0;
}
-static int leds_shutdown(struct sys_device *dev)
+static void leds_shutdown(void)
{
leds_event(led_halted);
- return 0;
}
-static struct sysdev_class leds_sysclass = {
- .name = "leds",
+static struct syscore_ops leds_syscore_ops = {
.shutdown = leds_shutdown,
.suspend = leds_suspend,
.resume = leds_resume,
};
-static struct sys_device leds_device = {
- .id = 0,
- .cls = &leds_sysclass,
-};
-
static int __init leds_init(void)
{
int ret;
@@ -107,6 +109,8 @@ static int __init leds_init(void)
ret = sysdev_register(&leds_device);
if (ret == 0)
ret = sysdev_create_file(&leds_device, &attr_event);
+ if (ret == 0)
+ register_syscore_ops(&leds_syscore_ops);
return ret;
}
On Saturday, March 12, 2011, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The cpufreq subsystem uses sysdev suspend and resume for
> executing cpufreq_suspend() and cpufreq_resume(), respectively,
> during system suspend, after interrupts have been switched off on the
> boot CPU, and during system resume, while interrupts are still off on
> the boot CPU. In both cases the other CPUs are off-line at the
> relevant point (either they have been switched off via CPU hotplug
> during suspend, or they haven't been switched on yet during resume).
> For this reason, although it may seem that cpufreq_suspend() and
> cpufreq_resume() are executed for all CPUs in the system, they are
> only called for the boot CPU in fact, which is quite confusing.
>
> To remove the confusion and to prepare for eliminating sysdev
> suspend and resume operations from the kernel entirely, convert
> cpufreq to using a struct syscore_ops object for the boot CPU
> suspend and resume and rename the callbacks so that their names
> reflect their purpose. In addition, put some explanatory remarks
> into their kerneldoc comments.
I had to modify this patch, because the previous version had problems with
systems that don't support cpufreq. The updated patch (which is appended)
also simplifies the code even further.
Thanks,
Rafael
---
From: Rafael J. Wysocki <[email protected]>
Subject: cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)
The cpufreq subsystem uses sysdev suspend and resume for
executing cpufreq_suspend() and cpufreq_resume(), respectively,
during system suspend, after interrupts have been switched off on the
boot CPU, and during system resume, while interrupts are still off on
the boot CPU. In both cases the other CPUs are off-line at the
relevant point (either they have been switched off via CPU hotplug
during suspend, or they haven't been switched on yet during resume).
For this reason, although it may seem that cpufreq_suspend() and
cpufreq_resume() are executed for all CPUs in the system, they are
only called for the boot CPU in fact, which is quite confusing.
To remove the confusion and to prepare for eliminating sysdev
suspend and resume operations from the kernel entirely, convert
cpufreq to using a struct syscore_ops object for the boot CPU
suspend and resume and rename the callbacks so that their names
reflect their purpose. In addition, put some explanatory remarks
into their kerneldoc comments.
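The boot-CPU-only behaviour described above can be illustrated with a small userspace model. It is a sketch, not the cpufreq code: the structures are heavily reduced, and policies[], current_cpu (standing in for smp_processor_id()), policy_get(), failing_suspend() and the error value -5 are all hypothetical names and values chosen for the illustration. The extra NULL check on cpufreq_driver is likewise an assumption of the model.

```c
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical, heavily reduced stand-ins for the cpufreq structures. */
struct cpufreq_policy {
	unsigned int cpu;
};

struct cpufreq_driver {
	int (*suspend)(struct cpufreq_policy *policy);
};

static struct cpufreq_policy *policies[NR_CPUS];
static struct cpufreq_driver *cpufreq_driver;
static int current_cpu;	/* models smp_processor_id() */

static struct cpufreq_policy *policy_get(int cpu)
{
	return policies[cpu];
}

/* Hypothetical driver hook used to show error propagation. */
static int failing_suspend(struct cpufreq_policy *policy)
{
	(void)policy;
	return -5;	/* arbitrary error value chosen for this sketch */
}

static struct cpufreq_driver failing_driver = { .suspend = failing_suspend };
static struct cpufreq_policy boot_policy = { .cpu = 0 };

/* Runs once, on the only CPU that is still online (the boot CPU). */
static int bp_suspend(void)
{
	struct cpufreq_policy *policy = policy_get(current_cpu);

	/* No policy for the boot CPU: nothing to do, and not an error. */
	if (!policy)
		return 0;

	if (cpufreq_driver && cpufreq_driver->suspend)
		return cpufreq_driver->suspend(policy);

	return 0;
}
```

Returning 0 when there is no policy matters because a nonzero return from a suspend callback would abort the whole system suspend, even on machines that simply have no cpufreq support.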
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/cpufreq/cpufreq.c | 66 ++++++++++++++++++----------------------------
1 file changed, 26 insertions(+), 40 deletions(-)
Index: linux-2.6/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6/drivers/cpufreq/cpufreq.c
@@ -28,6 +28,7 @@
#include <linux/cpu.h>
#include <linux/completion.h>
#include <linux/mutex.h>
+#include <linux/syscore_ops.h>
#include <trace/events/power.h>
@@ -1340,35 +1341,31 @@ out:
}
EXPORT_SYMBOL(cpufreq_get);
+static struct sysdev_driver cpufreq_sysdev_driver = {
+ .add = cpufreq_add_dev,
+ .remove = cpufreq_remove_dev,
+};
+
/**
- * cpufreq_suspend - let the low level driver prepare for suspend
+ * cpufreq_bp_suspend - Prepare the boot CPU for system suspend.
+ *
+ * This function is only executed for the boot processor. The other CPUs
+ * have been put offline by means of CPU hotplug.
*/
-
-static int cpufreq_suspend(struct sys_device *sysdev, pm_message_t pmsg)
+static int cpufreq_bp_suspend(void)
{
int ret = 0;
- int cpu = sysdev->id;
+ int cpu = smp_processor_id();
struct cpufreq_policy *cpu_policy;
dprintk("suspending cpu %u\n", cpu);
- if (!cpu_online(cpu))
- return 0;
-
- /* we may be lax here as interrupts are off. Nonetheless
- * we need to grab the correct cpu policy, as to check
- * whether we really run on this CPU.
- */
-
+ /* If there's no policy for the boot CPU, we have nothing to do. */
cpu_policy = cpufreq_cpu_get(cpu);
if (!cpu_policy)
- return -EINVAL;
-
- /* only handle each CPU group once */
- if (unlikely(cpu_policy->cpu != cpu))
- goto out;
+ return 0;
if (cpufreq_driver->suspend) {
ret = cpufreq_driver->suspend(cpu_policy);
@@ -1377,13 +1374,12 @@ static int cpufreq_suspend(struct sys_de
"step on CPU %u\n", cpu_policy->cpu);
}
-out:
cpufreq_cpu_put(cpu_policy);
return ret;
}
/**
- * cpufreq_resume - restore proper CPU frequency handling after resume
+ * cpufreq_bp_resume - Restore proper frequency handling of the boot CPU.
*
* 1.) resume CPUfreq hardware support (cpufreq_driver->resume())
* 2.) schedule call cpufreq_update_policy() ASAP as interrupts are
@@ -1391,31 +1387,23 @@ out:
* what we believe it to be. This is a bit later than when it
* should be, but nonethteless it's better than calling
* cpufreq_driver->get() here which might re-enable interrupts...
+ *
+ * This function is only executed for the boot CPU. The other CPUs have not
+ * been turned on yet.
*/
-static int cpufreq_resume(struct sys_device *sysdev)
+static void cpufreq_bp_resume(void)
{
int ret = 0;
- int cpu = sysdev->id;
+ int cpu = smp_processor_id();
struct cpufreq_policy *cpu_policy;
dprintk("resuming cpu %u\n", cpu);
- if (!cpu_online(cpu))
- return 0;
-
- /* we may be lax here as interrupts are off. Nonetheless
- * we need to grab the correct cpu policy, as to check
- * whether we really run on this CPU.
- */
-
+ /* If there's no policy for the boot CPU, we have nothing to do. */
cpu_policy = cpufreq_cpu_get(cpu);
if (!cpu_policy)
- return -EINVAL;
-
- /* only handle each CPU group once */
- if (unlikely(cpu_policy->cpu != cpu))
- goto fail;
+ return;
if (cpufreq_driver->resume) {
ret = cpufreq_driver->resume(cpu_policy);
@@ -1430,14 +1418,11 @@ static int cpufreq_resume(struct sys_dev
fail:
cpufreq_cpu_put(cpu_policy);
- return ret;
}
-static struct sysdev_driver cpufreq_sysdev_driver = {
- .add = cpufreq_add_dev,
- .remove = cpufreq_remove_dev,
- .suspend = cpufreq_suspend,
- .resume = cpufreq_resume,
+static struct syscore_ops cpufreq_syscore_ops = {
+ .suspend = cpufreq_bp_suspend,
+ .resume = cpufreq_bp_resume,
};
@@ -2002,6 +1987,7 @@ static int __init cpufreq_core_init(void
cpufreq_global_kobject = kobject_create_and_add("cpufreq",
&cpu_sysdev_class.kset.kobj);
BUG_ON(!cpufreq_global_kobject);
+ register_syscore_ops(&cpufreq_syscore_ops);
return 0;
}
On Sun, Mar 13, 2011 at 02:03:49PM +0100, R. J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Convert the SuperH clocks framework and shared interrupt handling
> code to using struct syscore_ops instead of sysdev classes and
> sysdevs for power management.
>
> This reduces the code size significantly and simplifies it. The
> optimizations causing things not to be restored after creating a
> hibernation image are removed, but they might lead to undesirable
> effects during resume from hibernation (e.g. the clocks would be left
> as the boot kernel set them, which might not be the same way as the
> hibernated kernel had seen them before the hibernation).
>
> This also is necessary for removing sysdevs from the kernel entirely
> in the future.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
This misses the use of the sysdev class by the userimask code, though I'm
open to suggestions for alternatives.
applied
thanks,
Len Brown, Intel Open Source Technology Center
On Thursday, March 17, 2011, Paul Mundt wrote:
> On Sun, Mar 13, 2011 at 02:03:49PM +0100, R. J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Convert the SuperH clocks framework and shared interrupt handling
> > code to using struct syscore_ops instead of sysdev classes and
> > sysdevs for power management.
> >
> > This reduces the code size significantly and simplifies it. The
> > optimizations causing things not to be restored after creating a
> > hibernation image are removed, but they might lead to undesirable
> > effects during resume from hibernation (e.g. the clocks would be left
> > as the boot kernel set them, which might not be the same way as the
> > hibernated kernel had seen them before the hibernation).
> >
> > This also is necessary for removing sysdevs from the kernel entirely
> > in the future.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> This misses the use of the sysdev class by the userimask code, though I'm
> open to suggestions for alternatives.
For now, I'd simply move the sysdev class definition to userimask.c, like
in the patch below. The current goal is to eliminate the suspend/resume and
shutdown operations from sysdevs (and sysdev drivers); the next step will
be to replace the remaining sysdevs with alternative mechanisms.
Thanks,
Rafael
---
From: Rafael J. Wysocki <[email protected]>
Subject: Drivers / sh: Use struct syscore_ops instead of sysdev class and sysdev (v2)
Convert the SuperH clocks framework and shared interrupt handling
code to using struct syscore_ops instead of sysdev classes and
sysdevs for power management.
This reduces the code size significantly and simplifies it. The
optimizations causing things not to be restored after creating a
hibernation image are removed, since they might lead to undesirable
effects during resume from hibernation (e.g. the clocks would be left
as the boot kernel set them, which might not be the same way as the
hibernated kernel had seen them before the hibernation).
This also is necessary for removing sysdevs from the kernel entirely
in the future.
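To make the hibernation point concrete, here is a rough userspace model of an unconditional clock restore. It is a sketch only: struct clk and struct clk_ops are reduced, the set_parent step is left out for brevity, and hw_rate, model_set_rate(), model_recalc(), the clocks[] table and its rates are hypothetical and exist just to stand in for hardware state.

```c
#include <stddef.h>

struct clk;

/* Reduced, hypothetical version of the SuperH clock operations. */
struct clk_ops {
	int (*set_rate)(struct clk *clk, unsigned long rate);
	unsigned long (*recalc)(struct clk *clk);
};

struct clk {
	const struct clk_ops *ops;
	unsigned long rate;	/* rate the kernel believes is set */
	unsigned long hw_rate;	/* models the hardware register */
};

static int model_set_rate(struct clk *clk, unsigned long rate)
{
	clk->hw_rate = rate;
	return 0;
}

static unsigned long model_recalc(struct clk *clk)
{
	return clk->hw_rate;
}

static const struct clk_ops settable_ops = { .set_rate = model_set_rate };
static const struct clk_ops readonly_ops = { .recalc = model_recalc };

/* "hw_rate" differs from "rate", as if another kernel had set it up. */
static struct clk clocks[] = {
	{ .ops = &settable_ops, .rate = 48000000, .hw_rate = 12000000 },
	{ .ops = &readonly_ops, .rate = 33000000, .hw_rate = 66000000 },
	{ .ops = NULL, .rate = 1000, .hw_rate = 2000 },
};

/* Restore every clock on every resume, with no state tracking. */
static void clks_resume_model(void)
{
	size_t i;

	for (i = 0; i < sizeof(clocks) / sizeof(clocks[0]); i++) {
		struct clk *clkp = &clocks[i];

		if (!clkp->ops)
			continue;
		if (clkp->ops->set_rate)
			clkp->ops->set_rate(clkp, clkp->rate);
		else if (clkp->ops->recalc)
			clkp->rate = clkp->ops->recalc(clkp);
	}
}
```

In this model a settable clock is forced back to the rate the resumed kernel remembers, while a read-only clock has its cached rate refreshed from the hardware, so neither is left in whatever state the boot kernel happened to choose.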
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/sh/clk/core.c | 68 +++++++--------------------
drivers/sh/intc/core.c | 108 ++++++++++++++------------------------------
drivers/sh/intc/internals.h | 5 --
drivers/sh/intc/userimask.c | 9 +++
4 files changed, 63 insertions(+), 127 deletions(-)
Index: linux-2.6/drivers/sh/clk/core.c
===================================================================
--- linux-2.6.orig/drivers/sh/clk/core.c
+++ linux-2.6/drivers/sh/clk/core.c
@@ -21,7 +21,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/list.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/seq_file.h>
#include <linux/err.h>
#include <linux/io.h>
@@ -630,68 +630,36 @@ long clk_round_parent(struct clk *clk, u
EXPORT_SYMBOL_GPL(clk_round_parent);
#ifdef CONFIG_PM
-static int clks_sysdev_suspend(struct sys_device *dev, pm_message_t state)
+static void clks_core_resume(void)
{
- static pm_message_t prev_state;
struct clk *clkp;
- switch (state.event) {
- case PM_EVENT_ON:
- /* Resumeing from hibernation */
- if (prev_state.event != PM_EVENT_FREEZE)
- break;
-
- list_for_each_entry(clkp, &clock_list, node) {
- if (likely(clkp->ops)) {
- unsigned long rate = clkp->rate;
-
- if (likely(clkp->ops->set_parent))
- clkp->ops->set_parent(clkp,
- clkp->parent);
- if (likely(clkp->ops->set_rate))
- clkp->ops->set_rate(clkp, rate);
- else if (likely(clkp->ops->recalc))
- clkp->rate = clkp->ops->recalc(clkp);
- }
+ list_for_each_entry(clkp, &clock_list, node) {
+ if (likely(clkp->ops)) {
+ unsigned long rate = clkp->rate;
+
+ if (likely(clkp->ops->set_parent))
+ clkp->ops->set_parent(clkp,
+ clkp->parent);
+ if (likely(clkp->ops->set_rate))
+ clkp->ops->set_rate(clkp, rate);
+ else if (likely(clkp->ops->recalc))
+ clkp->rate = clkp->ops->recalc(clkp);
}
- break;
- case PM_EVENT_FREEZE:
- break;
- case PM_EVENT_SUSPEND:
- break;
}
-
- prev_state = state;
- return 0;
-}
-
-static int clks_sysdev_resume(struct sys_device *dev)
-{
- return clks_sysdev_suspend(dev, PMSG_ON);
}
-static struct sysdev_class clks_sysdev_class = {
- .name = "clks",
-};
-
-static struct sysdev_driver clks_sysdev_driver = {
- .suspend = clks_sysdev_suspend,
- .resume = clks_sysdev_resume,
-};
-
-static struct sys_device clks_sysdev_dev = {
- .cls = &clks_sysdev_class,
+static struct syscore_ops clks_syscore_ops = {
+ .resume = clks_core_resume,
};
-static int __init clk_sysdev_init(void)
+static int __init clk_syscore_init(void)
{
- sysdev_class_register(&clks_sysdev_class);
- sysdev_driver_register(&clks_sysdev_class, &clks_sysdev_driver);
- sysdev_register(&clks_sysdev_dev);
+ register_syscore_ops(&clks_syscore_ops);
return 0;
}
-subsys_initcall(clk_sysdev_init);
+subsys_initcall(clk_syscore_init);
#endif
/*
Index: linux-2.6/drivers/sh/intc/core.c
===================================================================
--- linux-2.6.orig/drivers/sh/intc/core.c
+++ linux-2.6/drivers/sh/intc/core.c
@@ -24,7 +24,7 @@
#include <linux/slab.h>
#include <linux/interrupt.h>
#include <linux/sh_intc.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/radix-tree.h>
@@ -376,108 +376,70 @@ err0:
return -ENOMEM;
}
-static ssize_t
-show_intc_name(struct sys_device *dev, struct sysdev_attribute *attr, char *buf)
+static int intc_suspend(void)
{
struct intc_desc_int *d;
- d = container_of(dev, struct intc_desc_int, sysdev);
+ list_for_each_entry(d, &intc_list, list) {
+ int irq;
- return sprintf(buf, "%s\n", d->chip.name);
-}
+ /* enable wakeup irqs belonging to this intc controller */
+ for_each_active_irq(irq) {
+ struct irq_data *data;
+ struct irq_desc *desc;
+ struct irq_chip *chip;
+
+ data = irq_get_irq_data(irq);
+ chip = irq_data_get_irq_chip(data);
+ if (chip != &d->chip)
+ continue;
+ desc = irq_to_desc(irq);
+ if ((desc->status & IRQ_WAKEUP))
+ chip->irq_enable(data);
+ }
+ }
-static SYSDEV_ATTR(name, S_IRUGO, show_intc_name, NULL);
+ return 0;
+}
-static int intc_suspend(struct sys_device *dev, pm_message_t state)
+static void intc_resume(void)
{
struct intc_desc_int *d;
- struct irq_data *data;
- struct irq_desc *desc;
- struct irq_chip *chip;
- int irq;
-
- /* get intc controller associated with this sysdev */
- d = container_of(dev, struct intc_desc_int, sysdev);
-
- switch (state.event) {
- case PM_EVENT_ON:
- if (d->state.event != PM_EVENT_FREEZE)
- break;
+
+ list_for_each_entry(d, &intc_list, list) {
+ int irq;
for_each_active_irq(irq) {
- desc = irq_to_desc(irq);
+ struct irq_data *data;
+ struct irq_desc *desc;
+ struct irq_chip *chip;
+
data = irq_get_irq_data(irq);
chip = irq_data_get_irq_chip(data);
-
/*
* This will catch the redirect and VIRQ cases
* due to the dummy_irq_chip being inserted.
*/
if (chip != &d->chip)
continue;
+ desc = irq_to_desc(irq);
if (desc->status & IRQ_DISABLED)
chip->irq_disable(data);
else
chip->irq_enable(data);
}
- break;
- case PM_EVENT_FREEZE:
- /* nothing has to be done */
- break;
- case PM_EVENT_SUSPEND:
- /* enable wakeup irqs belonging to this intc controller */
- for_each_active_irq(irq) {
- desc = irq_to_desc(irq);
- data = irq_get_irq_data(irq);
- chip = irq_data_get_irq_chip(data);
-
- if (chip != &d->chip)
- continue;
- if ((desc->status & IRQ_WAKEUP))
- chip->irq_enable(data);
- }
- break;
}
-
- d->state = state;
-
- return 0;
}
-static int intc_resume(struct sys_device *dev)
-{
- return intc_suspend(dev, PMSG_ON);
-}
-
-struct sysdev_class intc_sysdev_class = {
- .name = "intc",
+struct syscore_ops intc_syscore_ops = {
.suspend = intc_suspend,
.resume = intc_resume,
};
-/* register this intc as sysdev to allow suspend/resume */
-static int __init register_intc_sysdevs(void)
+static int __init intc_syscore_init(void)
{
- struct intc_desc_int *d;
- int error;
+ register_syscore_ops(&intc_syscore_ops);
- error = sysdev_class_register(&intc_sysdev_class);
- if (!error) {
- list_for_each_entry(d, &intc_list, list) {
- d->sysdev.id = d->index;
- d->sysdev.cls = &intc_sysdev_class;
- error = sysdev_register(&d->sysdev);
- if (error == 0)
- error = sysdev_create_file(&d->sysdev,
- &attr_name);
- if (error)
- break;
- }
- }
-
- if (error)
- pr_err("sysdev registration error\n");
-
- return error;
+ return 0;
}
-device_initcall(register_intc_sysdevs);
+device_initcall(intc_syscore_init);
Index: linux-2.6/drivers/sh/intc/internals.h
===================================================================
--- linux-2.6.orig/drivers/sh/intc/internals.h
+++ linux-2.6/drivers/sh/intc/internals.h
@@ -4,7 +4,7 @@
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/radix-tree.h>
-#include <linux/sysdev.h>
+#include <linux/module.h>
#define _INTC_MK(fn, mode, addr_e, addr_d, width, shift) \
((shift) | ((width) << 5) | ((fn) << 9) | ((mode) << 13) | \
@@ -51,9 +51,7 @@ struct intc_subgroup_entry {
struct intc_desc_int {
struct list_head list;
- struct sys_device sysdev;
struct radix_tree_root tree;
- pm_message_t state;
raw_spinlock_t lock;
unsigned int index;
unsigned long *reg;
@@ -158,7 +156,6 @@ void _intc_enable(struct irq_data *data,
extern struct list_head intc_list;
extern raw_spinlock_t intc_big_lock;
extern unsigned int nr_intc_controllers;
-extern struct sysdev_class intc_sysdev_class;
unsigned int intc_get_dfl_prio_level(void);
unsigned int intc_get_prio_level(unsigned int irq);
Index: linux-2.6/drivers/sh/intc/userimask.c
===================================================================
--- linux-2.6.orig/drivers/sh/intc/userimask.c
+++ linux-2.6/drivers/sh/intc/userimask.c
@@ -57,12 +57,21 @@ store_intc_userimask(struct sysdev_class
static SYSDEV_CLASS_ATTR(userimask, S_IRUSR | S_IWUSR,
show_intc_userimask, store_intc_userimask);
+static struct sysdev_class intc_sysdev_class = {
+ .name = "intc",
+};
static int __init userimask_sysdev_init(void)
{
+ int error;
+
if (unlikely(!uimask))
return -ENXIO;
+ error = sysdev_class_register(&intc_sysdev_class);
+ if (error)
+ return error;
+
return sysdev_class_create_file(&intc_sysdev_class, &attr_userimask);
}
late_initcall(userimask_sysdev_init);
From: Rafael J. Wysocki <[email protected]>
The timekeeping subsystem uses a sysdev class and a sysdev for
executing timekeeping_suspend() after interrupts have been turned off
on the boot CPU (during system suspend) and for executing
timekeeping_resume() before turning on interrupts on the boot CPU
(during system resume). However, since both of these functions
ignore their arguments, the entire mechanism may be replaced with a
struct syscore_ops object which is simpler.
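As an illustration of why an argument-less ops object is enough here, the following userspace model shows one way such callbacks could be registered and run. It is a sketch under assumptions, not the kernel implementation: model_register(), model_suspend(), model_resume(), the fixed-size table, the call order (reverse for suspend, forward for resume) and the a_ops/b_ops test subsystems are all hypothetical.

```c
#include <stddef.h>

/* Userspace model of the three callbacks used in this series. */
struct syscore_ops {
	int (*suspend)(void);
	void (*resume)(void);
	void (*shutdown)(void);
};

#define MAX_OPS 8	/* arbitrary limit for this sketch */

static struct syscore_ops *registered[MAX_OPS];
static size_t nr_registered;

/* Hypothetical subsystems that log the order in which they are called. */
static char call_log[16];
static size_t log_len;

static void log_call(char c)
{
	if (log_len < sizeof(call_log) - 1)
		call_log[log_len++] = c;
}

static int a_suspend(void) { log_call('a'); return 0; }
static void a_resume(void) { log_call('A'); }
static int b_suspend(void) { log_call('b'); return 0; }
static void b_resume(void) { log_call('B'); }

static struct syscore_ops a_ops = { .suspend = a_suspend, .resume = a_resume };
static struct syscore_ops b_ops = { .suspend = b_suspend, .resume = b_resume };

/* Hypothetical counterpart of register_syscore_ops(). */
static int model_register(struct syscore_ops *ops)
{
	if (nr_registered == MAX_OPS)
		return -1;
	registered[nr_registered++] = ops;
	return 0;
}

/*
 * Suspend in reverse registration order.  If one callback fails, resume
 * the ones that were already suspended and report the error.
 */
static int model_suspend(void)
{
	size_t i, j;
	int ret;

	for (i = nr_registered; i-- > 0; ) {
		if (!registered[i]->suspend)
			continue;
		ret = registered[i]->suspend();
		if (ret) {
			for (j = i + 1; j < nr_registered; j++)
				if (registered[j]->resume)
					registered[j]->resume();
			return ret;
		}
	}
	return 0;
}

/* Resume in registration order; failures cannot be reported. */
static void model_resume(void)
{
	size_t i;

	for (i = 0; i < nr_registered; i++)
		if (registered[i]->resume)
			registered[i]->resume();
}
```

Nothing in the model needs a device object, a class, or a sysfs entry: one static structure and one registration call per subsystem is all that remains.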
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
kernel/time/timekeeping.c | 27 ++++++++-------------------
1 file changed, 8 insertions(+), 19 deletions(-)
Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -14,7 +14,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/sched.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/clocksource.h>
#include <linux/jiffies.h>
#include <linux/time.h>
@@ -597,13 +597,12 @@ static struct timespec timekeeping_suspe
/**
* timekeeping_resume - Resumes the generic timekeeping subsystem.
- * @dev: unused
*
* This is for the generic clocksource timekeeping.
* xtime/wall_to_monotonic/jiffies/etc are
* still managed by arch specific suspend/resume code.
*/
-static int timekeeping_resume(struct sys_device *dev)
+static void timekeeping_resume(void)
{
unsigned long flags;
struct timespec ts;
@@ -632,11 +631,9 @@ static int timekeeping_resume(struct sys
/* Resume hrtimers */
hres_timers_resume();
-
- return 0;
}
-static int timekeeping_suspend(struct sys_device *dev, pm_message_t state)
+static int timekeeping_suspend(void)
{
unsigned long flags;
@@ -654,26 +651,18 @@ static int timekeeping_suspend(struct sy
}
/* sysfs resume/suspend bits for timekeeping */
-static struct sysdev_class timekeeping_sysclass = {
- .name = "timekeeping",
+static struct syscore_ops timekeeping_syscore_ops = {
.resume = timekeeping_resume,
.suspend = timekeeping_suspend,
};
-static struct sys_device device_timer = {
- .id = 0,
- .cls = &timekeeping_sysclass,
-};
-
-static int __init timekeeping_init_device(void)
+static int __init timekeeping_init_ops(void)
{
- int error = sysdev_class_register(&timekeeping_sysclass);
- if (!error)
- error = sysdev_register(&device_timer);
- return error;
+ register_syscore_ops(&timekeeping_syscore_ops);
+ return 0;
}
-device_initcall(timekeeping_init_device);
+device_initcall(timekeeping_init_ops);
/*
* If the error is already larger, we look ahead even further
From: Rafael J. Wysocki <[email protected]>
The Intel IOMMU subsystem uses a sysdev class and a sysdev for
executing iommu_suspend() after interrupts have been turned off
on the boot CPU (during system suspend) and for executing
iommu_resume() before turning on interrupts on the boot CPU
(during system resume). However, since both of these functions
ignore their arguments, the entire mechanism may be replaced with a
struct syscore_ops object which is simpler.
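One consequence of the conversion is that a resume callback returning void can no longer hand an error code back to its caller, so a failure can only be reported and the callback left early. A minimal userspace sketch of that shape, with hypothetical names throughout (hw_init_fails, restored_units, init_hw_model() and iommu_resume_model() are not taken from the driver), might look like this:

```c
#include <stdio.h>

/* Hypothetical knobs standing in for the real hardware state. */
static int hw_init_fails;
static int restored_units;

static int init_hw_model(void)
{
	return hw_init_fails ? -1 : 0;
}

/* A void resume callback: on failure it can only warn and bail out. */
static void iommu_resume_model(void)
{
	if (init_hw_model()) {
		fprintf(stderr, "IOMMU setup failed, cannot resume\n");
		return;
	}

	/* Stands in for restoring the saved register state. */
	restored_units++;
}
```

Whether dropping the error code loses anything depends on whether the previous caller acted on it; the sketch only shows the control flow and makes no claim about that.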
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/intel-iommu.c | 38 +++++++++-----------------------------
1 file changed, 9 insertions(+), 29 deletions(-)
Index: linux-2.6/drivers/pci/intel-iommu.c
===================================================================
--- linux-2.6.orig/drivers/pci/intel-iommu.c
+++ linux-2.6/drivers/pci/intel-iommu.c
@@ -36,7 +36,7 @@
#include <linux/iova.h>
#include <linux/iommu.h>
#include <linux/intel-iommu.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/tboot.h>
#include <linux/dmi.h>
#include <asm/cacheflush.h>
@@ -3135,7 +3135,7 @@ static void iommu_flush_all(void)
}
}
-static int iommu_suspend(struct sys_device *dev, pm_message_t state)
+static int iommu_suspend(void)
{
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu = NULL;
@@ -3175,7 +3175,7 @@ nomem:
return -ENOMEM;
}
-static int iommu_resume(struct sys_device *dev)
+static void iommu_resume(void)
{
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu = NULL;
@@ -3183,7 +3183,7 @@ static int iommu_resume(struct sys_devic
if (init_iommu_hw()) {
WARN(1, "IOMMU setup failed, DMAR can not resume!\n");
- return -EIO;
+ return;
}
for_each_active_iommu(iommu, drhd) {
@@ -3204,40 +3204,20 @@ static int iommu_resume(struct sys_devic
for_each_active_iommu(iommu, drhd)
kfree(iommu->iommu_state);
-
- return 0;
}
-static struct sysdev_class iommu_sysclass = {
- .name = "iommu",
+static struct syscore_ops iommu_syscore_ops = {
.resume = iommu_resume,
.suspend = iommu_suspend,
};
-static struct sys_device device_iommu = {
- .cls = &iommu_sysclass,
-};
-
-static int __init init_iommu_sysfs(void)
+static void __init init_iommu_pm_ops(void)
{
- int error;
-
- error = sysdev_class_register(&iommu_sysclass);
- if (error)
- return error;
-
- error = sysdev_register(&device_iommu);
- if (error)
- sysdev_class_unregister(&iommu_sysclass);
-
- return error;
+ register_syscore_ops(&iommu_syscore_ops);
}
#else
-static int __init init_iommu_sysfs(void)
-{
- return 0;
-}
+static inline void init_iommu_pm_ops(void) { }
#endif /* CONFIG_PM */
/*
@@ -3320,7 +3300,7 @@ int __init intel_iommu_init(void)
#endif
dma_ops = &intel_dma_ops;
- init_iommu_sysfs();
+ init_iommu_pm_ops();
register_iommu(&intel_iommu_ops);
From: Rafael J. Wysocki <[email protected]>
The cpufreq subsystem uses sysdev suspend and resume for
executing cpufreq_suspend() and cpufreq_resume(), respectively,
during system suspend, after interrupts have been switched off on the
boot CPU, and during system resume, while interrupts are still off on
the boot CPU. In both cases the other CPUs are off-line at the
relevant point (either they have been switched off via CPU hotplug
during suspend, or they haven't been switched on yet during resume).
For this reason, although it may seem that cpufreq_suspend() and
cpufreq_resume() are executed for all CPUs in the system, they are
only called for the boot CPU in fact, which is quite confusing.
To remove the confusion and to prepare for eliminating sysdev
suspend and resume operations from the kernel entirely, convert
cpufreq to using a struct syscore_ops object for the boot CPU
suspend and resume and rename the callbacks so that their names
reflect their purpose. In addition, put some explanatory remarks
into their kerneldoc comments.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/cpufreq/cpufreq.c | 66 ++++++++++++++++++----------------------------
1 file changed, 26 insertions(+), 40 deletions(-)
Index: linux-2.6/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6/drivers/cpufreq/cpufreq.c
@@ -28,6 +28,7 @@
#include <linux/cpu.h>
#include <linux/completion.h>
#include <linux/mutex.h>
+#include <linux/syscore_ops.h>
#include <trace/events/power.h>
@@ -1340,35 +1341,31 @@ out:
}
EXPORT_SYMBOL(cpufreq_get);
+static struct sysdev_driver cpufreq_sysdev_driver = {
+ .add = cpufreq_add_dev,
+ .remove = cpufreq_remove_dev,
+};
+
/**
- * cpufreq_suspend - let the low level driver prepare for suspend
+ * cpufreq_bp_suspend - Prepare the boot CPU for system suspend.
+ *
+ * This function is only executed for the boot processor. The other CPUs
+ * have been put offline by means of CPU hotplug.
*/
-
-static int cpufreq_suspend(struct sys_device *sysdev, pm_message_t pmsg)
+static int cpufreq_bp_suspend(void)
{
int ret = 0;
- int cpu = sysdev->id;
+ int cpu = smp_processor_id();
struct cpufreq_policy *cpu_policy;
dprintk("suspending cpu %u\n", cpu);
- if (!cpu_online(cpu))
- return 0;
-
- /* we may be lax here as interrupts are off. Nonetheless
- * we need to grab the correct cpu policy, as to check
- * whether we really run on this CPU.
- */
-
+ /* If there's no policy for the boot CPU, we have nothing to do. */
cpu_policy = cpufreq_cpu_get(cpu);
if (!cpu_policy)
- return -EINVAL;
-
- /* only handle each CPU group once */
- if (unlikely(cpu_policy->cpu != cpu))
- goto out;
+ return 0;
if (cpufreq_driver->suspend) {
ret = cpufreq_driver->suspend(cpu_policy);
@@ -1377,13 +1374,12 @@ static int cpufreq_suspend(struct sys_de
"step on CPU %u\n", cpu_policy->cpu);
}
-out:
cpufreq_cpu_put(cpu_policy);
return ret;
}
/**
- * cpufreq_resume - restore proper CPU frequency handling after resume
+ * cpufreq_bp_resume - Restore proper frequency handling of the boot CPU.
*
* 1.) resume CPUfreq hardware support (cpufreq_driver->resume())
* 2.) schedule call cpufreq_update_policy() ASAP as interrupts are
@@ -1391,31 +1387,23 @@ out:
* what we believe it to be. This is a bit later than when it
* should be, but nonethteless it's better than calling
* cpufreq_driver->get() here which might re-enable interrupts...
+ *
+ * This function is only executed for the boot CPU. The other CPUs have not
+ * been turned on yet.
*/
-static int cpufreq_resume(struct sys_device *sysdev)
+static void cpufreq_bp_resume(void)
{
int ret = 0;
- int cpu = sysdev->id;
+ int cpu = smp_processor_id();
struct cpufreq_policy *cpu_policy;
dprintk("resuming cpu %u\n", cpu);
- if (!cpu_online(cpu))
- return 0;
-
- /* we may be lax here as interrupts are off. Nonetheless
- * we need to grab the correct cpu policy, as to check
- * whether we really run on this CPU.
- */
-
+ /* If there's no policy for the boot CPU, we have nothing to do. */
cpu_policy = cpufreq_cpu_get(cpu);
if (!cpu_policy)
- return -EINVAL;
-
- /* only handle each CPU group once */
- if (unlikely(cpu_policy->cpu != cpu))
- goto fail;
+ return;
if (cpufreq_driver->resume) {
ret = cpufreq_driver->resume(cpu_policy);
@@ -1430,14 +1418,11 @@ static int cpufreq_resume(struct sys_dev
fail:
cpufreq_cpu_put(cpu_policy);
- return ret;
}
-static struct sysdev_driver cpufreq_sysdev_driver = {
- .add = cpufreq_add_dev,
- .remove = cpufreq_remove_dev,
- .suspend = cpufreq_suspend,
- .resume = cpufreq_resume,
+static struct syscore_ops cpufreq_syscore_ops = {
+ .suspend = cpufreq_bp_suspend,
+ .resume = cpufreq_bp_resume,
};
@@ -2002,6 +1987,7 @@ static int __init cpufreq_core_init(void
cpufreq_global_kobject = kobject_create_and_add("cpufreq",
&cpu_sysdev_class.kset.kobj);
BUG_ON(!cpufreq_global_kobject);
+ register_syscore_ops(&cpufreq_syscore_ops);
return 0;
}
From: Rafael J. Wysocki <[email protected]>
Introduce a Kconfig option allowing architectures where sysdev
operations used during system suspend, resume and shutdown have been
completely replaced with struct syscore_ops operations to avoid
building sysdev code that will never be used.
Make the callbacks in struct sysdev_class and struct sysdev_driver
depend on ARCH_NO_SYSDEV_OPS being unset, which allows us to verify
that all of the references to them have actually been removed from
the code the given architecture depends on.
Make x86 select ARCH_NO_SYSDEV_OPS.
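For illustration only, the stub pattern used in the headers below can be
tried out in user space. This is a minimal sketch, not kernel code: the
#define stands in for the Kconfig-generated symbol, pm_message_t is a
simplified stand-in, and enter_sleep_state() and self_check() are
hypothetical helpers that do not exist in the tree.

```c
#include <assert.h>

/* Hypothetical stand-in for the Kconfig symbol; a real build gets it from
 * the generated configuration, not from a #define in a source file. */
#define CONFIG_ARCH_NO_SYSDEV_OPS 1

/* Simplified stand-in for the kernel's pm_message_t. */
typedef struct { int event; } pm_message_t;

#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/* Without the option, the real implementations would be linked in. */
extern int sysdev_suspend(pm_message_t state);
extern int sysdev_resume(void);
#else
/* With the option set, callers still compile, but the calls collapse to
 * constants and no sysdev PM code has to be built. */
static inline int sysdev_suspend(pm_message_t state) { (void)state; return 0; }
static inline int sysdev_resume(void) { return 0; }
#endif

/* Hypothetical caller, loosely modelled on a suspend path. */
int enter_sleep_state(void)
{
	pm_message_t msg = { .event = 2 };
	int error = sysdev_suspend(msg);

	if (error)
		return error;
	/* ... the platform would sleep here ... */
	return sysdev_resume();
}

/* Hypothetical self-check: with the stubs in place the path cannot fail. */
void self_check(void)
{
	assert(enter_sleep_state() == 0);
}
```

The point of the pattern is that call sites need no #ifdefs of their own.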
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/Kconfig | 1 +
drivers/base/Kconfig | 7 +++++++
drivers/base/sys.c | 3 ++-
include/linux/device.h | 4 ++++
include/linux/pm.h | 10 ++++++++--
include/linux/sysdev.h | 7 +++++--
6 files changed, 27 insertions(+), 5 deletions(-)
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -529,13 +529,19 @@ struct dev_power_domain {
*/
#ifdef CONFIG_PM_SLEEP
-extern void device_pm_lock(void);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
+extern int sysdev_suspend(pm_message_t state);
extern int sysdev_resume(void);
+#else
+static inline int sysdev_suspend(pm_message_t state) { return 0; }
+static inline int sysdev_resume(void) { return 0; }
+#endif
+
+extern void device_pm_lock(void);
extern void dpm_resume_noirq(pm_message_t state);
extern void dpm_resume_end(pm_message_t state);
extern void device_pm_unlock(void);
-extern int sysdev_suspend(pm_message_t state);
extern int dpm_suspend_noirq(pm_message_t state);
extern int dpm_suspend_start(pm_message_t state);
Index: linux-2.6/include/linux/sysdev.h
===================================================================
--- linux-2.6.orig/include/linux/sysdev.h
+++ linux-2.6/include/linux/sysdev.h
@@ -33,12 +33,13 @@ struct sysdev_class {
const char *name;
struct list_head drivers;
struct sysdev_class_attribute **attrs;
-
+ struct kset kset;
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/* Default operations for these types of devices */
int (*shutdown)(struct sys_device *);
int (*suspend)(struct sys_device *, pm_message_t state);
int (*resume)(struct sys_device *);
- struct kset kset;
+#endif
};
struct sysdev_class_attribute {
@@ -76,9 +77,11 @@ struct sysdev_driver {
struct list_head entry;
int (*add)(struct sys_device *);
int (*remove)(struct sys_device *);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
int (*shutdown)(struct sys_device *);
int (*suspend)(struct sys_device *, pm_message_t state);
int (*resume)(struct sys_device *);
+#endif
};
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -633,8 +633,12 @@ static inline int devtmpfs_mount(const c
/* drivers/base/power/shutdown.c */
extern void device_shutdown(void);
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/* drivers/base/sys.c */
extern void sysdev_shutdown(void);
+#else
+static inline void sysdev_shutdown(void) { }
+#endif
/* debugging and troubleshooting/diagnostic helpers. */
extern const char *dev_driver_string(const struct device *dev);
Index: linux-2.6/arch/x86/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig
+++ linux-2.6/arch/x86/Kconfig
@@ -71,6 +71,7 @@ config X86
select GENERIC_IRQ_SHOW
select IRQ_FORCED_THREADING
select USE_GENERIC_SMP_HELPERS if SMP
+ select ARCH_NO_SYSDEV_OPS
config INSTRUCTION_DECODER
def_bool (KPROBES || PERF_EVENTS)
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -329,7 +329,7 @@ void sysdev_unregister(struct sys_device
}
-
+#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
/**
* sysdev_shutdown - Shut down all system devices.
*
@@ -524,6 +524,7 @@ int sysdev_resume(void)
return 0;
}
EXPORT_SYMBOL_GPL(sysdev_resume);
+#endif /* CONFIG_ARCH_NO_SYSDEV_OPS */
int __init system_bus_init(void)
{
Index: linux-2.6/drivers/base/Kconfig
===================================================================
--- linux-2.6.orig/drivers/base/Kconfig
+++ linux-2.6/drivers/base/Kconfig
@@ -168,4 +168,11 @@ config SYS_HYPERVISOR
bool
default n
+config ARCH_NO_SYSDEV_OPS
+ bool
+ ---help---
+ To be selected by architectures that don't use sysdev class or
+ sysdev driver power management (suspend/resume) and shutdown
+ operations.
+
endmenu
From: Rafael J. Wysocki <[email protected]>
KVM uses a sysdev class and a sysdev for executing kvm_suspend()
after interrupts have been turned off on the boot CPU (during system
suspend) and for executing kvm_resume() before turning on interrupts
on the boot CPU (during system resume). However, since both of these
functions ignore their arguments, the entire mechanism may be
replaced with a struct syscore_ops object which is simpler.
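To make the shape of the replacement easier to see, here is a user-space
mock of the interface, offered as a sketch only. The callback signatures
follow the ones used in this patch, but the list handling, the mock_*
helpers, the ordering (suspend newest-first, resume oldest-first, rollback
on error) and all demo_* names are assumptions made for the example, not a
copy of the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of struct syscore_ops; the list linkage is simplified (assumption). */
struct syscore_ops {
	struct syscore_ops *prev, *next;
	int (*suspend)(void);
	void (*resume)(void);
	void (*shutdown)(void);
};

static struct syscore_ops *ops_head, *ops_tail;

/* Append to the tail of the list; there is no failure path to handle. */
void register_syscore_ops(struct syscore_ops *ops)
{
	assert(ops != NULL);
	ops->next = NULL;
	ops->prev = ops_tail;
	if (ops_tail)
		ops_tail->next = ops;
	else
		ops_head = ops;
	ops_tail = ops;
}

void unregister_syscore_ops(struct syscore_ops *ops)
{
	if (ops->prev)
		ops->prev->next = ops->next;
	else
		ops_head = ops->next;
	if (ops->next)
		ops->next->prev = ops->prev;
	else
		ops_tail = ops->prev;
	ops->prev = ops->next = NULL;
}

/* Assumed ordering: suspend in reverse registration order and, on error,
 * resume the entries that had already been suspended. */
int mock_syscore_suspend(void)
{
	struct syscore_ops *ops;
	int ret;

	for (ops = ops_tail; ops; ops = ops->prev) {
		if (!ops->suspend)
			continue;
		ret = ops->suspend();
		if (ret) {
			for (ops = ops->next; ops; ops = ops->next)
				if (ops->resume)
					ops->resume();
			return ret;
		}
	}
	return 0;
}

/* Assumed ordering: resume in registration order. */
void mock_syscore_resume(void)
{
	struct syscore_ops *ops;

	for (ops = ops_head; ops; ops = ops->next)
		if (ops->resume)
			ops->resume();
}

/* Hypothetical subsystem state, loosely modelled on kvm_usage_count. */
int demo_usage_count;
int demo_hw_enabled;

static int demo_suspend(void)
{
	if (demo_usage_count)
		demo_hw_enabled = 0;
	return 0;
}

static void demo_resume(void)
{
	if (demo_usage_count)
		demo_hw_enabled = 1;
}

struct syscore_ops demo_syscore_ops = {
	.suspend = demo_suspend,
	.resume = demo_resume,
};
```

Compared with the sysdev variant, the subsystem needs one object and one
registration call, and neither callback takes an argument it would ignore.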
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
virt/kvm/kvm_main.c | 34 ++++++++--------------------------
1 file changed, 8 insertions(+), 26 deletions(-)
Index: linux-2.6/virt/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/virt/kvm/kvm_main.c
+++ linux-2.6/virt/kvm/kvm_main.c
@@ -30,7 +30,7 @@
#include <linux/debugfs.h>
#include <linux/highmem.h>
#include <linux/file.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/cpu.h>
#include <linux/sched.h>
#include <linux/cpumask.h>
@@ -2447,33 +2447,26 @@ static void kvm_exit_debug(void)
debugfs_remove(kvm_debugfs_dir);
}
-static int kvm_suspend(struct sys_device *dev, pm_message_t state)
+static int kvm_suspend(void)
{
if (kvm_usage_count)
hardware_disable_nolock(NULL);
return 0;
}
-static int kvm_resume(struct sys_device *dev)
+static void kvm_resume(void)
{
if (kvm_usage_count) {
WARN_ON(raw_spin_is_locked(&kvm_lock));
hardware_enable_nolock(NULL);
}
- return 0;
}
-static struct sysdev_class kvm_sysdev_class = {
- .name = "kvm",
+static struct syscore_ops kvm_syscore_ops = {
.suspend = kvm_suspend,
.resume = kvm_resume,
};
-static struct sys_device kvm_sysdev = {
- .id = 0,
- .cls = &kvm_sysdev_class,
-};
-
struct page *bad_page;
pfn_t bad_pfn;
@@ -2557,14 +2550,6 @@ int kvm_init(void *opaque, unsigned vcpu
goto out_free_2;
register_reboot_notifier(&kvm_reboot_notifier);
- r = sysdev_class_register(&kvm_sysdev_class);
- if (r)
- goto out_free_3;
-
- r = sysdev_register(&kvm_sysdev);
- if (r)
- goto out_free_4;
-
/* A kmem cache lets us meet the alignment requirements of fx_save. */
if (!vcpu_align)
vcpu_align = __alignof__(struct kvm_vcpu);
@@ -2572,7 +2557,7 @@ int kvm_init(void *opaque, unsigned vcpu
0, NULL);
if (!kvm_vcpu_cache) {
r = -ENOMEM;
- goto out_free_5;
+ goto out_free_3;
}
r = kvm_async_pf_init();
@@ -2589,6 +2574,8 @@ int kvm_init(void *opaque, unsigned vcpu
goto out_unreg;
}
+ register_syscore_ops(&kvm_syscore_ops);
+
kvm_preempt_ops.sched_in = kvm_sched_in;
kvm_preempt_ops.sched_out = kvm_sched_out;
@@ -2600,10 +2587,6 @@ out_unreg:
kvm_async_pf_deinit();
out_free:
kmem_cache_destroy(kvm_vcpu_cache);
-out_free_5:
- sysdev_unregister(&kvm_sysdev);
-out_free_4:
- sysdev_class_unregister(&kvm_sysdev_class);
out_free_3:
unregister_reboot_notifier(&kvm_reboot_notifier);
unregister_cpu_notifier(&kvm_cpu_notifier);
@@ -2631,8 +2614,7 @@ void kvm_exit(void)
misc_deregister(&kvm_dev);
kmem_cache_destroy(kvm_vcpu_cache);
kvm_async_pf_deinit();
- sysdev_unregister(&kvm_sysdev);
- sysdev_class_unregister(&kvm_sysdev_class);
+ unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
unregister_cpu_notifier(&kvm_cpu_notifier);
on_each_cpu(hardware_disable_nolock, NULL, 1);
From: Rafael J. Wysocki <[email protected]>
Some subsystems in the x86 tree need to carry out suspend/resume and
shutdown operations with one CPU on-line and interrupts disabled and
they define sysdev classes and sysdevs or sysdev drivers for this
purpose. This leads to unnecessarily complicated code and excessive
memory usage, so switch them to using struct syscore_ops objects
instead.
Generally, there are three categories of subsystems that use
sysdevs for implementing PM operations: (1) subsystems whose
suspend/resume callbacks ignore their arguments entirely (the
majority), (2) subsystems whose suspend/resume callbacks use their
struct sys_device argument, but don't really need to do that,
because they can be implemented differently in an arguably simpler
way (io_apic.c), and (3) subsystems whose suspend/resume callbacks
use their struct sys_device argument, but the value of that argument
is always the same and could be ignored (microcode_core.c). In all
of these cases the subsystems in question may be readily converted to
using struct syscore_ops objects for power management and shutdown.
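Category (2) is probably the least obvious one, so here is a user-space
sketch of the idea behind the io_apic.c conversion: instead of one
callback per device that digs its save area out of the device argument, a
single argument-less callback walks all units and uses a plain array of
save buffers. Everything in it (NR_UNITS, NR_REGS, hw_regs, saved_data,
the units_* functions) is hypothetical and only mimics the structure of
the real code.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_UNITS 3	/* hypothetical number of controllers */
#define NR_REGS  4	/* hypothetical number of registers per controller */

/* Fake hardware registers standing in for the routing entries. */
unsigned int hw_regs[NR_UNITS][NR_REGS];

/* One save buffer per controller; NULL means the allocation failed and
 * the controller is skipped during suspend and resume. */
unsigned int *saved_data[NR_UNITS];

int units_init(void)
{
	for (int id = 0; id < NR_UNITS; id++)
		saved_data[id] = calloc(NR_REGS, sizeof(*saved_data[id]));
	return 0;
}

static void suspend_unit(int id)
{
	if (!saved_data[id])
		return;
	for (int i = 0; i < NR_REGS; i++)
		saved_data[id][i] = hw_regs[id][i];
}

/* Single suspend callback: no device argument, just a loop over units. */
int units_suspend(void)
{
	for (int id = 0; id < NR_UNITS; id++)
		suspend_unit(id);
	return 0;
}

static void resume_unit(int id)
{
	if (!saved_data[id])
		return;
	for (int i = 0; i < NR_REGS; i++)
		hw_regs[id][i] = saved_data[id][i];
}

/* Single resume callback, walking the units in reverse order. */
void units_resume(void)
{
	for (int id = NR_UNITS - 1; id >= 0; id--)
		resume_unit(id);
}
```

The unit index that used to come from the device argument becomes a loop
variable, and a failed allocation only disables save/restore for that one
unit.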
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/amd_iommu_init.c | 26 ++--------
arch/x86/kernel/apic/apic.c | 33 +++----------
arch/x86/kernel/apic/io_apic.c | 97 ++++++++++++++++++---------------------
arch/x86/kernel/cpu/mcheck/mce.c | 21 ++++----
arch/x86/kernel/cpu/mtrr/main.c | 10 ++--
arch/x86/kernel/i8237.c | 30 ++----------
arch/x86/kernel/i8259.c | 33 ++++---------
arch/x86/kernel/microcode_core.c | 34 +++++--------
arch/x86/kernel/pci-gart_64.c | 32 ++----------
arch/x86/oprofile/nmi_int.c | 44 ++++-------------
10 files changed, 128 insertions(+), 232 deletions(-)
Index: linux-2.6/arch/x86/kernel/amd_iommu_init.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/amd_iommu_init.c
+++ linux-2.6/arch/x86/kernel/amd_iommu_init.c
@@ -21,7 +21,7 @@
#include <linux/acpi.h>
#include <linux/list.h>
#include <linux/slab.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/interrupt.h>
#include <linux/msi.h>
#include <asm/pci-direct.h>
@@ -1260,7 +1260,7 @@ static void disable_iommus(void)
* disable suspend until real resume implemented
*/
-static int amd_iommu_resume(struct sys_device *dev)
+static void amd_iommu_resume(void)
{
struct amd_iommu *iommu;
@@ -1276,11 +1276,9 @@ static int amd_iommu_resume(struct sys_d
*/
amd_iommu_flush_all_devices();
amd_iommu_flush_all_domains();
-
- return 0;
}
-static int amd_iommu_suspend(struct sys_device *dev, pm_message_t state)
+static int amd_iommu_suspend(void)
{
/* disable IOMMUs to go out of the way for BIOS */
disable_iommus();
@@ -1288,17 +1286,11 @@ static int amd_iommu_suspend(struct sys_
return 0;
}
-static struct sysdev_class amd_iommu_sysdev_class = {
- .name = "amd_iommu",
+static struct syscore_ops amd_iommu_syscore_ops = {
.suspend = amd_iommu_suspend,
.resume = amd_iommu_resume,
};
-static struct sys_device device_amd_iommu = {
- .id = 0,
- .cls = &amd_iommu_sysdev_class,
-};
-
/*
* This is the core init function for AMD IOMMU hardware in the system.
* This function is called from the generic x86 DMA layer initialization
@@ -1415,14 +1407,6 @@ static int __init amd_iommu_init(void)
goto free;
}
- ret = sysdev_class_register(&amd_iommu_sysdev_class);
- if (ret)
- goto free;
-
- ret = sysdev_register(&device_amd_iommu);
- if (ret)
- goto free;
-
ret = amd_iommu_init_devices();
if (ret)
goto free;
@@ -1441,6 +1425,8 @@ static int __init amd_iommu_init(void)
amd_iommu_init_notifier();
+ register_syscore_ops(&amd_iommu_syscore_ops);
+
if (iommu_pass_through)
goto out;
Index: linux-2.6/arch/x86/kernel/apic/apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/apic.c
+++ linux-2.6/arch/x86/kernel/apic/apic.c
@@ -24,7 +24,7 @@
#include <linux/ftrace.h>
#include <linux/ioport.h>
#include <linux/module.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/delay.h>
#include <linux/timex.h>
#include <linux/dmar.h>
@@ -2046,7 +2046,7 @@ static struct {
unsigned int apic_thmr;
} apic_pm_state;
-static int lapic_suspend(struct sys_device *dev, pm_message_t state)
+static int lapic_suspend(void)
{
unsigned long flags;
int maxlvt;
@@ -2084,23 +2084,21 @@ static int lapic_suspend(struct sys_devi
return 0;
}
-static int lapic_resume(struct sys_device *dev)
+static void lapic_resume(void)
{
unsigned int l, h;
unsigned long flags;
- int maxlvt;
- int ret = 0;
+ int maxlvt, ret;
struct IO_APIC_route_entry **ioapic_entries = NULL;
if (!apic_pm_state.active)
- return 0;
+ return;
local_irq_save(flags);
if (intr_remapping_enabled) {
ioapic_entries = alloc_ioapic_entries();
if (!ioapic_entries) {
WARN(1, "Alloc ioapic_entries in lapic resume failed.");
- ret = -ENOMEM;
goto restore;
}
@@ -2162,8 +2160,6 @@ static int lapic_resume(struct sys_devic
}
restore:
local_irq_restore(flags);
-
- return ret;
}
/*
@@ -2171,17 +2167,11 @@ restore:
* are needed on every CPU up until machine_halt/restart/poweroff.
*/
-static struct sysdev_class lapic_sysclass = {
- .name = "lapic",
+static struct syscore_ops lapic_syscore_ops = {
.resume = lapic_resume,
.suspend = lapic_suspend,
};
-static struct sys_device device_lapic = {
- .id = 0,
- .cls = &lapic_sysclass,
-};
-
static void __cpuinit apic_pm_activate(void)
{
apic_pm_state.active = 1;
@@ -2189,16 +2179,11 @@ static void __cpuinit apic_pm_activate(v
static int __init init_lapic_sysfs(void)
{
- int error;
-
- if (!cpu_has_apic)
- return 0;
/* XXX: remove suspend/resume procs if !apic_pm_state.active? */
+ if (cpu_has_apic)
+ register_syscore_ops(&lapic_syscore_ops);
- error = sysdev_class_register(&lapic_sysclass);
- if (!error)
- error = sysdev_register(&device_lapic);
- return error;
+ return 0;
}
/* local apic needs to resume before other devices access its registers. */
Index: linux-2.6/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c
+++ linux-2.6/arch/x86/kernel/apic/io_apic.c
@@ -30,7 +30,7 @@
#include <linux/compiler.h>
#include <linux/acpi.h>
#include <linux/module.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/msi.h>
#include <linux/htirq.h>
#include <linux/freezer.h>
@@ -2918,89 +2918,84 @@ static int __init io_apic_bug_finalize(v
late_initcall(io_apic_bug_finalize);
-struct sysfs_ioapic_data {
- struct sys_device dev;
- struct IO_APIC_route_entry entry[0];
-};
-static struct sysfs_ioapic_data * mp_ioapic_data[MAX_IO_APICS];
+static struct IO_APIC_route_entry *ioapic_saved_data[MAX_IO_APICS];
-static int ioapic_suspend(struct sys_device *dev, pm_message_t state)
+static void suspend_ioapic(int ioapic_id)
{
- struct IO_APIC_route_entry *entry;
- struct sysfs_ioapic_data *data;
+ struct IO_APIC_route_entry *saved_data = ioapic_saved_data[ioapic_id];
int i;
- data = container_of(dev, struct sysfs_ioapic_data, dev);
- entry = data->entry;
- for (i = 0; i < nr_ioapic_registers[dev->id]; i ++, entry ++ )
- *entry = ioapic_read_entry(dev->id, i);
+ if (!saved_data)
+ return;
+
+ for (i = 0; i < nr_ioapic_registers[ioapic_id]; i++)
+ saved_data[i] = ioapic_read_entry(ioapic_id, i);
+}
+
+static int ioapic_suspend(void)
+{
+ int ioapic_id;
+
+ for (ioapic_id = 0; ioapic_id < nr_ioapics; ioapic_id++)
+ suspend_ioapic(ioapic_id);
return 0;
}
-static int ioapic_resume(struct sys_device *dev)
+static void resume_ioapic(int ioapic_id)
{
- struct IO_APIC_route_entry *entry;
- struct sysfs_ioapic_data *data;
+ struct IO_APIC_route_entry *saved_data = ioapic_saved_data[ioapic_id];
unsigned long flags;
union IO_APIC_reg_00 reg_00;
int i;
- data = container_of(dev, struct sysfs_ioapic_data, dev);
- entry = data->entry;
+ if (!saved_data)
+ return;
raw_spin_lock_irqsave(&ioapic_lock, flags);
- reg_00.raw = io_apic_read(dev->id, 0);
- if (reg_00.bits.ID != mp_ioapics[dev->id].apicid) {
- reg_00.bits.ID = mp_ioapics[dev->id].apicid;
- io_apic_write(dev->id, 0, reg_00.raw);
+ reg_00.raw = io_apic_read(ioapic_id, 0);
+ if (reg_00.bits.ID != mp_ioapics[ioapic_id].apicid) {
+ reg_00.bits.ID = mp_ioapics[ioapic_id].apicid;
+ io_apic_write(ioapic_id, 0, reg_00.raw);
}
raw_spin_unlock_irqrestore(&ioapic_lock, flags);
- for (i = 0; i < nr_ioapic_registers[dev->id]; i++)
- ioapic_write_entry(dev->id, i, entry[i]);
+ for (i = 0; i < nr_ioapic_registers[ioapic_id]; i++)
+ ioapic_write_entry(ioapic_id, i, saved_data[i]);
+}
- return 0;
+static void ioapic_resume(void)
+{
+ int ioapic_id;
+
+ for (ioapic_id = nr_ioapics - 1; ioapic_id >= 0; ioapic_id--)
+ resume_ioapic(ioapic_id);
}
-static struct sysdev_class ioapic_sysdev_class = {
- .name = "ioapic",
+static struct syscore_ops ioapic_syscore_ops = {
.suspend = ioapic_suspend,
.resume = ioapic_resume,
};
-static int __init ioapic_init_sysfs(void)
+static int __init ioapic_init_ops(void)
{
- struct sys_device * dev;
- int i, size, error;
+ int i;
- error = sysdev_class_register(&ioapic_sysdev_class);
- if (error)
- return error;
+ for (i = 0; i < nr_ioapics; i++) {
+ unsigned int size;
- for (i = 0; i < nr_ioapics; i++ ) {
- size = sizeof(struct sys_device) + nr_ioapic_registers[i]
+ size = nr_ioapic_registers[i]
* sizeof(struct IO_APIC_route_entry);
- mp_ioapic_data[i] = kzalloc(size, GFP_KERNEL);
- if (!mp_ioapic_data[i]) {
- printk(KERN_ERR "Can't suspend/resume IOAPIC %d\n", i);
- continue;
- }
- dev = &mp_ioapic_data[i]->dev;
- dev->id = i;
- dev->cls = &ioapic_sysdev_class;
- error = sysdev_register(dev);
- if (error) {
- kfree(mp_ioapic_data[i]);
- mp_ioapic_data[i] = NULL;
- printk(KERN_ERR "Can't suspend/resume IOAPIC %d\n", i);
- continue;
- }
+ ioapic_saved_data[i] = kzalloc(size, GFP_KERNEL);
+ if (!ioapic_saved_data[i])
+ pr_err("IOAPIC %d: suspend/resume impossible!\n", i);
}
+ register_syscore_ops(&ioapic_syscore_ops);
+
return 0;
}
-device_initcall(ioapic_init_sysfs);
+device_initcall(ioapic_init_ops);
/*
* Dynamic irq allocate and deallocation
Index: linux-2.6/arch/x86/kernel/i8237.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/i8237.c
+++ linux-2.6/arch/x86/kernel/i8237.c
@@ -10,7 +10,7 @@
*/
#include <linux/init.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <asm/dma.h>
@@ -21,7 +21,7 @@
* in asm/dma.h.
*/
-static int i8237A_resume(struct sys_device *dev)
+static void i8237A_resume(void)
{
unsigned long flags;
int i;
@@ -41,31 +41,15 @@ static int i8237A_resume(struct sys_devi
enable_dma(4);
release_dma_lock(flags);
-
- return 0;
}
-static int i8237A_suspend(struct sys_device *dev, pm_message_t state)
-{
- return 0;
-}
-
-static struct sysdev_class i8237_sysdev_class = {
- .name = "i8237",
- .suspend = i8237A_suspend,
+static struct syscore_ops i8237_syscore_ops = {
.resume = i8237A_resume,
};
-static struct sys_device device_i8237A = {
- .id = 0,
- .cls = &i8237_sysdev_class,
-};
-
-static int __init i8237A_init_sysfs(void)
+static int __init i8237A_init_ops(void)
{
- int error = sysdev_class_register(&i8237_sysdev_class);
- if (!error)
- error = sysdev_register(&device_i8237A);
- return error;
+ register_syscore_ops(&i8237_syscore_ops);
+ return 0;
}
-device_initcall(i8237A_init_sysfs);
+device_initcall(i8237A_init_ops);
Index: linux-2.6/arch/x86/kernel/i8259.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/i8259.c
+++ linux-2.6/arch/x86/kernel/i8259.c
@@ -8,7 +8,7 @@
#include <linux/random.h>
#include <linux/init.h>
#include <linux/kernel_stat.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/bitops.h>
#include <linux/acpi.h>
#include <linux/io.h>
@@ -245,20 +245,19 @@ static void save_ELCR(char *trigger)
trigger[1] = inb(0x4d1) & 0xDE;
}
-static int i8259A_resume(struct sys_device *dev)
+static void i8259A_resume(void)
{
init_8259A(i8259A_auto_eoi);
restore_ELCR(irq_trigger);
- return 0;
}
-static int i8259A_suspend(struct sys_device *dev, pm_message_t state)
+static int i8259A_suspend(void)
{
save_ELCR(irq_trigger);
return 0;
}
-static int i8259A_shutdown(struct sys_device *dev)
+static void i8259A_shutdown(void)
{
/* Put the i8259A into a quiescent state that
* the kernel initialization code can get it
@@ -266,21 +265,14 @@ static int i8259A_shutdown(struct sys_de
*/
outb(0xff, PIC_MASTER_IMR); /* mask all of 8259A-1 */
outb(0xff, PIC_SLAVE_IMR); /* mask all of 8259A-1 */
- return 0;
}
-static struct sysdev_class i8259_sysdev_class = {
- .name = "i8259",
+static struct syscore_ops i8259_syscore_ops = {
.suspend = i8259A_suspend,
.resume = i8259A_resume,
.shutdown = i8259A_shutdown,
};
-static struct sys_device device_i8259A = {
- .id = 0,
- .cls = &i8259_sysdev_class,
-};
-
static void mask_8259A(void)
{
unsigned long flags;
@@ -399,17 +391,12 @@ struct legacy_pic default_legacy_pic = {
struct legacy_pic *legacy_pic = &default_legacy_pic;
-static int __init i8259A_init_sysfs(void)
+static int __init i8259A_init_ops(void)
{
- int error;
-
- if (legacy_pic != &default_legacy_pic)
- return 0;
+ if (legacy_pic == &default_legacy_pic)
+ register_syscore_ops(&i8259_syscore_ops);
- error = sysdev_class_register(&i8259_sysdev_class);
- if (!error)
- error = sysdev_register(&device_i8259A);
- return error;
+ return 0;
}
-device_initcall(i8259A_init_sysfs);
+device_initcall(i8259A_init_ops);
Index: linux-2.6/arch/x86/kernel/pci-gart_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-gart_64.c
+++ linux-2.6/arch/x86/kernel/pci-gart_64.c
@@ -27,7 +27,7 @@
#include <linux/kdebug.h>
#include <linux/scatterlist.h>
#include <linux/iommu-helper.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/io.h>
#include <linux/gfp.h>
#include <asm/atomic.h>
@@ -589,7 +589,7 @@ void set_up_gart_resume(u32 aper_order,
aperture_alloc = aper_alloc;
}
-static void gart_fixup_northbridges(struct sys_device *dev)
+static void gart_fixup_northbridges(void)
{
int i;
@@ -613,33 +613,20 @@ static void gart_fixup_northbridges(stru
}
}
-static int gart_resume(struct sys_device *dev)
+static void gart_resume(void)
{
pr_info("PCI-DMA: Resuming GART IOMMU\n");
- gart_fixup_northbridges(dev);
+ gart_fixup_northbridges();
enable_gart_translations();
-
- return 0;
}
-static int gart_suspend(struct sys_device *dev, pm_message_t state)
-{
- return 0;
-}
-
-static struct sysdev_class gart_sysdev_class = {
- .name = "gart",
- .suspend = gart_suspend,
+static struct syscore_ops gart_syscore_ops = {
.resume = gart_resume,
};
-static struct sys_device device_gart = {
- .cls = &gart_sysdev_class,
-};
-
/*
* Private Northbridge GATT initialization in case we cannot use the
* AGP driver for some reason.
@@ -650,7 +637,7 @@ static __init int init_amd_gatt(struct a
unsigned aper_base, new_aper_base;
struct pci_dev *dev;
void *gatt;
- int i, error;
+ int i;
pr_info("PCI-DMA: Disabling AGP.\n");
@@ -685,12 +672,7 @@ static __init int init_amd_gatt(struct a
agp_gatt_table = gatt;
- error = sysdev_class_register(&gart_sysdev_class);
- if (!error)
- error = sysdev_register(&device_gart);
- if (error)
- panic("Could not register gart_sysdev -- "
- "would corrupt data on next suspend");
+ register_syscore_ops(&gart_syscore_ops);
flush_gart();
Index: linux-2.6/arch/x86/oprofile/nmi_int.c
===================================================================
--- linux-2.6.orig/arch/x86/oprofile/nmi_int.c
+++ linux-2.6/arch/x86/oprofile/nmi_int.c
@@ -15,7 +15,7 @@
#include <linux/notifier.h>
#include <linux/smp.h>
#include <linux/oprofile.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/slab.h>
#include <linux/moduleparam.h>
#include <linux/kdebug.h>
@@ -536,7 +536,7 @@ static void nmi_shutdown(void)
#ifdef CONFIG_PM
-static int nmi_suspend(struct sys_device *dev, pm_message_t state)
+static int nmi_suspend(void)
{
/* Only one CPU left, just stop that one */
if (nmi_enabled == 1)
@@ -544,49 +544,31 @@ static int nmi_suspend(struct sys_device
return 0;
}
-static int nmi_resume(struct sys_device *dev)
+static void nmi_resume(void)
{
if (nmi_enabled == 1)
nmi_cpu_start(NULL);
- return 0;
}
-static struct sysdev_class oprofile_sysclass = {
- .name = "oprofile",
+static struct syscore_ops oprofile_syscore_ops = {
.resume = nmi_resume,
.suspend = nmi_suspend,
};
-static struct sys_device device_oprofile = {
- .id = 0,
- .cls = &oprofile_sysclass,
-};
-
-static int __init init_sysfs(void)
+static void __init init_suspend_resume(void)
{
- int error;
-
- error = sysdev_class_register(&oprofile_sysclass);
- if (error)
- return error;
-
- error = sysdev_register(&device_oprofile);
- if (error)
- sysdev_class_unregister(&oprofile_sysclass);
-
- return error;
+ register_syscore_ops(&oprofile_syscore_ops);
}
-static void exit_sysfs(void)
+static void exit_suspend_resume(void)
{
- sysdev_unregister(&device_oprofile);
- sysdev_class_unregister(&oprofile_sysclass);
+ unregister_syscore_ops(&oprofile_syscore_ops);
}
#else
-static inline int init_sysfs(void) { return 0; }
-static inline void exit_sysfs(void) { }
+static inline void init_suspend_resume(void) { }
+static inline void exit_suspend_resume(void) { }
#endif /* CONFIG_PM */
@@ -789,9 +771,7 @@ int __init op_nmi_init(struct oprofile_o
mux_init(ops);
- ret = init_sysfs();
- if (ret)
- return ret;
+ init_suspend_resume();
printk(KERN_INFO "oprofile: using NMI interrupt.\n");
return 0;
@@ -799,5 +779,5 @@ int __init op_nmi_init(struct oprofile_o
void op_nmi_exit(void)
{
- exit_sysfs();
+ exit_suspend_resume();
}
Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce.c
+++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce.c
@@ -21,6 +21,7 @@
#include <linux/percpu.h>
#include <linux/string.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/delay.h>
#include <linux/ctype.h>
#include <linux/sched.h>
@@ -1749,14 +1750,14 @@ static int mce_disable_error_reporting(v
return 0;
}
-static int mce_suspend(struct sys_device *dev, pm_message_t state)
+static int mce_suspend(void)
{
return mce_disable_error_reporting();
}
-static int mce_shutdown(struct sys_device *dev)
+static void mce_shutdown(void)
{
- return mce_disable_error_reporting();
+ mce_disable_error_reporting();
}
/*
@@ -1764,14 +1765,18 @@ static int mce_shutdown(struct sys_devic
* Only one CPU is active at this time, the others get re-added later using
* CPU hotplug:
*/
-static int mce_resume(struct sys_device *dev)
+static void mce_resume(void)
{
__mcheck_cpu_init_generic();
__mcheck_cpu_init_vendor(__this_cpu_ptr(&cpu_info));
-
- return 0;
}
+static struct syscore_ops mce_syscore_ops = {
+ .suspend = mce_suspend,
+ .shutdown = mce_shutdown,
+ .resume = mce_resume,
+};
+
static void mce_cpu_restart(void *data)
{
del_timer_sync(&__get_cpu_var(mce_timer));
@@ -1808,9 +1813,6 @@ static void mce_enable_ce(void *all)
}
static struct sysdev_class mce_sysclass = {
- .suspend = mce_suspend,
- .shutdown = mce_shutdown,
- .resume = mce_resume,
.name = "machinecheck",
};
@@ -2139,6 +2141,7 @@ static __init int mcheck_init_device(voi
return err;
}
+ register_syscore_ops(&mce_syscore_ops);
register_hotcpu_notifier(&mce_cpu_notifier);
misc_register(&mce_log_device);
Index: linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-2.6/arch/x86/kernel/cpu/mtrr/main.c
@@ -45,6 +45,7 @@
#include <linux/cpu.h>
#include <linux/pci.h>
#include <linux/smp.h>
+#include <linux/syscore_ops.h>
#include <asm/processor.h>
#include <asm/e820.h>
@@ -630,7 +631,7 @@ struct mtrr_value {
static struct mtrr_value mtrr_value[MTRR_MAX_VAR_RANGES];
-static int mtrr_save(struct sys_device *sysdev, pm_message_t state)
+static int mtrr_save(void)
{
int i;
@@ -642,7 +643,7 @@ static int mtrr_save(struct sys_device *
return 0;
}
-static int mtrr_restore(struct sys_device *sysdev)
+static void mtrr_restore(void)
{
int i;
@@ -653,12 +654,11 @@ static int mtrr_restore(struct sys_devic
mtrr_value[i].ltype);
}
}
- return 0;
}
-static struct sysdev_driver mtrr_sysdev_driver = {
+static struct syscore_ops mtrr_syscore_ops = {
.suspend = mtrr_save,
.resume = mtrr_restore,
};
@@ -839,7 +839,7 @@ static int __init mtrr_init_finialize(vo
* TBD: is there any system with such CPU which supports
* suspend/resume? If no, we should remove the code.
*/
- sysdev_driver_register(&cpu_sysdev_class, &mtrr_sysdev_driver);
+ register_syscore_ops(&mtrr_syscore_ops);
return 0;
}
Index: linux-2.6/arch/x86/kernel/microcode_core.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/microcode_core.c
+++ linux-2.6/arch/x86/kernel/microcode_core.c
@@ -82,6 +82,7 @@
#include <linux/cpu.h>
#include <linux/fs.h>
#include <linux/mm.h>
+#include <linux/syscore_ops.h>
#include <asm/microcode.h>
#include <asm/processor.h>
@@ -438,33 +439,25 @@ static int mc_sysdev_remove(struct sys_d
return 0;
}
-static int mc_sysdev_resume(struct sys_device *dev)
+static struct sysdev_driver mc_sysdev_driver = {
+ .add = mc_sysdev_add,
+ .remove = mc_sysdev_remove,
+};
+
+/**
+ * mc_bp_resume - Update boot CPU microcode during resume.
+ */
+static void mc_bp_resume(void)
{
- int cpu = dev->id;
+ int cpu = smp_processor_id();
struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
- if (!cpu_online(cpu))
- return 0;
-
- /*
- * All non-bootup cpus are still disabled,
- * so only CPU 0 will apply ucode here.
- *
- * Moreover, there can be no concurrent
- * updates from any other places at this point.
- */
- WARN_ON(cpu != 0);
-
if (uci->valid && uci->mc)
microcode_ops->apply_microcode(cpu);
-
- return 0;
}
-static struct sysdev_driver mc_sysdev_driver = {
- .add = mc_sysdev_add,
- .remove = mc_sysdev_remove,
- .resume = mc_sysdev_resume,
+static struct syscore_ops mc_syscore_ops = {
+ .resume = mc_bp_resume,
};
static __cpuinit int
@@ -542,6 +535,7 @@ static int __init microcode_init(void)
if (error)
return error;
+ register_syscore_ops(&mc_syscore_ops);
register_hotcpu_notifier(&mc_cpu_notifier);
pr_info("Microcode Update Driver: v" MICROCODE_VERSION
Hi,
On Saturday, March 12, 2011, Rafael J. Wysocki wrote:
> On Thursday, March 10, 2011, Rafael J. Wysocki wrote:
> > There are multiple problems with sysdevs, or struct sys_device objects to
> > be precise, that are so annoying that some people have started to think
> > of removing them entirely from the kernel. To me, personally, the most
> > obvious issue is the way sysdevs are used for defining suspend/resume
> > callbacks to be executed with one CPU on-line and interrupts disabled.
> > Greg and Kay may tell you more about the other problems with sysdevs. :-)
> >
> > Some subsystems need to carry out certain operations during suspend after
> > we've disabled non-boot CPUs and interrupts have been switched off on the
> > only on-line one. Currently, the only way to achieve that is to define
> > sysdev suspend/resume callbacks, but this is cumbersome and inefficient.
> > Namely, to do that, one has to define a sysdev class providing the callbacks
> > and a sysdev actually using them, which is excessively complicated. Moreover,
> > the sysdev suspend/resume callbacks take arguments that are not really used
> > by the majority of subsystems defining sysdev suspend/resume callbacks
> > (or even if they are used, they don't really _need_ to be used, so they
> > are simply unnecessary). Of course, if a sysdev is only defined to provide
> > suspend/resume (and maybe shutdown) callbacks, there's no real reason why
> > it should show up in sysfs.
> >
> > For this reason, I thought it would be a good idea to provide a simpler
> > interface for subsystems to define "very late" suspend callbacks and
> > "very early" resume callbacks (and "very late" shutdown callbacks as well)
> > without the entire bloat related to sysdevs. The interface is introduced
> > by the first of the following patches, while the second patch converts some
> > sysdev users related to the x86 architecture to using the new interface.
> >
> > I believe that all sysdev users who need to define suspend/resume/shutdown
> > callbacks may be converted to using the interface provided by the first patch,
> > which in turn should allow us to convert the remaining sysdev functionality
> > into "normal" struct device interfaces. Still, even if that turns out to be
> > too complicated, the bloat reduction resulting from the second patch kind of
> > shows that moving at least some sysdev users to a simpler interface (like in
> > the first patch) is a good idea anyway.
> >
> > This is a proof of concept, so the patches have not been tested. Please be
> > extremely careful, because they touch sensitive code, so to speak. In the
> > majority of cases the changes are rather straightforward, but there are some
> > more interesting cases as well (io_apic.c most importantly).
>
> Since Greg likes the idea and there haven't been any objections so far, here's
> the official submission. The patches have been tested on HP nx6325 and
> Toshiba Portege R500.
>
> Patch [1/8] is regarded as 2.6.38 material, following Greg's advice. The
> other patches in the set are regarded as 2.6.39 material. The last one
> obviously depends on all of the previous ones.
>
> [1/8] - Introduce struct syscore_ops for registering operations to be run on
> one CPU during suspend/resume/shutdown.
>
> [2/8] - Convert sysdev users in arch/x86 to using struct syscore_ops.
>
> [3/8] - Make ACPI use struct syscore_ops for irqrouter_resume().
>
> [4/8] - Make timekeeping use struct syscore_ops for suspend/resume.
>
> [5/8] - Make Intel IOMMU use struct syscore_ops for suspend/resume.
>
> [6/8] - Make KVM use struct syscore_ops for suspend/resume.
>
> [7/8] - Make cpufreq use struct syscore_ops for boot CPU suspend/resume.
>
> [8/8] - Introduce config switch allowing architectures to skip sysdev
> suspend/resume/shutdown code.
>
> If there are no objections, I'd like to push these patches through the suspend
> tree.
[1/8] has been merged in the meantime and [3/8] has been included into the
ACPI tree. If there are no objections, I'm going to push the following
patches to Linus this week through the suspend-2.6 tree:
[1/6] - Convert sysdev users in arch/x86 to using struct syscore_ops.
[2/6] - Make timekeeping use struct syscore_ops for suspend/resume.
[3/6] - Make Intel IOMMU use struct syscore_ops for suspend/resume.
[4/6] - Make KVM use struct syscore_ops for suspend/resume.
[5/6] - Make cpufreq use struct syscore_ops for boot CPU suspend/resume.
[6/6] - Introduce config switch allowing architectures to skip sysdev
suspend/resume/shutdown code.
Thanks,
Rafael
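
For readers who have not looked at patch [1/8], the registration model discussed
in this thread can be sketched roughly as follows. This is a userspace model
only, not the kernel implementation: the hand-rolled list linkage, the callback
ordering (suspend in reverse registration order, resume in registration order,
roll back on a failed suspend) and the demo_* names are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model; field layout is an assumption, not the kernel's. */
struct syscore_ops {
	struct syscore_ops *prev, *next;	/* hypothetical list linkage */
	int (*suspend)(void);			/* no arguments, unlike sysdevs */
	void (*resume)(void);
	void (*shutdown)(void);
};

static struct syscore_ops *ops_head, *ops_tail;

/* Append a set of callbacks to the tail of the global list. */
void register_syscore_ops(struct syscore_ops *ops)
{
	assert(ops != NULL);
	ops->next = NULL;
	ops->prev = ops_tail;
	if (ops_tail)
		ops_tail->next = ops;
	else
		ops_head = ops;
	ops_tail = ops;
}

/* Run suspend callbacks in reverse registration order; undo on failure. */
int syscore_suspend(void)
{
	struct syscore_ops *ops;
	int ret;

	for (ops = ops_tail; ops; ops = ops->prev) {
		if (!ops->suspend)
			continue;
		ret = ops->suspend();
		if (ret) {
			/* Resume everything that was already suspended. */
			for (ops = ops->next; ops; ops = ops->next)
				if (ops->resume)
					ops->resume();
			return ret;
		}
	}
	return 0;
}

/* Run resume callbacks in registration order. */
void syscore_resume(void)
{
	struct syscore_ops *ops;

	for (ops = ops_head; ops; ops = ops->next)
		if (ops->resume)
			ops->resume();
}

/* Hypothetical subsystem using the interface. */
int demo_suspended;

int demo_suspend(void)
{
	demo_suspended = 1;
	return 0;
}

void demo_resume(void)
{
	demo_suspended = 0;
}

struct syscore_ops demo_syscore_ops = {
	.suspend = demo_suspend,
	.resume = demo_resume,
};
```

The point of the model is that a subsystem needs one static object and one
registration call, with no class, driver, or device object and nothing in sysfs.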
On 03/22/2011 01:37 AM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki<[email protected]>
>
> KVM uses a sysdev class and a sysdev for executing kvm_suspend()
> after interrupts have been turned off on the boot CPU (during system
> suspend) and for executing kvm_resume() before turning on interrupts
> on the boot CPU (during system resume). However, since both of these
> functions ignore their arguments, the entire mechanism may be
> replaced with a struct syscore_ops object which is simpler.
>
Acked-by: Avi Kivity <[email protected]>
--
error compiling committee.c: too many arguments to function
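
The argument made in the changelog above (callbacks that ignore their arguments
do not need an interface that passes them) can be illustrated with a small
userspace sketch. The *_model types and functions below are hypothetical
stand-ins; what kvm_suspend() and kvm_resume() really do is not shown in this
thread, so a single flag stands in for that work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the old sysdev argument types. */
struct sys_device_model { int id; };
struct pm_message_model { int event; };

/* Stands in for whatever state the real callbacks save and restore. */
int hardware_enabled = 1;

/* Old shape: the interface forces arguments that are never read. */
int old_suspend_model(struct sys_device_model *dev,
		      struct pm_message_model state)
{
	(void)dev;
	(void)state;
	hardware_enabled = 0;
	return 0;
}

int old_resume_model(struct sys_device_model *dev)
{
	(void)dev;
	hardware_enabled = 1;
	return 0;
}

/* New shape: the same work with no arguments at all. */
int new_suspend_model(void)
{
	hardware_enabled = 0;
	return 0;
}

void new_resume_model(void)
{
	hardware_enabled = 1;
}
```

Both shapes have the same effect on the modelled state, which is why the
conversion can be mechanical when the arguments are unused.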
* Rafael J. Wysocki <[email protected]> wrote:
> > If there are no objections, I'd like to push these patches through the suspend
> > tree.
>
> [1/8] has been merged in the meantime and [3/8] has been included into the
> ACPI tree. If there are no objections, I'm going to push the following
> patches to Linus this week through the suspend-2.6 tree:
>
> [1/6] - Convert sysdev users in arch/x86 to using struct syscore_ops.
>
> [2/6] - Make timekeeping use struct syscore_ops for suspend/resume.
>
> [3/6] - Make Intel IOMMU use struct syscore_ops for suspend/resume.
>
> [4/6] - Make KVM use struct syscore_ops for suspend/resume.
>
> [5/6] - Make cpufreq use struct syscore_ops for boot CPU suspend/resume.
>
> [6/6] - Introduce config switch allowing architectures to skip sysdev
> suspend/resume/shutdown code.
The x86 bits look fine.
Acked-by: Ingo Molnar <[email protected]>
The patches affect a lot of hardware so please make sure they are tested well
before pushing them to Linus :-)
Ingo
On Mon, Mar 21, 2011 at 07:36:17PM -0400, Rafael J. Wysocki wrote:
> drivers/pci/intel-iommu.c | 38 +++++++++-----------------------------
> 1 file changed, 9 insertions(+), 29 deletions(-)
Looks good. I'll prepare a patch to convert AMD IOMMU to syscore_ops too.
Joerg
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
On Sat, Mar 19, 2011 at 01:47:27AM +0100, Rafael J. Wysocki wrote:
> On Thursday, March 17, 2011, Paul Mundt wrote:
> > On Sun, Mar 13, 2011 at 02:03:49PM +0100, R. J. Wysocki wrote:
> > > From: Rafael J. Wysocki <[email protected]>
> > >
> > > Convert the SuperH clocks framework and shared interrupt handling
> > > > code to using struct syscore_ops instead of sysdev classes and
> > > > sysdevs for power management.
> > >
> > > This reduces the code size significantly and simplifies it. The
> > > optimizations causing things not to be restored after creating a
> > > hibernation image are removed, but they might lead to undesirable
> > > effects during resume from hibernation (e.g. the clocks would be left
> > > as the boot kernel set them, which might be not the same way as the
> > > hibernated kernel had seen them before the hibernation).
> > >
> > > This also is necessary for removing sysdevs from the kernel entirely
> > > in the future.
> > >
> > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> >
> > This misses the use of the sysdev class by the userimask code, though I'm
> > open to suggestions for alternatives.
>
> For now, I'd simply move the sysdev class definition to userimask.c, like
> in the patch below. The current goal is to eliminate the suspend/resume and
> shutdown operations from sysdevs (and sysdev drivers), the next step will
> be to replace the remaining sysdevs with alternative mechanisms.
>
It's not quite that straightforward, you've also killed off the name
attribute for each of the intc sysdevs, so we no longer have a visible
way to map a given intc controller number to the controller name in a
user visible way.
I'm not opposed to the syscore thing for suspend/resume ops, but I'm not
willing to trash the userimask and name mapping interface in the process
with no alternatives.
userimask was the first global configuration item I added, but there are
other per-controller and global configuration knobs that I plan to export
through the interface, so there really needs to be a compelling reason
for moving off of sysdevs.
On Tue, 2011-03-22 at 23:04 +0900, Paul Mundt wrote:
> On Sat, Mar 19, 2011 at 01:47:27AM +0100, Rafael J. Wysocki wrote:
> > On Thursday, March 17, 2011, Paul Mundt wrote:
> > > On Sun, Mar 13, 2011 at 02:03:49PM +0100, R. J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <[email protected]>
> > > >
> > > > Convert the SuperH clocks framework and shared interrupt handling
> > > > > code to using struct syscore_ops instead of sysdev classes and
> > > > > sysdevs for power management.
> > > >
> > > > This reduces the code size significantly and simplifies it. The
> > > > optimizations causing things not to be restored after creating a
> > > > hibernation image are removed, but they might lead to undesirable
> > > > effects during resume from hibernation (e.g. the clocks would be left
> > > > as the boot kernel set them, which might be not the same way as the
> > > > hibernated kernel had seen them before the hibernation).
> > > >
> > > > This also is necessary for removing sysdevs from the kernel entirely
> > > > in the future.
> > > >
> > > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > >
> > > This misses the use of the sysdev class by the userimask code, though I'm
> > > open to suggestions for alternatives.
> >
> > For now, I'd simply move the sysdev class definition to userimask.c, like
> > in the patch below. The current goal is to eliminate the suspend/resume and
> > shutdown operations from sysdevs (and sysdev drivers), the next step will
> > be to replace the remaining sysdevs with alternative mechanisms.
> >
> It's not quite that straightforward, you've also killed off the name
> attribute for each of the intc sysdevs, so we no longer have a visible
> way to map a given intc controller number to the controller name in a
> user visible way.
>
> I'm not opposed to the syscore thing for suspend/resume ops, but I'm not
> willing to trash the userimask and name mapping interface in the process
> with no alternatives.
>
> userimask was the first global configuration item I added, but there are
> other per-controller and global configuration knobs that I plan to export
> through the interface, so there really needs to be a compelling reason
> for moving off of sysdevs.
Yes, they don't fit into the model. They have been a dumb hack from the
first day, and never integrated into the kernel driver core or hotplug
properly.
If you need the userspace visibility, better just add a "struct
bus_type" with a proper name for your subsystem and register a "struct
device" with the bus_type assigned for all of them, instead of using the
broken concept of sysdevs. You can even make them show up
in /sys/devices/system/<bus_type name>/<struct device name>/ if you want
to.
That way userspace can properly enumerate them in a flat list
in /sys/bus/<bus_type name>/devices/*, and gets proper events on module
load and during system coldplug, and can hook into the usual hotplug
paths to set/get these values instead of crawling magically defined and
decoupled locations in /sys which cannot express proper hierarchy,
classification, or anything else that all other devices can just do.
There is really no reason for any device being a magic and conceptually
broken sysdev today - just to be different from any other device the
kernel exports to userspace.
Thanks,
Kay
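
To make the "flat list" argument above concrete, here is a minimal userspace
sketch of how a tool could enumerate devices registered on a bus. The function
name count_bus_devices() and any bus name passed to it are assumptions for
illustration; only the /sys/bus/<bus_type name>/devices/ layout comes from the
message above.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Print and count the entries in a devices directory such as
 * /sys/bus/<bus_type name>/devices.  Returns the number of entries,
 * or -1 if the directory cannot be opened.
 */
int count_bus_devices(const char *devices_dir)
{
	struct dirent *de;
	DIR *dir;
	int count = 0;

	assert(devices_dir != NULL);

	dir = opendir(devices_dir);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		/* Skip the directory's own "." and ".." entries. */
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		printf("%s/%s\n", devices_dir, de->d_name);
		count++;
	}

	closedir(dir);
	return count;
}
```

A caller would pass something like "/sys/bus/<bus_type name>/devices" with the
real bus name filled in; the same loop works for every bus, which is the
uniformity being argued for.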
On Tuesday, March 22, 2011, Paul Mundt wrote:
> On Sat, Mar 19, 2011 at 01:47:27AM +0100, Rafael J. Wysocki wrote:
> > On Thursday, March 17, 2011, Paul Mundt wrote:
> > > On Sun, Mar 13, 2011 at 02:03:49PM +0100, R. J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <[email protected]>
> > > >
> > > > Convert the SuperH clocks framework and shared interrupt handling
> > > > > code to using struct syscore_ops instead of sysdev classes and
> > > > > sysdevs for power management.
> > > >
> > > > This reduces the code size significantly and simplifies it. The
> > > > optimizations causing things not to be restored after creating a
> > > > hibernation image are removed, but they might lead to undesirable
> > > > effects during resume from hibernation (e.g. the clocks would be left
> > > > as the boot kernel set them, which might be not the same way as the
> > > > hibernated kernel had seen them before the hibernation).
> > > >
> > > > This also is necessary for removing sysdevs from the kernel entirely
> > > > in the future.
> > > >
> > > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > >
> > > This misses the use of the sysdev class by the userimask code, though I'm
> > > open to suggestions for alternatives.
> >
> > For now, I'd simply move the sysdev class definition to userimask.c, like
> > in the patch below. The current goal is to eliminate the suspend/resume and
> > shutdown operations from sysdevs (and sysdev drivers), the next step will
> > be to replace the remaining sysdevs with alternative mechanisms.
> >
> It's not quite that straightforward, you've also killed off the name
> attribute for each of the intc sysdevs, so we no longer have a visible
> way to map a given intc controller number to the controller name in a
> user visible way.
Hmm. So you actually want those empty directories to show up in sysfs?
I didn't think they were really useful ...
> I'm not opposed to the syscore thing for suspend/resume ops, but I'm not
> willing to trash the userimask and name mapping interface in the process
> with no alternatives.
OK
> userimask was the first global configuration item I added, but there are
> other per-controller and global configuration knobs that I plan to export
> through the interface, so there really needs to be a compelling reason
> for moving off of sysdevs.
Yes, there is. They are simply unusable in other situations, but I agree
we'll need to find a clean way to replace them where there's a good reason to
use them.
Updated patch follows.
Thanks,
Rafael
---
From: Rafael J. Wysocki <[email protected]>
Subject: Drivers / sh: Use struct syscore_ops instead of sysdevs
Convert the SuperH clocks framework and shared interrupt handling
code to using struct syscore_ops instead of sysdev classes and
sysdevs for power management.
This reduces the code size significantly and simplifies it. The
optimizations causing things not to be restored after creating a
hibernation image are removed, but they might lead to undesirable
effects during resume from hibernation (e.g. the clocks would be left
as the boot kernel set them, which might not be the same way as the
hibernated kernel had seen them before the hibernation).
This also is necessary for removing sysdevs from the kernel entirely
in the future.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/sh/clk/core.c | 68 ++++++++-----------------------
drivers/sh/intc/core.c | 95 +++++++++++++++++++++-----------------------
drivers/sh/intc/internals.h | 1
3 files changed, 65 insertions(+), 99 deletions(-)
Index: linux-2.6/drivers/sh/clk/core.c
===================================================================
--- linux-2.6.orig/drivers/sh/clk/core.c
+++ linux-2.6/drivers/sh/clk/core.c
@@ -21,7 +21,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/list.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/seq_file.h>
#include <linux/err.h>
#include <linux/io.h>
@@ -630,68 +630,36 @@ long clk_round_parent(struct clk *clk, u
EXPORT_SYMBOL_GPL(clk_round_parent);
#ifdef CONFIG_PM
-static int clks_sysdev_suspend(struct sys_device *dev, pm_message_t state)
+static void clks_core_resume(void)
{
- static pm_message_t prev_state;
struct clk *clkp;
- switch (state.event) {
- case PM_EVENT_ON:
- /* Resumeing from hibernation */
- if (prev_state.event != PM_EVENT_FREEZE)
- break;
-
- list_for_each_entry(clkp, &clock_list, node) {
- if (likely(clkp->ops)) {
- unsigned long rate = clkp->rate;
-
- if (likely(clkp->ops->set_parent))
- clkp->ops->set_parent(clkp,
- clkp->parent);
- if (likely(clkp->ops->set_rate))
- clkp->ops->set_rate(clkp, rate);
- else if (likely(clkp->ops->recalc))
- clkp->rate = clkp->ops->recalc(clkp);
- }
+ list_for_each_entry(clkp, &clock_list, node) {
+ if (likely(clkp->ops)) {
+ unsigned long rate = clkp->rate;
+
+ if (likely(clkp->ops->set_parent))
+ clkp->ops->set_parent(clkp,
+ clkp->parent);
+ if (likely(clkp->ops->set_rate))
+ clkp->ops->set_rate(clkp, rate);
+ else if (likely(clkp->ops->recalc))
+ clkp->rate = clkp->ops->recalc(clkp);
}
- break;
- case PM_EVENT_FREEZE:
- break;
- case PM_EVENT_SUSPEND:
- break;
}
-
- prev_state = state;
- return 0;
-}
-
-static int clks_sysdev_resume(struct sys_device *dev)
-{
- return clks_sysdev_suspend(dev, PMSG_ON);
}
-static struct sysdev_class clks_sysdev_class = {
- .name = "clks",
-};
-
-static struct sysdev_driver clks_sysdev_driver = {
- .suspend = clks_sysdev_suspend,
- .resume = clks_sysdev_resume,
-};
-
-static struct sys_device clks_sysdev_dev = {
- .cls = &clks_sysdev_class,
+static struct syscore_ops clks_syscore_ops = {
+ .resume = clks_core_resume,
};
-static int __init clk_sysdev_init(void)
+static int __init clk_syscore_init(void)
{
- sysdev_class_register(&clks_sysdev_class);
- sysdev_driver_register(&clks_sysdev_class, &clks_sysdev_driver);
- sysdev_register(&clks_sysdev_dev);
+ register_syscore_ops(&clks_syscore_ops);
return 0;
}
-subsys_initcall(clk_sysdev_init);
+subsys_initcall(clk_syscore_init);
#endif
/*
Index: linux-2.6/drivers/sh/intc/core.c
===================================================================
--- linux-2.6.orig/drivers/sh/intc/core.c
+++ linux-2.6/drivers/sh/intc/core.c
@@ -25,6 +25,7 @@
#include <linux/interrupt.h>
#include <linux/sh_intc.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/radix-tree.h>
@@ -376,91 +377,89 @@ err0:
return -ENOMEM;
}
-static ssize_t
-show_intc_name(struct sys_device *dev, struct sysdev_attribute *attr, char *buf)
+static int intc_suspend(void)
{
struct intc_desc_int *d;
- d = container_of(dev, struct intc_desc_int, sysdev);
+ list_for_each_entry(d, &intc_list, list) {
+ int irq;
- return sprintf(buf, "%s\n", d->chip.name);
-}
+ /* enable wakeup irqs belonging to this intc controller */
+ for_each_active_irq(irq) {
+ struct irq_data *data;
+ struct irq_desc *desc;
+ struct irq_chip *chip;
-static SYSDEV_ATTR(name, S_IRUGO, show_intc_name, NULL);
+ data = irq_get_irq_data(irq);
+ chip = irq_data_get_irq_chip(data);
+ if (chip != &d->chip)
+ continue;
+ desc = irq_to_desc(irq);
+ if ((desc->status & IRQ_WAKEUP))
+ chip->irq_enable(data);
+ }
+ }
+
+ return 0;
+}
-static int intc_suspend(struct sys_device *dev, pm_message_t state)
+static void intc_resume(void)
{
struct intc_desc_int *d;
- struct irq_data *data;
- struct irq_desc *desc;
- struct irq_chip *chip;
- int irq;
-
- /* get intc controller associated with this sysdev */
- d = container_of(dev, struct intc_desc_int, sysdev);
- switch (state.event) {
- case PM_EVENT_ON:
- if (d->state.event != PM_EVENT_FREEZE)
- break;
+ list_for_each_entry(d, &intc_list, list) {
+ int irq;
for_each_active_irq(irq) {
- desc = irq_to_desc(irq);
+ struct irq_data *data;
+ struct irq_desc *desc;
+ struct irq_chip *chip;
+
data = irq_get_irq_data(irq);
chip = irq_data_get_irq_chip(data);
-
/*
* This will catch the redirect and VIRQ cases
* due to the dummy_irq_chip being inserted.
*/
if (chip != &d->chip)
continue;
+ desc = irq_to_desc(irq);
if (desc->status & IRQ_DISABLED)
chip->irq_disable(data);
else
chip->irq_enable(data);
}
- break;
- case PM_EVENT_FREEZE:
- /* nothing has to be done */
- break;
- case PM_EVENT_SUSPEND:
- /* enable wakeup irqs belonging to this intc controller */
- for_each_active_irq(irq) {
- desc = irq_to_desc(irq);
- data = irq_get_irq_data(irq);
- chip = irq_data_get_irq_chip(data);
-
- if (chip != &d->chip)
- continue;
- if ((desc->status & IRQ_WAKEUP))
- chip->irq_enable(data);
- }
- break;
}
-
- d->state = state;
-
- return 0;
}
-static int intc_resume(struct sys_device *dev)
-{
- return intc_suspend(dev, PMSG_ON);
-}
+struct syscore_ops intc_syscore_ops = {
+ .suspend = intc_suspend,
+ .resume = intc_resume,
+};
struct sysdev_class intc_sysdev_class = {
.name = "intc",
- .suspend = intc_suspend,
- .resume = intc_resume,
};
-/* register this intc as sysdev to allow suspend/resume */
+static ssize_t
+show_intc_name(struct sys_device *dev, struct sysdev_attribute *attr, char *buf)
+{
+ struct intc_desc_int *d;
+
+ d = container_of(dev, struct intc_desc_int, sysdev);
+
+ return sprintf(buf, "%s\n", d->chip.name);
+}
+
+static SYSDEV_ATTR(name, S_IRUGO, show_intc_name, NULL);
+
static int __init register_intc_sysdevs(void)
{
struct intc_desc_int *d;
int error;
+ register_syscore_ops(&intc_syscore_ops);
+
error = sysdev_class_register(&intc_sysdev_class);
if (!error) {
list_for_each_entry(d, &intc_list, list) {
Index: linux-2.6/drivers/sh/intc/internals.h
===================================================================
--- linux-2.6.orig/drivers/sh/intc/internals.h
+++ linux-2.6/drivers/sh/intc/internals.h
@@ -53,7 +53,6 @@ struct intc_desc_int {
struct list_head list;
struct sys_device sysdev;
struct radix_tree_root tree;
- pm_message_t state;
raw_spinlock_t lock;
unsigned int index;
unsigned long *reg;
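
The structural change in the intc part of the patch above is that the callbacks
no longer receive one controller via a sysdev pointer; they walk the list of all
controllers themselves. A simplified userspace model of that logic follows. All
model_* names and the fixed-size arrays are hypothetical; the wakeup and
disabled flags stand in for IRQ_WAKEUP and IRQ_DISABLED from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_IRQS	4
#define MODEL_NR_INTCS	2

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_irq {
	int chip_id;		/* controller that owns this IRQ */
	bool disabled;		/* software state (IRQ_DISABLED) */
	bool wakeup;		/* wakeup source (IRQ_WAKEUP) */
	bool hw_enabled;	/* what the modelled hardware has */
};

struct model_intc {
	int chip_id;
};

struct model_intc model_intcs[MODEL_NR_INTCS] = { { 0 }, { 1 } };

struct model_irq model_irqs[MODEL_NR_IRQS] = {
	{ .chip_id = 0, .disabled = true,  .wakeup = true,  .hw_enabled = false },
	{ .chip_id = 0, .disabled = false, .wakeup = false, .hw_enabled = true  },
	{ .chip_id = 1, .disabled = false, .wakeup = false, .hw_enabled = false },
	/* Owned by a chip that is not in model_intcs; must be skipped. */
	{ .chip_id = 9, .disabled = false, .wakeup = true,  .hw_enabled = false },
};

/* Enable wakeup IRQs on every registered controller. */
int model_intc_suspend(void)
{
	size_t c, i;

	for (c = 0; c < MODEL_NR_INTCS; c++) {
		for (i = 0; i < MODEL_NR_IRQS; i++) {
			struct model_irq *irq = &model_irqs[i];

			if (irq->chip_id != model_intcs[c].chip_id)
				continue;
			if (irq->wakeup)
				irq->hw_enabled = true;
		}
	}
	return 0;
}

/* Rewrite hardware state from the software state on every controller. */
void model_intc_resume(void)
{
	size_t c, i;

	for (c = 0; c < MODEL_NR_INTCS; c++) {
		for (i = 0; i < MODEL_NR_IRQS; i++) {
			struct model_irq *irq = &model_irqs[i];

			if (irq->chip_id != model_intcs[c].chip_id)
				continue;
			irq->hw_enabled = !irq->disabled;
		}
	}
}
```

Note that the model, like the patch, restores state unconditionally on resume
rather than tracking whether the previous event was a freeze.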
On Tuesday, March 22, 2011, Kay Sievers wrote:
> On Tue, 2011-03-22 at 23:04 +0900, Paul Mundt wrote:
> > On Sat, Mar 19, 2011 at 01:47:27AM +0100, Rafael J. Wysocki wrote:
> > > On Thursday, March 17, 2011, Paul Mundt wrote:
> > > > On Sun, Mar 13, 2011 at 02:03:49PM +0100, R. J. Wysocki wrote:
> > > > > From: Rafael J. Wysocki <[email protected]>
> > > > >
> > > > > Convert the SuperH clocks framework and shared interrupt handling
> > > > > > code to using struct syscore_ops instead of sysdev classes and
> > > > > > sysdevs for power management.
> > > > >
> > > > > This reduces the code size significantly and simplifies it. The
> > > > > optimizations causing things not to be restored after creating a
> > > > > hibernation image are removed, but they might lead to undesirable
> > > > > effects during resume from hibernation (e.g. the clocks would be left
> > > > > as the boot kernel set them, which might be not the same way as the
> > > > > hibernated kernel had seen them before the hibernation).
> > > > >
> > > > > This also is necessary for removing sysdevs from the kernel entirely
> > > > > in the future.
> > > > >
> > > > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > > >
> > > > This misses the use of the sysdev class by the userimask code, though I'm
> > > > open to suggestions for alternatives.
> > >
> > > For now, I'd simply move the sysdev class definition to userimask.c, like
> > > in the patch below. The current goal is to eliminate the suspend/resume and
> > > shutdown operations from sysdevs (and sysdev drivers), the next step will
> > > be to replace the remaining sysdevs with alternative mechanisms.
> > >
> > It's not quite that straightforward, you've also killed off the name
> > attribute for each of the intc sysdevs, so we no longer have a visible
> > way to map a given intc controller number to the controller name in a
> > user visible way.
> >
> > I'm not opposed to the syscore thing for suspend/resume ops, but I'm not
> > willing to trash the userimask and name mapping interface in the process
> > with no alternatives.
> >
> > userimask was the first global configuration item I added, but there are
> > other per-controller and global configuration knobs that I plan to export
> > through the interface, so there really needs to be a compelling reason
> > for moving off of sysdevs.
>
> Yes, they don't fit into the model. They have been a dumb hack from the
> first day, and never integrated into the kernel driver core or hotplug
> properly.
>
> If you need the userspace visibility, better just add a "struct
> bus_type" with a proper name for your subsystem and register a "struct
> device" with the bus_type assigned for all of them, instead of using the
> broken concept of sysdevs. You can even make them show up
> in /sys/devices/system/<bus_type name>/<struct device name>/ if you want
> to.
I don't really think that's going to be useful in this particular case.
The reason is, first, because the struct device would cause lots of other
stuff to show up in sysfs which would be totally redundant and confusing
and, second, because the things exported here are simply static attributes,
pretty much like the stuff in /sys/power/.
Perhaps there's a more straightforward way to make some files show up in
sysfs on a specific path than defining an otherwise useless bus type and
device object?
> That way userspace can properly enumerate them in a flat list
> in /sys/bus/<bus_type name>/devices/*, and gets proper events on module
> load and during system coldplug, and can hook into the usual hotplug
> paths to set/get these values instead of crawling magically defined and
> decoupled locations in /sys which cannot express proper hierarchy,
> classification, or anything else that all other devices can just do.
There's no hotplug involved or anything remotely like that AFAICS.
There are simply static files as I said above, they are created
early during system initialization and simply stay there.
> There is really no reason for any device being a magic and conceptually
> broken sysdev today - just to be different from any other device the
> kernel exports to userspace.
It's not a "device being a sysdev", it's sysdevs being used for creating
a user space interface, which isn't broken by itself.
Thanks,
Rafael
On Tuesday, March 22, 2011, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > > If there are no objections, I'd like to push these patches through the suspend
> > > tree.
> >
> > [1/8] has been merged in the meantime and [3/8] has been included into the
> > ACPI tree. If there are no objections, I'm going to push the following
> > patches to Linus this week through the suspend-2.6 tree:
> >
> > [1/6] - Convert sysdev users in arch/x86 to using struct syscore_ops.
> >
> > [2/6] - Make timekeeping use struct syscore_ops for suspend/resume.
> >
> > [3/6] - Make Intel IOMMU use struct syscore_ops for suspend/resume.
> >
> > [4/6] - Make KVM use struct syscore_ops for suspend/resume.
> >
> > [5/6] - Make cpufreq use struct syscore_ops for boot CPU suspend/resume.
> >
> > [6/6] - Introduce config switch allowing architectures to skip sysdev
> > suspend/resume/shutdown code.
>
> The x86 bits look fine.
>
> Acked-by: Ingo Molnar <[email protected]>
Thanks!
> The patches affect a lot of hardware so please make sure they are tested well
> before pushing them to Linus :-)
I have tested the majority, but unfortunately I have no hardware on which to
test the Intel IOMMU patch.
Thanks,
Rafael
On Tue, 2011-03-22 at 21:30 +0100, Rafael J. Wysocki wrote:
> On Tuesday, March 22, 2011, Kay Sievers wrote:
> > On Tue, 2011-03-22 at 23:04 +0900, Paul Mundt wrote:
> > > On Sat, Mar 19, 2011 at 01:47:27AM +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, March 17, 2011, Paul Mundt wrote:
> > > > > On Sun, Mar 13, 2011 at 02:03:49PM +0100, R. J. Wysocki wrote:
> > > > > > From: Rafael J. Wysocki <[email protected]>
> > > > > >
> > > > > > Convert the SuperH clocks framework and shared interrupt handling
> > > > > > > code to using struct syscore_ops instead of sysdev classes and
> > > > > > > sysdevs for power management.
> > > > > >
> > > > > > This reduces the code size significantly and simplifies it. The
> > > > > > optimizations causing things not to be restored after creating a
> > > > > > hibernation image are removed, but they might lead to undesirable
> > > > > > effects during resume from hibernation (e.g. the clocks would be left
> > > > > > as the boot kernel set them, which might be not the same way as the
> > > > > > hibernated kernel had seen them before the hibernation).
> > > > > >
> > > > > > This also is necessary for removing sysdevs from the kernel entirely
> > > > > > in the future.
> > > > > >
> > > > > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > > > >
> > > > > This misses the use of the sysdev class by the userimask code, though I'm
> > > > > open to suggestions for alternatives.
> > > >
> > > > For now, I'd simply move the sysdev class definition to userimask.c, like
> > > > in the patch below. The current goal is to eliminate the suspend/resume and
> > > > shutdown operations from sysdevs (and sysdev drivers), the next step will
> > > > be to replace the remaining sysdevs with alternative mechanisms.
> > > >
> > > It's not quite that straightforward, you've also killed off the name
> > > attribute for each of the intc sysdevs, so we no longer have a visible
> > > way to map a given intc controller number to the controller name in a
> > > user visible way.
> > >
> > > I'm not opposed to the syscore thing for suspend/resume ops, but I'm not
> > > willing to trash the userimask and name mapping interface in the process
> > > with no alternatives.
> > >
> > > userimask was the first global configuration item I added, but there are
> > > other per-controller and global configuration knobs that I plan to export
> > > through the interface, so there really needs to be a compelling reason
> > > for moving off of sysdevs.
> >
> > Yes, they don't fit into the model. They have been a dumb hack from the
> > first day, and never integrated into the kernel driver core or hotplug
> > properly.
> >
> > If you need the userspace visibility, better just add a "struct
> > bus_type" with a proper name for your subsystem and register a "struct
> > device" with the bus_type assigned for all of them, instead of using the
> > broken concept of sysdevs. You can even make them show up
> > in /sys/devices/system/<bus_type name>/<struct device name>/ if you want
> > to.
>
> I don't really think that's going to be useful in this particular case.
> The reason is, first, because the struct device would cause lots of other
> stuff to show up in sysfs which would be totally redundant and confusing
> and, second, because the things exported here are simply static attributes,
> pretty much like the stuff in /sys/power/.
>
> Perhaps there's a more straightforward way to make some files show up in
> sysfs on a specific path than defining an otherwise useless bus type and
> device object?
That's absolutely not the point. Please don't get yourself into that
thinking. If people want to "export stuff to userspace", they must not
invent new things. We need to get rid of the silly special cases.
Userspace is not meant to learn subsystem specific rules for every new
thing. There is _one_ way to export device attributes, and that is
"struct device" today.
If that's too expensive for anybody, just don't use sysfs. It's the rule
we have today. :)
> > That way userspace can properly enumerate them in a flat list
> > in /sys/bus/<bus_type name>/devices/*, and gets proper events on module
> > load and during system coldplug, and can hook into the usual hotplug
> > paths to set/get these values instead of crawling magically defined and
> > decoupled locations in /sys which cannot express proper hierarchy,
> > classification, or anything else that all other devices can just do.
>
> There's no hotplug involved or anything remotely like that AFAICS.
> There are simply static files as I said above, they are created
> early during system initialization and simply stay there.
That's not the point. It's about a single way to retrieve information
about devices, extendability, and coldplug during bootup, where existing
devices need to be handled only after userspace is up. That is just a
case of "hotplug" that has the same codepath for userspace, even when
the devices can never really come and go.
> > There is really no reason for any device being a magic and conceptually
> > broken sysdev today - just to be different from any other device the
> > kernel exports to userspace.
>
> It's not a "device being a sysdev", it's sysdevs being used for creating
> a user space interface, which isn't broken by itself.
Yeah, absolutely. But if any device wants to export anything, it _must_
be a "struct device" today. If that does not fit, it must not use sysfs
at all.
Kay
On Tuesday, March 22, 2011, Kay Sievers wrote:
> On Tue, 2011-03-22 at 21:30 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > > On Tue, 2011-03-22 at 23:04 +0900, Paul Mundt wrote:
> > > > On Sat, Mar 19, 2011 at 01:47:27AM +0100, Rafael J. Wysocki wrote:
> > > > > On Thursday, March 17, 2011, Paul Mundt wrote:
> > > > > > On Sun, Mar 13, 2011 at 02:03:49PM +0100, R. J. Wysocki wrote:
> > > > > > > From: Rafael J. Wysocki <[email protected]>
> > > > > > >
> > > > > > > Convert the SuperH clocks framework and shared interrupt handling
> > > > > > > > code to using struct syscore_ops instead of sysdev classes and
> > > > > > > > sysdevs for power management.
> > > > > > >
> > > > > > > This reduces the code size significantly and simplifies it. The
> > > > > > > optimizations causing things not to be restored after creating a
> > > > > > > hibernation image are removed, but they might lead to undesirable
> > > > > > > effects during resume from hibernation (e.g. the clocks would be left
> > > > > > > as the boot kernel set them, which might be not the same way as the
> > > > > > > hibernated kernel had seen them before the hibernation).
> > > > > > >
> > > > > > > This also is necessary for removing sysdevs from the kernel entirely
> > > > > > > in the future.
> > > > > > >
> > > > > > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > > > > >
> > > > > > This misses the use of the sysdev class by the userimask code, though I'm
> > > > > > open to suggestions for alternatives.
> > > > >
> > > > > For now, I'd simply move the sysdev class definition to userimask.c, like
> > > > > in the patch below. The current goal is to eliminate the suspend/resume and
> > > > > shutdown operations from sysdevs (and sysdev drivers), the next step will
> > > > > be to replace the remaining sysdevs with alternative mechanisms.
> > > > >
> > > > It's not quite that straightforward, you've also killed off the name
> > > > attribute for each of the intc sysdevs, so we no longer have a visible
> > > > way to map a given intc controller number to the controller name in a
> > > > user visible way.
> > > >
> > > > I'm not opposed to the syscore thing for suspend/resume ops, but I'm not
> > > > willing to trash the userimask and name mapping interface in the process
> > > > with no alternatives.
> > > >
> > > > userimask was the first global configuration item I added, but there are
> > > > other per-controller and global configuration knobs that I plan to export
> > > > through the interface, so there really needs to be a compelling reason
> > > > for moving off of sysdevs.
> > >
> > > Yes, they don't fit into the model. They have been a dumb hack from the
> > > > first day, and never integrated into the kernel driver core or hotplug
> > > properly.
> > >
> > > If you need the userspace visibility, better just add a "struct
> > > bus_type" with a proper name for your subsystem and register a "struct
> > > device" with the bus_type assigned for all of them, instead of using the
> > > > broken concept of sysdevs. You can even make them show up
> > > in /sys/devices/system/<bus_type name>/<struct device name>/ if you want
> > > to.
> >
> > I don't really think that's going to be useful in this particular case.
> > The reason is, first, because the struct device would cause lots of other
> > stuff to show up in sysfs which would be totally redundant and confusing
> > and, second, because the things exported here are simply static attributes,
> > pretty much like the stuff in /sys/power/.
> >
> > Perhaps there's a more straightforward way to make some files show up in
> > > sysfs on a specific path than defining an otherwise useless bus type and
> > device object?
>
> That's absolutely not the point. Please don't get yourself into that
> thinking. If people want to "export stuff to userspace", they must not
> invent new things. We need to get rid of the silly special cases.
Why exactly? Do they actually hurt anyone and if so then how?
> Userspace is not meant to learn subsystem specific rules for every new
> thing.
That depends a good deal on who's writing the user space in question. If
that's the same person who's working on the particular part of the kernel,
I don't see a big problem.
> There is _one_ way to export device attributes, and that is
> "struct device" today.
>
> If that's too expensive for anybody, just don't use sysfs. It's the rule
> we have today. :)
Oh, good to know. It's changed a bit since I last heard. Never mind.
Still, I won't let you change the things in /sys/power to struct devices,
sorry about that. ;-)
And I wonder how you are going to deal with clocksource exporting things
via the sysdev interface right now. I'd simply create two directories and
put the two files into them and be done with that, but I guess that
wouldn't fit into the model somehow, right?
> > > That way userspace can properly enumerate them in a flat list
> > > in /sys/bus/<bus_type name>/devices/*, and gets proper events on module
> > > load and during system coldplug, and can hook into the usual hotplug
> > > paths to set/get these values instead of crawling magically defined and
> > > decoupled locations in /sys which can not express proper hierarchy,
> > > classification, or anything else that all other devices can just do.
> >
> > There's no hotplug involved or anything remotely like that AFAICS.
> > There are simply static files as I said above, they are created
> > early during system initialization and simply stay there.
>
> That's not the point. It's about a single way to retrieve information
> about devices, extendability, and coldplug during bootup, where existing
> devices need to be handled only after userspace is up.
I'd say the case at hand has nothing to do with that.
> That is just a case of "hotplug" that has the same codepath for userspace,
> even when the devices can never really come and go.
My impression is that when you say "user space", you actually mean some
_specific_ user space, don't you?
> > > There is really no reason for any device being a magic and conceptually
> > > broken sysdev today - just to be different from any other device the
> > > kernel exports to userspace.
> >
> > It's not a "device being a sysdev", it's sysdevs being used for creating
> > a user space interface, which isn't broken by itself.
>
> Yeah, absolutely. But if any device wants to export anything it _must_
> be a "struct device" today. If that does not fit, it must not use sysfs
> at all.
Well, it's not a "device wants to export something". It's platform code
wanting to export some information related to the platform and I really
don't see a reason why it should create bus types and device objects
_specifically_ for that. It's just too wasteful, both in terms of memory
and time needed for handling that in the device core.
Thanks,
Rafael
On Tue, 2011-03-22 at 22:00 +0100, Rafael J. Wysocki wrote:
> On Tuesday, March 22, 2011, Kay Sievers wrote:
> > On Tue, 2011-03-22 at 21:30 +0100, Rafael J. Wysocki wrote:
> > >
> > > Perhaps there's a more straightforward way to make some files show up in
> > > > sysfs on a specific path than defining an otherwise useless bus type and
> > > device object?
> >
> > That's absolutely not the point. Please don't get yourself into that
> > thinking. If people want to "export stuff to userspace", they must not
> > invent new things. We need to get rid of the silly special cases.
>
> Why exactly? Do they actually hurt anyone and if so then how?
Sure, "devices" are devices, and devices have a well-defined set of
properties, not some magic directory people can mess around with the
way they like.
> > Userspace is not meant to learn subsystem specific rules for every new
> > thing.
>
> That depends a good deal on who's writing the user space in question. If
> that's the same person who's working on the particular part of the kernel,
> I don't see a big problem.
Not for "devices". There are rules for devices, which are defined by the
driver core, and the sysdev stuff needs to go, because it does not fit
into that model.
> > There is _one_ way to export device attributes, and that is
> > "struct device" today.
> >
> > If that's too expensive for anybody, just don't use sysfs. It's the rule
> > we have today. :)
>
> Oh, good to know. It's changed a bit since I last heard. Never mind.
Oh, don't get me wrong, this is all about "devices", not any other
controls.
> Still, I won't let you change the things in /sys/power to struct devices,
> sorry about that. ;-)
Fine as long as they are power specific things, and not "devices". You
don't have sysdevs there, right? :)
> And I wonder how you are going to deal with clocksource exporting things
> via the sysdev interface right now. I'd simply create two directories and
> put the two files into them and be done with that, but I guess that
> wouldn't fit into the model somehow, right?
Nope, register a bus_type, and use struct device for all of them. Parent
them to /sys/devices/system/ if they should keep their location and
layout.
> > > > That way userspace can properly enumerate them in a flat list
> > > > in /sys/bus/<bus_type name>/devices/*, and gets proper events on module
> > > > load and during system coldplug, and can hook into the usual hotplug
> > > > paths to set/get these values instead of crawling magically defined and
> > > > decoupled locations in /sys which can not express proper hierarchy,
> > > > classification, or anything else that all other devices can just do.
> > >
> > > There's no hotplug involved or anything remotely like that AFAICS.
> > > There are simply static files as I said above, they are created
> > > early during system initialization and simply stay there.
> >
> > That's not the point. It's about a single way to retrieve information
> > about devices, extendability, and coldplug during bootup, where existing
> > devices need to be handled only after userspace is up.
>
> I'd say the case at hand has nothing to do with that.
It has. As for CPUs, we can not do proper CPU-dependent module
autoloading, because the events happen before userspace runs, and
coldplug can not see the broken sysdevs, because they have no events to
re-trigger, like all others have.
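For reference, re-triggering during coldplug amounts to writing an action
string such as "add" to a device's "uevent" file in sysfs, which makes the
kernel emit the event again for that device. Below is a minimal user-space
sketch of that step. The helper name retrigger_uevent() is hypothetical (it
is not part of udev or any library), and the caller is assumed to pass a
valid device directory.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to re-send an event for one device
 * by writing an action name (e.g. "add") to its "uevent" attribute.
 * devpath is the device directory supplied by the caller.
 * Returns 0 on success, -1 on failure.
 */
int retrigger_uevent(const char *devpath, const char *action)
{
	char path[4096];
	size_t len;
	ssize_t written;
	int fd, n;

	assert(devpath != NULL && action != NULL);

	n = snprintf(path, sizeof(path), "%s/uevent", devpath);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* No O_CREAT: something without a uevent file cannot be re-triggered. */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	len = strlen(action);
	written = write(fd, action, len);
	close(fd);

	return (written == (ssize_t)len) ? 0 : -1;
}
```

The open() without O_CREAT is the relevant detail: a directory that has no
"uevent" attribute gives a coldplug pass nothing to write to, so the call
simply fails.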
> > That is just a case of "hotplug" that has the same codepath for userspace,
> > even when the devices can never really come and go.
>
> My impression is that when you say "user space", you actually mean some
> _specific_ user space, don't you?
On usual boxes it's udev/libudev and all the stuff around it. But
Android has the same stuff in its own way of doing it. So it's not
about an implementation in userspace, it's about a sane event and
classification interface for kernel-exported devices. Again this is not
about any other stuff in /sys, only the "devices", and we want to have
only a single type, and a single way to handle it in userspace.
> > > > There is really no reason for any device being a magic and conceptually
> > > > broken sysdev today - just to be different from any other device the
> > > > kernel exports to userspace.
> > >
> > > It's not a "device being a sysdev", it's sysdevs being used for creating
> > > a user space interface, which isn't broken by itself.
> >
> > Yeah, absolutely. But if any device wants to export anything it _must_
> > be a "struct device" today. If that does not fit, it must not use sysfs
> > at all.
>
> Well, it's not a "device wants to export something". It's platform code
> wanting to export some information related to the platform and I really
> don't see a reason why it should create bus types and device objects
> _specifically_ for that. It's just too wasteful, both in terms of memory
> and time needed for handling that in the device core.
Because they are devices, and there is a lot to win, if the kernel
exports all "devices" in the same way. This is not about saving an inode
in /sys, it's the ability to do runtime device configuration with common
tools.
Kay
On Tue, Mar 22, 2011 at 10:12:36PM +0100, Kay Sievers wrote:
> On Tue, 2011-03-22 at 22:00 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > > On Tue, 2011-03-22 at 21:30 +0100, Rafael J. Wysocki wrote:
> > > >
> > > > Perhaps there's a more straightforward way to make some files show up in
> > > > > sysfs on a specific path than defining an otherwise useless bus type and
> > > > device object?
> > >
> > > That's absolutely not the point. Please don't get yourself into that
> > > thinking. If people want to "export stuff to userspace", they must not
> > > invent new things. We need to get rid of the silly special cases.
> >
> > Why exactly? Do they actually hurt anyone and if so then how?
>
> Sure, "devices" are devices, and devices have a well-defined set of
> properties, not some magic directory people can mess around with the
> way they like.
>
This is starting to get silly. In the case of interrupt controllers as
with clocksources the bus abstraction is completely meaningless. I could
register a type of "system" bus, but how would that be any different from
what we have from sysdevs already today? All it does is force people to
duplicate crap all over the place instead and pretend like extra busses
exist in order to fit some arbitrary definition of how the driver model
should work.
You talk about inventing special interfaces to bypass the device model,
but that's not the case here. Rolling my own interface with kobjects and
attribute groups as with /sys/power or making an arbitrary bus type for a
single class of system devices seems infinitely more hackish than the
current sysdev model.
The comment at the top of sys.c says:
* sys.c - pseudo-bus for system 'devices' (cpus, PICs, timers, etc)
Which is precisely where I would expect interrupt controllers and timers
and CPUs to go. I'm not going to make an IRQ bus or a timer bus and
arbitrarily map some things there and some things somewhere else in the
name of some abstraction insanity. These interrupt controllers all have
consistent attributes that make the sysdev class model work well, but
there are also many other types of interrupt controllers on the same CPUs
that use a different abstraction.
Beyond that, we also have sysdev class utilization for DMA controllers,
per-CPU store queues, etc, etc. all of which would need to be converted
to something else (see for example arch/sh/kernel/cpu/sh4/sq.c -- which
in turn was modelled after the cpufreq code, which also would need to
change to something else). It's not entirely obvious how to convert these
things, or why one should even bother. I can live with the struct device
overhead even if I find it to be a meaningless abstraction in this case,
but what sort of bus/class model to shoehorn these things into is
rather beyond me.
Indeed it seems to me that these are precisely the sorts of things that
sysdevs are intended for, which is why I elected to use them from the
outset. Simply saying "don't use sysdevs" and "pretend like you have some
sort of a magical pseudo-bus that's not a system bus" doesn't quite do it
for me.
On Wed, 2011-03-23 at 06:49 +0900, Paul Mundt wrote:
> On Tue, Mar 22, 2011 at 10:12:36PM +0100, Kay Sievers wrote:
> > On Tue, 2011-03-22 at 22:00 +0100, Rafael J. Wysocki wrote:
> > > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > > > On Tue, 2011-03-22 at 21:30 +0100, Rafael J. Wysocki wrote:
> > > > >
> > > > > Perhaps there's a more straightforward way to make some files show up in
> > > > > > sysfs on a specific path than defining an otherwise useless bus type and
> > > > > device object?
> > > >
> > > > That's absolutely not the point. Please don't get yourself into that
> > > > thinking. If people want to "export stuff to userspace", they must not
> > > > invent new things. We need to get rid of the silly special cases.
> > >
> > > Why exactly? Do they actually hurt anyone and if so then how?
> >
> > Sure, "devices" are devices, and devices have a well-defined set of
> > properties, not some magic directory people can mess around with the
> > way they like.
> >
> This is starting to get silly. In the case of interrupt controllers as
> with clocksources the bus abstraction is completely meaningless. I could
> register a type of "system" bus, but how would that be any different from
> what we have from sysdevs already today? All it does is force people to
> duplicate crap all over the place instead and pretend like extra busses
> exist in order to fit some arbitrary definition of how the driver model
> should work.
>
> You talk about inventing special interfaces to bypass the device model,
> but that's not the case here. Rolling my own interface with kobjects and
> attribute groups as with /sys/power or making an arbitrary bus type for a
> single class of system devices seems infinitely more hackish than the
> current sysdev model.
>
> The comment at the top of sys.c says:
>
> * sys.c - pseudo-bus for system 'devices' (cpus, PICs, timers, etc)
Which is what we need to get rid of. It does not make any sense in the
global picture to have anything like that exported to userspace.
> Which is precisely where I would expect interrupt controllers and timers
> and CPUs to go. I'm not going to make an IRQ bus or a timer bus and
> arbitrarily map some things there and some things somewhere else in the
> name of some abstraction insanity. These interrupt controllers all have
> consistent attributes that make the sysdev class model work well, but
> there are also many other types of interrupt controllers on the same CPUs
> that use a different abstraction.
>
> Beyond that, we also have sysdev class utilization for DMA controllers,
> per-CPU store queues, etc, etc. all of which would need to be converted
> to something else (see for example arch/sh/kernel/cpu/sh4/sq.c -- which
> in turn was modelled after the cpufreq code, which also would need to
> change to something else). It's not entirely obvious how to convert these
> things, or why one should even bother. I can live with the struct device
> overhead even if I find it to be a meaningless abstraction in this case,
> but what sort of bus/class model to shoehorn these things into is
> rather beyond me.
>
> Indeed it seems to me that these are precisely the sorts of things that
> sysdevs are intended for, which is why I elected to use them from the
> outset. Simply saying "don't use sysdevs" and "pretend like you have some
> sort of a magical pseudo-bus that's not a system bus" doesn't quite do it
> for me.
Nope, a device has a "name", a "subsystem" and a "devpath", and has
well-defined, core-maintained properties in the device directory. It is
not some random custom directory which people can put where they like
it. Userspace has expectations about devices which need to be met, and
that can only happen if these devices are "struct device".
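As a rough illustration of what userspace relies on, the sketch below derives
the name and the subsystem of a device from its devpath, assuming the usual
sysfs layout in which every device directory carries a "subsystem" symlink
whose last path element is the subsystem name. The helper names device_name()
and device_subsystem() are made up for this example.

```c
#define _XOPEN_SOURCE 700	/* for readlink() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* The device name is simply the last element of the devpath. */
const char *device_name(const char *devpath)
{
	const char *base;

	assert(devpath != NULL);
	base = strrchr(devpath, '/');
	return base ? base + 1 : devpath;
}

/*
 * Copy the subsystem name of the device at devpath into buf.
 * Only the last element of the "subsystem" link target is used.
 * Returns 0 on success, -1 if there is no such link or buf is too small.
 */
int device_subsystem(const char *devpath, char *buf, size_t buflen)
{
	char link[4096], target[4096];
	const char *base;
	ssize_t n;
	int len;

	assert(devpath != NULL && buf != NULL);

	len = snprintf(link, sizeof(link), "%s/subsystem", devpath);
	if (len < 0 || (size_t)len >= sizeof(link))
		return -1;

	/* readlink() does not NUL-terminate, so leave room and do it here. */
	n = readlink(link, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';

	base = strrchr(target, '/');
	base = base ? base + 1 : target;
	if (strlen(base) + 1 > buflen)
		return -1;

	strcpy(buf, base);
	return 0;
}
```

A directory without a "subsystem" link makes device_subsystem() fail, so a
generic tool has no uniform way to classify it.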
All real devices sort into a hierarchy, possibly in different parent
locations, and have a single point of classification which is the
devices/ directory and contains symlinks. Only that way can we cope with
it in userspace.
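To make the enumeration concrete, here is a small user-space sketch that walks
the flat devices/ directory of one subsystem and resolves every symlink to its
real device directory. It assumes the /sys/bus/<bus_type name>/devices/ layout
mentioned earlier in the thread. The function names, the callback type and the
"intc" bus name in the usage note are all hypothetical.

```c
#define _XOPEN_SOURCE 700	/* for realpath() */
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*bus_dev_cb)(const char *name, const char *devpath, void *ctx);

/* Example callback: print "name -> resolved device directory". */
void print_device(const char *name, const char *devpath, void *ctx)
{
	(void)ctx;
	printf("%s -> %s\n", name, devpath);
}

/*
 * Visit every entry in <sysfs_root>/bus/<bus>/devices. Each entry is
 * expected to be a symlink named after the device and pointing at the
 * device directory. Returns the number of devices visited, or -1 if
 * the directory can not be opened.
 */
int for_each_bus_device(const char *sysfs_root, const char *bus,
			bus_dev_cb cb, void *ctx)
{
	char dirpath[PATH_MAX], devpath[PATH_MAX];
	char link[PATH_MAX + NAME_MAX + 2];
	struct dirent *de;
	DIR *dir;
	int count = 0, n;

	assert(sysfs_root != NULL && bus != NULL);

	n = snprintf(dirpath, sizeof(dirpath), "%s/bus/%s/devices",
		     sysfs_root, bus);
	if (n < 0 || (size_t)n >= sizeof(dirpath))
		return -1;

	dir = opendir(dirpath);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		if (de->d_name[0] == '.')	/* skips "." and ".." as well */
			continue;
		snprintf(link, sizeof(link), "%s/%s", dirpath, de->d_name);
		if (!realpath(link, devpath))	/* dangling link: ignore it */
			continue;
		if (cb)
			cb(de->d_name, devpath, ctx);
		count++;
	}
	closedir(dir);
	return count;
}
```

A call such as for_each_bus_device("/sys", "intc", print_device, NULL) would
list every device of that subsystem without any subsystem-specific knowledge
of where its directories live.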
People should really stop messing around in /sys for optimization
purposes. We have a common device model, and need to use it. Sysdevs do
not fit into that model.
I can't tell how that fits into your use case, but please use something
other than sysfs if you need device information exported, but you can't
use "struct device".
Thanks,
Kay
On Tuesday, March 22, 2011, Kay Sievers wrote:
> On Tue, 2011-03-22 at 22:00 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > > On Tue, 2011-03-22 at 21:30 +0100, Rafael J. Wysocki wrote:
> > > >
> > > > Perhaps there's a more straightforward way to make some files show up in
> > > > > sysfs on a specific path than defining an otherwise useless bus type and
> > > > device object?
> > >
> > > That's absolutely not the point. Please don't get yourself into that
> > > thinking. If people want to "export stuff to userspace", they must not
> > > invent new things. We need to get rid of the silly special cases.
> >
> > Why exactly? Do they actually hurt anyone and if so then how?
>
> Sure, "devices" are devices, and devices have a well-defined set of
> properties, not some magic directory people can mess around with the
> way they like.
So it looks like the problem is that the exported attributes happen to
be under /sys/devices/. Would it still be a problem if they were somewhere
else?
> > > Userspace is not meant to learn subsystem specific rules for every new
> > > thing.
> >
> > That depends a good deal on who's writing the user space in question. If
> > that's the same person who's working on the particular part of the kernel,
> > I don't see a big problem.
>
> Not for "devices". There are rules for devices, which are defined by the
> driver core, and the sysdev stuff needs to go, because it does not fit
> into that model.
OK, I understand that.
Now, there's stuff that doesn't really match the "device" model. Where is
the right place to export that? Perhaps we should add something like
/sys/platform/ (in analogy with /sys/firmware/)?
> > > There is _one_ way to export device attributes, and that is
> > > "struct device" today.
> > >
> > > If that's too expensive for anybody, just don't use sysfs. It's the rule
> > > we have today. :)
> >
> > Oh, good to know. It's changed a bit since I last heard. Never mind.
>
> Oh, don't get me wrong, this is all about "devices", not any other
> controls.
>
> > Still, I won't let you change the things in /sys/power to struct devices,
> > sorry about that. ;-)
>
> Fine as long as they are power specific things, and not "devices". You
> don't have sysdevs there, right? :)
No, I don't.
> > And I wonder how you are going to deal with clocksource exporting things
> > via the sysdev interface right now. I'd simply create two directories and
> > put the two files into them and be done with that, but I guess that
> > wouldn't fit into the model somehow, right?
>
> Nope, register a bus_type, and use struct device for all of them. Parent
> them to /sys/devices/system/ if they should keep their location and
> layout.
Well, I'll be watching what happens to the patch trying to do that, but I'm
not going to bet anything on its success. ;-)
> > > > > That way userspace can properly enumerate them in a flat list
> > > > > in /sys/bus/<bus_type name>/devices/*, and gets proper events on module
> > > > > load and during system coldplug, and can hook into the usual hotplug
> > > > > paths to set/get these values instead of crawling magically defined and
> > > > > decoupled locations in /sys which can not express proper hierarchy,
> > > > > classification, or anything else that all other devices can just do.
> > > >
> > > > There's no hotplug involved or anything remotely like that AFAICS.
> > > > There are simply static files as I said above, they are created
> > > > early during system initialization and simply stay there.
> > >
> > > That's not the point. It's about a single way to retrieve information
> > > about devices, extendability, and coldplug during bootup, where existing
> > > devices need to be handled only after userspace is up.
> >
> > I'd say the case at hand has nothing to do with that.
>
> It has. As for CPUs, we can not do proper CPU-dependent module
> autoloading, because the events happen before userspace runs, and
> coldplug can not see the broken sysdevs, because they have no events to
> re-trigger, like all others have.
Well, as I said, would it be OK if the things in question happened to be
located somewhere outside of /sys/devices/ ?
> > > That is just a case of "hotplug" that has the same codepath for userspace,
> > > even when the devices can never really come and go.
> >
> > My impression is that when you say "user space", you actually mean some
> > _specific_ user space, don't you?
>
> On usual boxes it's udev/libudev and all the stuff around it. But
> Android has the same stuff in its own way of doing it. So it's not
> about an implementation in userspace, it's about a sane event and
> classification interface for kernel-exported devices. Again this is not
> about any other stuff in /sys, only the "devices", and we want to have
> only a single type, and a single way to handle it in userspace.
OK
> > > > > There is really no reason for any device being a magic and conceptually
> > > > > broken sysdev today - just to be different from any other device the
> > > > > kernel exports to userspace.
> > > >
> > > > It's not a "device being a sysdev", it's sysdevs being used for creating
> > > > a user space interface, which isn't broken by itself.
> > >
> > > Yeah, absolutely. But if any device wants to export anything it _must_
> > > be a "struct device" today. If that does not fit, it must not use sysfs
> > > at all.
> >
> > Well, it's not a "device wants to export something". It's platform code
> > wanting to export some information related to the platform and I really
> > don't see a reason why it should create bus types and device objects
> > _specifically_ for that. It's just too wasteful, both in terms of memory
> > and time needed for handling that in the device core.
>
> Because they are devices, and there is a lot to win, if the kernel
> exports all "devices" in the same way. This is not about saving an inode
> in /sys, it's the ability to do runtime device configuration with common
> tools.
If I understand you correctly, the goal is to have everything under
/sys/devices/ be represented by struct device objects and there are good
reasons to do that.
In that case we either have to move the things exported via sysdevs somewhere
else (presumably having to create that "somewhere" before), or we have to
introduce struct device objects specifically for exporting them. I don't
really think the latter approach will be very popular, so quite likely we'll
need to have a plan for moving those things to different locations.
Thanks,
Rafael
On Tuesday, March 22, 2011, Joerg Roedel wrote:
> On Mon, Mar 21, 2011 at 07:36:17PM -0400, Rafael J. Wysocki wrote:
> > drivers/pci/intel-iommu.c | 38 +++++++++-----------------------------
> > 1 file changed, 9 insertions(+), 29 deletions(-)
>
> Looks good.
May I take that as an ACK?
> I prepare a patch to convert AMD IOMMU to syscore_ops too.
Already done. :-)
It's a part of patch [1/6].
Thanks,
Rafael
On Tue, 2011-03-22 at 23:05 +0100, Rafael J. Wysocki wrote:
> On Tuesday, March 22, 2011, Kay Sievers wrote:
> > On Tue, 2011-03-22 at 22:00 +0100, Rafael J. Wysocki wrote:
> > > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > > > On Tue, 2011-03-22 at 21:30 +0100, Rafael J. Wysocki wrote:
> > > > >
> > > > > Perhaps there's a more straightforward way to make some files show up in
> > > > > > sysfs on a specific path than defining an otherwise useless bus type and
> > > > > device object?
> > > >
> > > > That's absolutely not the point. Please don't get yourself into that
> > > > thinking. If people want to "export stuff to userspace", they must not
> > > > invent new things. We need to get rid of the silly special cases.
> > >
> > > Why exactly? Do they actually hurt anyone and if so then how?
> >
> > Sure, "devices" are devices, and devices have a well-defined set of
> > properties, not some magic directory people can mess around with the
> > way they like.
>
> So it looks like the problem is that the exported attributes happen to
> be under /sys/devices/. Would it still be a problem if they were somewhere
> else?
We are not going to invent another location for any devices. They need
to stay in /devices if they are devices. And all devices need to be
"struct device".
> > > > Userspace is not meant to learn subsystem specific rules for every new
> > > > thing.
> > >
> > > That depends a good deal on who's writing the user space in question. If
> > > that's the same person who's working on the particular part of the kernel,
> > > I don't see a big problem.
> >
> > Not for "devices". There are rules for devices, which are defined by the
> > driver core, and the sysdev stuff needs to go, because it does not fit
> > into that model.
>
> OK, I understand that.
>
> Now, there's stuff that doesn't really match the "device" model. Where is
> the right place to export that? Perhaps we should add something like
> /sys/platform/ (in analogy with /sys/firmware/)?
No, add a subsystem (bus_type) for any of them, and register them. There
is no such thing as "devices which do not fit the device model", they
are all fine there. Please stop optimizing single bytes and creating a
mess in /sys. Every device is a "struct device".
Think of "struct bus_type" as "struct subsystem", we will rename that
when we are ready. It is just a group of devices which are of the same
type, it has nothing to do with a bus in the sense of hardware.
We need unified exports of _all_ devices to userspace, not custom layouts
in /sys.
There is a pretty much outdated Documentation/sysfs-rules.txt, which
covers part of the history and the plans.
> > > > There is _one_ way to export device attributes, and that is
> > > > "struct device" today.
> > > >
> > > > If that's too expensive for anybody, just don't use sysfs. It's the rule
> > > > we have today. :)
> > >
> > > Oh, good to know. It's changed a bit since I last heard. Never mind.
> >
> > Oh, don't get me wrong, this is all about "devices", not any other
> > controls.
> >
> > > Still, I won't let you change the things in /sys/power to struct devices,
> > > sorry about that. ;-)
> >
> > Fine as long as they are power specific things, and not "devices". You
> > don't have sysdevs there, right? :)
>
> No, I don't.
Then all is fine. All other stuff is more like /proc, and can never be
really unified. All we care about is devices, which have common methods
for userspace to trigger and consume, and need to be unified. The
power-specific control files seem fine in their kobject use.
> > > And I wonder how you are going to deal with clocksource exporting things
> > > via the sysdev interface right now. I'd simply create two directories and
> > > put the two files into them and be done with that, but I guess that
> > > wouldn't fit into the model somehow, right?
> >
> > Nope, register a bus_type, and use struct device for all of them. Parent
> > them to /sys/devices/system/ if they should keep their location and
> > layout.
>
> Well, I'll be watching what happens to the patch trying to do that, but I'm
> not going to bet anything on its success. ;-)
It should be pretty straightforward. We will need to do that for CPUs I
guess, because the interface is kinda commonly used.
> > > > > > That way userspace can properly enumerate them in a flat list
> > > > > > in /sys/bus/<bus_type name>/devices/*, and gets proper events on module
> > > > > > load and during system coldplug, and can hook into the usual hotplug
> > > > > > paths to set/get these values instead of crawling magically defined and
> > > > > > decoupled locations in /sys which can not express proper hierarchy,
> > > > > > classification, or anything else that all other devices can just do.
> > > > >
> > > > > There's no hotplug involved or anything remotely like that AFAICS.
> > > > > There are simply static files as I said above, they are created
> > > > > early during system initialization and simply stay there.
> > > >
> > > > That's not the point. It's about a single way to retrieve information
> > > > about devices, extendability, and coldplug during bootup, where existing
> > > > devices need to be handled only after userspace is up.
> > >
> > > I'd say the case at hand has nothing to do with that.
> >
> > It has. As for CPUs, we can not do proper CPU-dependent module
> > autoloading, because the events happen before userspace runs, and
> > coldplug can not see the broken sysdevs, because they have no events to
> > re-trigger, like all others have.
>
> Well, as I said, would it be OK if the things in question happened to be
> located somewhere outside of /sys/devices/ ?
No, no device directory can be outside of /sys/devices.
> > > > That is just a case of "hotplug" that has the same codepath for userspace,
> > > > even when the devices can never really come and go.
> > >
> > > My impression is that when you say "user space", you actually mean some
> > > _specific_ user space, don't you?
> >
> > On usual boxes it's udev/libudev and all the stuff around it. But
> > Android has the same stuff in its own way of doing it. So it's not
> > about an implementation in userspace, it's about a sane event and
> > classification interface for kernel-exported devices. Again this is not
> > about any other stuff in /sys, only the "devices", and we want to have
> > only a single type, and a single way to handle it in userspace.
>
> OK
>
> > > > > > There is really no reason for any device being a magic and conceptually
> > > > > > broken sysdev today - just to be different from any other device the
> > > > > > kernel exports to userspace.
> > > > >
> > > > > It's not a "device being a sysdev", it's sysdevs being used for creating
> > > > > a user space interface, which isn't broken by itself.
> > > >
> > > > Yeah, absolutely. But if any device wants to export anything it _must_
> > > > be a "struct device" today. If that does not fit, it must not use sysfs
> > > > at all.
> > >
> > > Well, it's not a "device wants to export something". It's platform code
> > > wanting to export some information related to the platform and I really
> > > don't see a reason why it should create bus types and device objects
> > > _specifically_ for that. It's just too wasteful, both in terms of memory
> > > and time needed for handling that in the device core.
> >
> > Because they are devices, and there is a lot to win, if the kernel
> > exports all "devices" in the same way. This is not about saving an inode
> > in /sys, it's the ability to do runtime device configuration with common
> > tools.
>
> If I understand you correctly, the goal is to have everything under
> /sys/devices/ be represented by struct device objects and there are good
> reasons to do that.
>
> In that case we either have to move the things exported via sysdevs somewhere
> else (presumably having to create that "somewhere" before), or we have to
> introduce struct device objects specifically for exporting them. I don't
> really think the latter approach will be very popular, so quite likely we'll
> need to have a plan for moving those things to different locations.
We can just keep the same location for anything that is expected to be
in /sys/devices/system. The only externally visible difference is that these
devices (then struct device) also have a "subsystem" (bus_type) which
carries the flat list of devices, the devices have a "uevent" file and
with that they are coldpluggable, event re-triggerable, could have
modalias, ... and all the other 1000 things that just work today, and
what we need for many of them. They are then just handled like any other
exported device too.
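
As a rough illustration of what "handled like any other exported device"
means for userspace, here is a minimal Python sketch (not taken from udev).
It assumes a device directory is recognized by its "subsystem" symlink and
its "uevent" file, as described above; describe_device() and its sysfs_root
parameter are hypothetical names used only for this example.

```python
import os

def describe_device(devdir, sysfs_root="/sys"):
    """Collect the core properties userspace expects from a device directory."""
    real = os.path.realpath(devdir)
    root = os.path.realpath(sysfs_root)
    subsystem_link = os.path.join(real, "subsystem")
    subsystem = None
    if os.path.islink(subsystem_link):
        # The symlink target's last component is the subsystem (bus_type) name.
        subsystem = os.path.basename(os.path.realpath(subsystem_link))
    return {
        "name": os.path.basename(real),
        # devpath is the device directory relative to the sysfs mount point.
        "devpath": "/" + os.path.relpath(real, root),
        "subsystem": subsystem,
        "has_uevent": os.path.isfile(os.path.join(real, "uevent")),
    }
```

Going by the description in this thread, a sysdev directory would come back
with no subsystem and no uevent file, while a struct device parented to the
same location would have both.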
Kay
On Tuesday, March 22, 2011, Kay Sievers wrote:
> On Wed, 2011-03-23 at 06:49 +0900, Paul Mundt wrote:
...
> > You talk about inventing special interfaces to bypass the device model,
> > but that's not the case here. Rolling my own interface with kobjects and
> > attribute groups as with /sys/power or making an arbitrary bus type for a
> > single class of system devices seems infinitely more hackish than the
> > current sysdev model.
> >
> > The comment at the top of sys.c says:
> >
> > * sys.c - pseudo-bus for system 'devices' (cpus, PICs, timers, etc)
> >
> > Which is precisely where I would expect interrupt controllers and timers
> > and CPUs to go. I'm not going to make an IRQ bus or a timer bus and
> > arbitrarily map some things there and some things somewhere else in the
> > name of some abstraction insanity. These interrupt controllers all have
> > consistent attributes that make the sysdev class model work well, but
> > there are also many other types of interrupt controllers on the same CPUs
> > that use a different abstraction.
> >
> > Beyond that, we also have sysdev class utilization for DMA controllers,
> > per-CPU store queues, etc, etc. all of which would need to be converted
> > to something else (see for example arch/sh/kernel/cpu/sh4/sq.c -- which
> > in turn was modelled after the cpufreq code, which also would need to
> > change to something else). It's not entirely obvious how to convert these
> > things, or why one should even bother.
The reason why is, if I understand it correctly, because user space tools
generally expect /sys/devices/ to be consistent in terms of the representation
of things and /sys/devices/system/ currently violates that expectation which
leads to all sorts of problems with device discovery, hotplug etc.
Now, whether or not to convert all the things currently exported through
sysdevs to struct device objects is not too obvious to me. I think it simply
may be better to move them into a different directory in sysfs (presumably
creating one for this purpose).
> > I can live with the struct device overhead even if I find it to be a
> > meaningless abstraction in this case, but what sort of bus/class model to
> > shoe horn these things in to is rather beyond me.
> >
> > Indeed it seems to me that these are precisely the sorts of things that
> > sysdevs are intended for, which is why I elected to use them from the
> > onset. Simply saying "don't use sysdevs" and "pretend like you have some
> > sort of a magical pseudo-bus that's not a system bus" doesn't quite do it
> > for me.
>
> Nope, a device has a "name", a "subsystem" and a "devpath", has
> well-defined core-maintained properties at the device directory. It is
> not some random custom directory which people can put where they like
> it. Userspace has expectations about devices which need to be met, and
> that can only happen if these devices are "struct device".
>
> All real devices sort into a hierarchy, possibly in different parent
> locations, and have a single point of classification which is the
> devices/ directory and contains symlinks. Only that way we can cope with
> it in userspace.
>
> People should really stop messing around in /sys for optimization
> purposes. We have a common device model, and need to use it. Sysdevs do
> not fit into that model.
>
> I can't tell how that fits into your use case, but please use something
> else than sysfs, if you need device information exported, but you can't
> use "struct device".
I really think you shouldn't say "sysfs" when you in fact mean
"/sys/devices/". :-)
Now, I can easily understand arguments about representing everything under
/sys/devices/ by struct device objects, no question about that. However,
I also think there should be a place for things like those mentioned in the
comment in sys.c, presumably outside of /sys/devices/.
Thanks,
Rafael
On Tue, Mar 22, 2011 at 11:00:56PM +0100, Kay Sievers wrote:
> On Wed, 2011-03-23 at 06:49 +0900, Paul Mundt wrote:
> > The comment at the top of sys.c says:
> >
> > * sys.c - pseudo-bus for system 'devices' (cpus, PICs, timers, etc)
>
> Which is what we need to get rid of. It does not make any sense on the
> global picture to have anything like that exported to userspace.
>
So far I haven't heard any rationale for why it doesn't. Exporting CPU
state to userspace certainly makes sense, and the sysdev model has worked
reasonably for CPUs, memory nodes, etc.
> People should really stop messing around in /sys for optimization
> purposes. We have a common device model, and need to use it. Sysdevs do
> not fit into that model.
>
> I can't tell how that fits into your use case, but please use something
> else than sysfs, if you need device information exported, but you can't
> use "struct device".
>
As long as CPU state is present in sysfs people will be tied to it for
per-CPU kobjects and the like, and until something concrete is proposed
for what to do about these cases there's not much chance of sysdevs going
away.
Once cpufreq, timekeeping, and NUMA node state have migrated to whatever
the driver model folks find acceptable, I'll happily follow suit.
On Tuesday, March 22, 2011, Kay Sievers wrote:
> On Tue, 2011-03-22 at 23:05 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > > On Tue, 2011-03-22 at 22:00 +0100, Rafael J. Wysocki wrote:
> > > > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > > > > On Tue, 2011-03-22 at 21:30 +0100, Rafael J. Wysocki wrote:
> > > > > >
> > > > > > Perhaps there's a more straightforward way to make some files show up in
> > > > > > sysfs on a specific path than defining an otherwise useless bus type and
> > > > > > device object?
> > > > >
> > > > > That's absolutely not the point. Please don't get yourself into that
> > > > > thinking. If people want to "export stuff to userspace", they must not
> > > > > invent new things. We need to get rid of the silly special cases.
> > > >
> > > > Why exactly? Do they actually hurt anyone and if so then how?
> > >
> > > Sure, "devices" are devices, and devices have a well-defined set of
> > > properties, not some magic directory, people can mess around with the
> > > way they like.
> >
> > So it looks like the problem is that the exported attributes happen to
> > be under /sys/devices/. Would it still be a problem if they were somewhere
> > else?
>
> We are not going to invent another location for any devices. They need
> to stay in /devices if they are devices. And all devices need to be
> "struct device".
_They_ _are_ _not_ _devices_.
Please take clocksource as an example. It needs to export two attributes,
available_clocksource and current_clocksource, which are _useful_ user space
interfaces. Why the heck are you trying to convince me it's a good idea to
create a special bus type and struct device _specifically_ for exporting them?!
Moreover, is there anything device-like in those two attributes? I don't
really think so.
So please stop arguing this way, because it simply isn't going to fly and
people will always say a big fat "no" to such ideas, for a good reason.
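
For reference, a minimal Python sketch of how user space might consume the
two clocksource attributes mentioned above. The default directory is an
assumption about where the sysdev interface exposes them, and the helper
name read_clocksources() is made up for the example.

```python
import os

# Assumed location of the clocksource sysdev attributes.
DEFAULT_DIR = "/sys/devices/system/clocksource/clocksource0"

def read_clocksources(base=DEFAULT_DIR):
    """Return (list of available clocksources, current clocksource)."""
    with open(os.path.join(base, "available_clocksource")) as f:
        # The file holds a whitespace-separated list of names.
        available = f.read().split()
    with open(os.path.join(base, "current_clocksource")) as f:
        current = f.read().strip()
    return available, current
```

Nothing in this depends on the directory being backed by a device object;
it is just two text files under a known path.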
> > > > > Userspace is not meant to learn subsystem specific rules for every new
> > > > > thing.
> > > >
> > > > That depends a good deal of who's writing the user space in question. If
> > > > that's the same person who's working on the particular part of the kernel,
> > > > I don't see a big problem.
> > >
> > > Not for "devices". There are rules for devices, which are defined by the
> > > driver core, and the sysdev stuff needs to go, because it does not fit
> > > into that model.
> >
> > OK, I understand that.
> >
> > Now, there's stuff that doesn't really match the "device" model. Where is
> > the right place to export that? Perhaps we should add something like
> > /sys/platform/ (in analogy with /sys/firmware/)?
>
> No, add a subsystem (bus_type) for any of them, and register them. There
> is no such thing as "devices which do not fit the device model", they
> are all fine there. Please stop optimizing single bytes and creating a
> mess in /sys. Every device is a "struct device".
Again. Those things are _not_ devices. Am I not clear enough?
> Think of "struct bus_type" as "struct subsystem", we will rename that
> when we are ready. It is just a group of devices which are of the same
> type, it has nothing to do with a bus in the sense of hardware.
>
> We need unified exports of _all devices to userspace, not custom layouts
> in /sys.
>
> There is a pretty much outdated Documentation/sysfs-rules.txt, which
> covers part of the history and the plans.
You seem to be thinking that anything exported through sysfs needs to be a
device, which I don't think is even approximately correct (what about
/sys/firmware/ or /sys/kernel/ or /sys/fs/ , for a few examples?).
Think this way: it is useful (and IMHO correct) to export some things to
user space without necessarily regarding them as "devices", physical
or not. Some of them _happen_ to be exported through sysdevs, but that
doesn't really mean they _are_ devices. They are simply _software_ interfaces
to things that have no device representation and don't _need_ one.
> > > > > There is _one_ way to export device attributes, and that is
> > > > > "struct device" today.
> > > > >
> > > > > If that's too expensive for anybody, just don't use sysfs. It's the rule
> > > > > we have today. :)
> > > >
> > > > Oh, good to know. It's changed a bit since I last heard. Never mind.
> > >
> > > Oh, don't get me wrong, this is all about "devices", not any other
> > > controls.
> > >
> > > > Still, I won't let you change the things in /sys/power to struct devices,
> > > > sorry about that. ;-)
> > >
> > > Fine as long as they are power specific things, and not "devices". You
> > > don't have sysdevs there, right? :)
> >
> > No, I don't.
>
> Then all is fine. All other stuff is more like /proc, and can never be
> really unified.
YES! And _that_'s precisely what I'm (and Paul is) talking about.
> All we care about is devices, which have common methods
> for userspace to trigger and consume, and need to be unified. Power
> specific control files seems all fine in its kobject use.
I understand that, really.
> > > > And I wonder how are you going to deal with clocksource exporting things
> > > > via the sysdev interface right now. I'd simply create two directories and
> > > > put the two files into them and be done with that, but I guess that
> > > > wouldn't fit into the model somehow, right?
> > >
> > > Nope, register a bus_type, and use struct device for all of them, Parent
> > > them to /sys/devices/system/ if they should keep their location and
> > > layout.
> >
> > Well, I'll be watching what happens to the patch trying to do that, but I'm
> > not going to bet anything on its success. ;-)
>
> It should be pretty straight-forward. We will need to do that for CPUs I
> guess, because the interface is kinda commonly used.
No. CPUs are _very_ special.
> > > > > > > That way userspace can properly enumerate them in a flat list
> > > > > > > in /sys/bus/<bus_type name>/devices/*, and gets proper events on module
> > > > > > > load and during system coldplug, and can hook into the usual hotplug
> > > > > > > paths to set/get these values instead of crawling magically defined and
> > > > > > > decoupled locations in /sys which can not express proper hierarchy,
> > > > > > > classification, or anything else that all other devices can just do.
> > > > > >
> > > > > > There's no hotplug involved or anything remotely like that AFAICS.
> > > > > > There are simply static files as I said above, they are created
> > > > > > early during system initialization and simply stay there.
> > > > >
> > > > > That's not the point. It's about a single way to retrieve information
> > > > > about devices, extendability, and coldplug during bootup, where existing
> > > > > devices need to be handled only after userspace is up.
> > > >
> > > > I'd say the case at hand has nothing to do with that.
> > >
> > > It has. As for CPUs. We can not do proper CPU-dependent module
> > > autoloading, because the events happen before userspace runs, and
> > > coldplug can not see the broken sysdevs, because they have no events to
> > > re-trigger, like all others have.
> >
> > Well, as I said, would it be OK if the things in question happened to be
> > located somewhere outside of /sys/devices/ ?
>
> No, no device directory can be outside of /sys/devices.
Sorry, I'm repeating that for the last time. I'm not talking about devices.
I'm talking about _totally_ _random_ _stuff_ which is "like /proc, and can
never be really unified" (your own words) which _happens_ to be exported
through the sysdev interface, because that happened to be _easy_ at one point.
Can we agree on that at least?
Thanks,
Rafael
On Tue, 2011-03-22 at 23:23 +0100, Rafael J. Wysocki wrote:
> On Tuesday, March 22, 2011, Kay Sievers wrote:
> > On Wed, 2011-03-23 at 06:49 +0900, Paul Mundt wrote:
> ...
> > > You talk about inventing special interfaces to bypass the device model,
> > > but that's not the case here. Rolling my own interface with kobjects and
> > > attribute groups as with /sys/power or making an arbitrary bus type for a
> > > single class of system devices seems infinitely more hackish than the
> > > current sysdev model.
> > >
> > > The comment at the top of sys.c says:
> > >
> > > * sys.c - pseudo-bus for system 'devices' (cpus, PICs, timers, etc)
> > >
> > > Which is precisely where I would expect interrupt controllers and timers
> > > and CPUs to go. I'm not going to make an IRQ bus or a timer bus and
> > > arbitrarily map some things there and some things somewhere else in the
> > > name of some abstraction insanity. These interrupt controllers all have
> > > consistent attributes that make the sysdev class model work well, but
> > > there are also many other types of interrupt controllers on the same CPUs
> > > that use a different abstraction.
> > >
> > > Beyond that, we also have sysdev class utilization for DMA controllers,
> > > per-CPU store queues, etc, etc. all of which would need to be converted
> > > to something else (see for example arch/sh/kernel/cpu/sh4/sq.c -- which
> > > in turn was modelled after the cpufreq code, which also would need to
> > > change to something else). It's not entirely obvious how to convert these
> > > things, or why one should even bother.
>
> The reason why is, if I understand it correctly, because user space tools
> generally expect /sys/devices/ to be consistent in terms of the representation
> of things and /sys/devices/system/ currently violates that expectation which
> leads to all sorts of problems with device discovery, hotplug etc.
>
> Now, whether or not to convert all the things currently exported through
> sysdevs to struct device objects is not too obvious to me. I think it simply
> may be better to move them into a different directory in sysfs (presumably
> creating one for this purpose).
>
> > > I can live with the struct device overhead even if I find it to be a
> > > meaningless abstraction in this case, but what sort of bus/class model to
> > > shoe horn these things in to is rather beyond me.
> > >
> > > Indeed it seems to me that these are precisely the sorts of things that
> > > sysdevs are intended for, which is why I elected to use them from the
> > > onset. Simply saying "don't use sysdevs" and "pretend like you have some
> > > sort of a magical pseudo-bus that's not a system bus" doesn't quite do it
> > > for me.
> >
> > Nope, a device has a "name", a "subsystem" and a "devpath", has
> > well-defined core-maintained properties at the device directory. It is
> > not some random custom directory which people can put where they like
> > it. Userspace has expectations about devices which need to be met, and
> > that can only happen if these devices are "struct device".
> >
> > All real devices sort into a hierarchy, possibly in different parent
> > locations, and have a single point of classification which is the
> > devices/ directory and contains symlinks. Only that way we can cope with
> > it in userspace.
> >
> > People should really stop messing around in /sys for optimization
> > purposes. We have a common device model, and need to use it. Sysdevs do
> > not fit into that model.
> >
> > I can't tell how that fits into your use case, but please use something
> > else than sysfs, if you need device information exported, but you can't
> > use "struct device".
>
> I really think you shouldn't say "sysfs" when you in fact mean
> "/sys/devices/". :-)
>
> Now, I can easily understand arguments about representing everything under
> /sys/devices/ by struct device objects, no question about that. However,
> I also think there should be a place for things like those mentioned in the
> comment in sys.c, presumably outside of /sys/devices/.
No, please. We have all we need. Let's do one example, which you might
apply to any other thing, because you never know what's the next big
thing in hardware. We need to be as future-proof as possible, and that's
not some second-class out-of-scope sysfs directory.
Let's take CPUs:
- they send events when registered
- they want to export device specific properties
- userspace wants to take actions when such devices are available
That all fits properly into the driver model in theory. Unless you do
coldplug and bootup a box.
These devices are already there before userspace even starts, hence we
find all these devices and "trigger" a fake uevent for all of them at
bootup. That will match and execute all the rules specified for that device,
just as if it were hotplugged at that moment, hence we call it
coldplug, which works for all devices with the hotplug code, even when
they are never hot-pluggable.
What we do for coldplug is that we iterate over all flat lists of
subsystems and find the devices lists and trigger the event by poking in
the "uevent" sysfs file. Now all the sysdevs do not have a subsystem to
find, and do not have a standard "uevent" file.
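
The coldplug pass described above can be sketched in a few lines of Python.
This is only an illustration, not udev's actual implementation: it assumes
the flat lists are <root>/bus/<subsys>/devices/ and <root>/class/<subsys>/,
and that writing "add" to a device's "uevent" file re-triggers the event.
The function name and the sysfs_root parameter are made up for the example.

```python
import os

# (top-level directory, subdirectory holding the flat device list)
FLAT_LISTS = (("bus", "devices"), ("class", None))

def coldplug_trigger(sysfs_root, action="add"):
    """Poke the uevent file of every device found in the flat subsystem
    lists and return the list entries that were triggered."""
    triggered = []
    for top, sub in FLAT_LISTS:
        top_dir = os.path.join(sysfs_root, top)
        if not os.path.isdir(top_dir):
            continue
        for subsys in sorted(os.listdir(top_dir)):
            list_dir = os.path.join(top_dir, subsys)
            if sub:
                list_dir = os.path.join(list_dir, sub)
            if not os.path.isdir(list_dir):
                continue
            for dev in sorted(os.listdir(list_dir)):
                uevent = os.path.join(list_dir, dev, "uevent")
                if not os.path.isfile(uevent):
                    continue  # no standard uevent file, nothing to poke
                try:
                    with open(uevent, "w") as f:
                        f.write(action)
                except OSError:
                    continue  # writing needs privileges on a real system
                triggered.append(os.path.join(list_dir, dev))
    return triggered
```

Anything that lives only under devices/system/ is never visited by such a
loop, because it appears in none of the flat lists. Pointing this at a real
/sys as root would re-trigger events for every device, so try it on a
scratch directory first.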
Back to the CPUs, we have all the nice device directories which could
have all the CPU features in properties we need to make autoloading of
cpufreq, governor, kvm possible (patch exists from Andi Kleen already).
But these dumb CPU sysfs device directories are completely invisible for
the *usual* logic, and should just join the model and all will just work
out-of-the-box.
When we started to clean up /sys (again only talking about devices, not
other stuff) we had:
/block/*
/class/<subsys>/*
/bus/<subsys>/devices/*
/devices/system/<subsys>/*
which are 4 different exports of exactly the same thing, a "device".
"Block" we converted to "class" already, "class" will be converted to
"bus", and "bus" will be renamed to "subsystem". All the current names
will be kept as compat symlinks, just as we did for "block". After that,
_all_ devices have a "subsystem" and a subsystem global directory where
people can add custom stuff shared by all devices-of-the-same-type. Ev
You can also argue from the other side: if a kernel device export is
not worth the few bytes of /sys/devices/ and a "subsystem" (struct
bus_type), it should not be in /sys at all, especially not hidden
somewhere outside of /sys/devices when it is something remotely close to
a device.
Kay
On Tue, 2011-03-22 at 23:42 +0100, Rafael J. Wysocki wrote:
> On Tuesday, March 22, 2011, Kay Sievers wrote:
> > On Tue, 2011-03-22 at 23:05 +0100, Rafael J. Wysocki wrote:
> > > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > > > On Tue, 2011-03-22 at 22:00 +0100, Rafael J. Wysocki wrote:
> > > > > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > > > > > On Tue, 2011-03-22 at 21:30 +0100, Rafael J. Wysocki wrote:
> > > > > > >
> > > > > > > Perhaps there's a more straightforward way to make some files show up in
> > > > > > > sysfs on a specific path than defining an otherwise useless bus type and
> > > > > > > device object?
> > > > > >
> > > > > > That's absolutely not the point. Please don't get yourself into that
> > > > > > thinking. If people want to "export stuff to userspace", they must not
> > > > > > invent new things. We need to get rid of the silly special cases.
> > > > >
> > > > > Why exactly? Do they actually hurt anyone and if so then how?
> > > >
> > > > Sure, "devices" are devices, and devices have a well-defined set of
> > > > properties, not some magic directory, people can mess around with the
> > > > way they like.
> > >
> > > So it looks like the problem is that the exported attributes happen to
> > > be under /sys/devices/. Would it still be a problem if they were somewhere
> > > else?
> >
> > We are not going to invent another location for any devices. They need
> > to stay in /devices if they are devices. And all devices need to be
> > "struct device".
>
> _They_ _are_ _not_ _devices_.
Ah, I see. Then they should not ever have been created as a sysdev in
the first place. I thought you were only talking about stuff that needs
pm_ops. If they don't do anything like a device, they need to go
somewhere else, yes.
> Please take clocksource as an example. It needs to export two attributes,
> available_clocksource and current_clocksource, which are _useful_ user space
> interfaces. Why the heck are you trying to convince me it's a good idea to
> create a special bus type and struct device _specifically_ for exporting them?!
>
> Moreover, is there anything device-alike in those two attributes? I don't
> really think so.
>
> So please stop arguing this way, because it simply isn't going to fly and
> people will always say a big fat "no" to such ideas, for a good reason.
I never expected anything like that to use a sysdev. I'm not trying to
convince you of anything, I'm just surprised what people do; well, I'm
not, after all the years. :)
> > > > > > Userspace is not meant to learn subsystem specific rules for every new
> > > > > > thing.
> > > > >
> > > > > That depends a good deal of who's writing the user space in question. If
> > > > > that's the same person who's working on the particular part of the kernel,
> > > > > I don't see a big problem.
> > > >
> > > > Not for "devices". There are rules for devices, which are defined by the
> > > > driver core, and the sysdev stuff needs to go, because it does not fit
> > > > into that model.
> > >
> > > OK, I understand that.
> > >
> > > Now, there's stuff that doesn't really match the "device" model. Where is
> > > the right place to export that? Perhaps we should add something like
> > > /sys/platform/ (in analogy with /sys/firmware/)?
> >
> > No, add a subsystem (bus_type) for any of them, and register them. There
> > is no such thing as "devices which do not fit the device model", they
> > are all fine there. Please stop optimizing single bytes and creating a
> > mess in /sys. Every device is a "struct device".
>
> Again. Those things are _not_ devices. Am I not clear enough?
No, that wasn't clear. They need to leave the driver core then, if they
are not devices. :)
> > Think of "struct bus_type" as "struct subsystem", we will rename that
> > when we are ready. It is just a group of devices which are of the same
> > type, it has nothing to do with a bus in the sense of hardware.
> >
> > We need unified exports of _all devices to userspace, not custom layouts
> > in /sys.
> >
> > There is a pretty much outdated Documentation/sysfs-rules.txt, which
> > covers part of the history and the plans.
>
> You seem to be thinking that anything exported through sysfs needs to be a
> device, which I don't think is even approximately correct (what about
> /sys/firmware/ or /sys/kernel/ or /sys/fs/ , for a few examples?).
All fine. Again, I'm only talking about "devices", which is class/,
block/, bus/, dev/, devices/ in /sys.
> Think this way: it is useful (and IMHO correct) to export some things to
> user space without necessarily regarding them as "devices", physical
> or not. Some of them _happen_ to be exported through sysdevs, but that
> doesn't really mean they _are_ devices. They are simply _software_ interfaces
> to things that have no device representation and don't _need_ one.
Fine, they need to leave the driver core stuff alone, and not pretend to
be a device.
> > > > > > There is _one_ way to export device attributes, and that is
> > > > > > "struct device" today.
> > > > > >
> > > > > > If that's too expensive for anybody, just don't use sysfs. It's the rule
> > > > > > we have today. :)
> > > > >
> > > > > Oh, good to know. It's changed a bit since I last heard. Never mind.
> > > >
> > > > Oh, don't get me wrong, this is all about "devices", not any other
> > > > controls.
> > > >
> > > > > Still, I won't let you change the things in /sys/power to struct devices,
> > > > > sorry about that. ;-)
> > > >
> > > > Fine as long as they are power specific things, and not "devices". You
> > > > don't have sysdevs there, right? :)
> > >
> > > No, I don't.
> >
> > Then all is fine. All other stuff is more like /proc, and can never be
> > really unified.
>
> YES! And _that_'s precisely what I'm (and Paul is) talking about.
Good.
> > All we care about is devices, which have common methods
> > for userspace to trigger and consume, and need to be unified. Power
> > specific control files seems all fine in its kobject use.
>
> I understand that, really.
>
> > > > > And I wonder how are you going to deal with clocksource exporting things
> > > > > via the sysdev interface right now. I'd simply create two directories and
> > > > > put the two files into them and be done with that, but I guess that
> > > > > wouldn't fit into the model somehow, right?
> > > >
> > > > Nope, register a bus_type, and use struct device for all of them, Parent
> > > > them to /sys/devices/system/ if they should keep their location and
> > > > layout.
> > >
> > > Well, I'll be watching what happens to the patch trying to do that, but I'm
> > > not going to bet anything on its success. ;-)
> >
> > It should be pretty straight-forward. We will need to do that for CPUs I
> > guess, because the interface is kinda commonly used.
>
> No. CPUs are _very_ special.
Not in the view of the driver core or the associated user space
interfaces. They are just like any other device.
> > > > > > > > That way userspace can properly enumerate them in a flat list
> > > > > > > > in /sys/bus/<bus_type name>/devices/*, and gets proper events on module
> > > > > > > > load and during system coldplug, and can hook into the usual hotplug
> > > > > > > > paths to set/get these values instead of crawling magically defined and
> > > > > > > > decoupled locations in /sys which can not express proper hierarchy,
> > > > > > > > classification, or anything else that all other devices can just do.
> > > > > > >
> > > > > > > There's no hotplug involved or anything remotely like that AFAICS.
> > > > > > > There are simply static files as I said above, they are created
> > > > > > > early during system initialization and simply stay there.
> > > > > >
> > > > > > That's not the point. It's about a single way to retrieve information
> > > > > > about devices, extendability, and coldplug during bootup, where existing
> > > > > > devices need to be handled only after userspace is up.
> > > > >
> > > > > I'd say the case at hand has nothing to do with that.
> > > >
> > > > It has. As for CPUs. We can not do proper CPU-dependent module
> > > > autoloading, because the events happen before userspace runs, and
> > > > coldplug can not see the broken sysdevs, because they have no events to
> > > > re-trigger, like all others have.
> > >
> > > Well, as I said, would it be OK if the things in question happened to be
> > > located somewhere outside of /sys/devices/ ?
> >
> > No, no device directory can be outside of /sys/devices.
>
> Sorry, I'm repeating that for the last time. I'm not talking about devices.
> I'm talking about _totally_ _random_ _stuff_ which is "like /proc, and can
> never be really unified" (your own words) which _happens_ to be exported
> through the sysdev interface, because that happened to be _easy_ at one point.
> Can we agree on that at least?
Sure, they should leave the driver core alone, and should never have been a
sysdev or any other device in the first place. We should not create
anything for them.
Kay
On Tue, 2011-03-22 at 23:05 +0100, Rafael J. Wysocki wrote:
> On Tuesday, March 22, 2011, Kay Sievers wrote:
> > Because they are devices, and there is a lot to win, if the kernel
> > exports all "devices" in the same way. This is not about saving an inode
> > in /sys, it's the ability to do runtime device configuration with common
> > tools.
>
> If I understand you correctly, the goal is to have everything under
> /sys/devices/ be represented by struct device objects and there are good
> reasons to do that.
>
> In that case we either have to move the things exported via sysdevs somewhere
> else (presumably having to create that "somewhere" before), or we have to
> introduce struct device objects specifically for exporting them. I don't
> really think the latter approach will be very popular, so quite likely we'll
> need to have a plan for moving those things to different locations.
Ok, now that we sorted the misunderstanding out, let's start this topic
from here again. (I really read your reply as: "let's move the CPUS
somewhere else")
What's the list of stuff you discovered using sysdevs, which is not a
device at all, and has never more than a single instance of stuff of the
same type? The clocksource?
And let's find some place for them to put their properties instead of
messing around with devices from the driver core.
Kay
On Tuesday, March 22, 2011, Kay Sievers wrote:
> On Tue, 2011-03-22 at 23:23 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, March 22, 2011, Kay Sievers wrote:
...
> >
> > Now, I can easily understand arguments about representing everything under
> > /sys/devices/ by struct device objects, no question about that. However,
> > I also think there should be a place for things like those mentioned in the
> > comment in sys.c, presumably outside of /sys/devices/.
>
> No, please. We have all we need. Let's do one example, which you might
> apply to any other thing, because you never know what's the next big
> thing in hardware. We need to be as future-proof as possible, and that's
> not some second-class out-of-scope sysfs directory.
>
> Let's take CPUs:
> - they send events when registered
> - they want to export device specific properties
> - userspace wants to take actions when such devices are available
>
> That all fits properly into the driver model in theory. Unless you do
> coldplug and bootup a box.
>
> These devices are already there before userspace even starts, hence we
> find all these devices and "trigger" a fake uevent for all of them at
> bootup. That will match and execute all the rules specified for that device,
> just as if it were hotplugged at that moment, hence we call it
> coldplug, which works for all devices with the hotplug code, even when
> they are never hot-pluggable.
>
> What we do for coldplug is that we iterate over all flat lists of
> subsystems and find the devices lists and trigger the event by poking in
> the "uevent" sysfs file. Now all the sysdevs do not have a subsystem to
> find, and do not have a standard "uevent" file.
>
> Back to the CPUs, we have all the nice device directories which could
> have all the CPU features in properties we need to make autoloading of
> cpufreq, governor, kvm possible (patch exists from Andi Kleen already)
>
> But these dumb CPU sysfs device directories are completely invisible for
> the *usual* logic, and should just join the model and all will just work
> out-of-the-box.
That all is cool, but I'm not sure how it is related to things like
available_clocksource and current_clocksource (which happen to be located
under /sys/devices/system/clocksource/clocksource0/ being simply a path
in sysfs).
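To make "simply a path in sysfs" concrete: from user space such an attribute is just a one-line text file, wherever it ends up living. A minimal sketch of reading one, assuming a single-line attribute; read_attr() is a made-up helper name, not an existing API:

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a single-line sysfs-style attribute into buf.
 * Returns 0 on success, -1 on failure. The trailing newline is stripped.
 * read_attr() is a hypothetical helper used only for illustration.
 */
int read_attr(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Cut the string at the first newline, if there is one. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling it with /sys/devices/system/clocksource/clocksource0/current_clocksource would return the name of the clocksource in use on a system that exports that file. Only the path string would have to change if the attribute moved elsewhere.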
> When we started to clean up /sys (again only talking about devices, not
> other stuff) we had:
> /block/*
> /class/<subsys>/*
> /bus/<subsys>/devices/*
> /devices/system/<subsys>/*
> which are 4 different exports of exactly the same thing, a "device".
> "Block" we converted to "class" already, "class" will be converted to
> "bus", and "bus" will be renamed to "subsystem". All the current names
> will be kept as compat symlinks, just as we did for "block". After that,
> _all_ devices have a "subsystem" and a subsystem global directory where
> people can add custom stuff shared by all devices-of-the-same-type. Ev
OK, sounds good.
> You can also argue from the other side, if a kernel device export is
> not worth the few bytes of /sys/devices/ and a "subsystem" (struct
> bus_type) it should not be in /sys at all, especially not hidden
> somewhere outside of /sys/devices when it is something remotely close to
> a device.
Well, Greg apparently thinks that available_clocksource and current_clocksource
could be located under /sys/bus/clock/. Perhaps other attributes now exported
through sysdevs could be moved to places like this?
Rafael
On Wed, 2011-03-23 at 00:32 +0100, Rafael J. Wysocki wrote:
> On Tuesday, March 22, 2011, Kay Sievers wrote:
> > On Tue, 2011-03-22 at 23:23 +0100, Rafael J. Wysocki wrote:
> > > On Tuesday, March 22, 2011, Kay Sievers wrote:
> ...
> > >
> > > Now, I can easily understand arguments about representing everything under
> > > /sys/devices/ by struct device objects, no question about that. However,
> > > I also think there should be a place for things like those mentioned in the
> > > comment in sys.c, presumably outside of /sys/devices/.
> >
> > No, please. We have all we need. Let's do one example, which you might
> > apply to any other thing, because you never know what's the next big
> > thing in hardware. We need to be as future-proof as possible, and that's
> > not some second-class out-of-scope sysfs directory.
> >
> > Let's take CPUs:
> > - they send events when registered
> > - they want to export device specific properties
> > - userspace wants to take actions when such devices are available
> >
> > That all fits properly into the driver model in theory. Unless you do
> > coldplug and bootup a box.
> >
> > These devices are already there before userspace even starts, hence we
> > find all these devices and "trigger" a fake uevent for all of them at
> > bootup. That will match and execute all the rules specified for that device,
> > just as if it were hotplugged at that moment, hence we call it
> > coldplug, which works for all devices with the hotplug code, even when
> > they are never hot-pluggable.
> >
> > What we do for coldplug is that we iterate over all flat lists of
> > subsystems and find the devices lists and trigger the event by poking in
> > the "uevent" sysfs file. Now all the sysdevs do not have a subsystem to
> > find, and do not have a standard "uevent" file.
> >
> > Back to the CPUs, we have all the nice device directories which could
> > have all the CPU features in properties we need to make autoloading of
> > cpufreq, governor, kvm possible (patch exists from Andi Kleen already)
> >
> > But these dumb CPU sysfs device directories are completely invisible for
> > the *usual* logic, and should just join the model and all will just work
> > out-of-the-box.
>
> That all is cool, but I'm not sure how it is related to things like
> available_clocksource and current_clocksource (which happen to be located
> under /sys/devices/system/clocksource/clocksource0/ being simply a path
> in sysfs).
Sure, it isn't related to clocksource at all. I didn't really get the
idea that there are users that just fake core devices only to get a
place to put a couple of attributes. I was still in the context of the
$SUBJECT of this thread.
This stuff should just stay away from devices, not sysdev, not "struct
device".
For other things like CPUs, which are fine to be represented as driver
core devices, all the above is still valid, and they should be real
devices and have their own subsystem, which exposes them to coldplug and
usual event handling.
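For illustration, the coldplug pass quoted earlier in this message (walk a subsystem's device list and poke each "uevent" file) can be sketched in user space roughly as follows. This is a sketch under assumptions, not udev's code: trigger_devices() is a made-up name, and the action string "add" is an assumption, since the real tool may write a different action.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write "add" to the uevent file of every entry in devices_dir
 * (for example /sys/bus/<subsys>/devices). Returns the number of
 * entries triggered, or -1 if the directory cannot be opened.
 * Entries without a writable uevent file are skipped, which is
 * what happens to anything that lacks a standard "uevent" file.
 */
int trigger_devices(const char *devices_dir)
{
	DIR *dir = opendir(devices_dir);
	struct dirent *de;
	char path[4096];
	int count = 0;

	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		int fd;

		/* Skip ".", ".." and hidden entries. */
		if (de->d_name[0] == '.')
			continue;

		snprintf(path, sizeof(path), "%s/%s/uevent",
			 devices_dir, de->d_name);

		/* No O_CREAT: a missing uevent file means "not triggerable". */
		fd = open(path, O_WRONLY);
		if (fd < 0)
			continue;

		if (write(fd, "add", 3) == 3)
			count++;
		close(fd);
	}

	closedir(dir);
	return count;
}
```

A loop like this only reaches devices that are listed under some subsystem directory, which is why anything without a subsystem stays invisible to it.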
> Well, Greg apparently thinks that available_clocksource and current_clocksource
> could be located under /sys/bus/clock/. Perhaps other attributes now exported
> through sysdevs could be moved to places like this?
Sure, we could do that. All such subsystems have a directory to put
subsystem-global stuff. In this case it would be a subsystem without any
registered device. But it leaves us open to add real devices to it
later, which might be the case for some similar subsystems.
The other option would be /sys/kernel/clocksource/ with the few
attributes to create.
We should decide if "clocksource" is kind of "device-related" or not. Do
you have any list of subsystems besides "clocksource", which would help
to get a bigger picture what we should expect?
Kay
On Wednesday, March 23, 2011, Kay Sievers wrote:
> On Tue, 2011-03-22 at 23:05 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, March 22, 2011, Kay Sievers wrote:
>
> > >> Because they are devices, and there is a lot to win, if the kernel
> > > exports all "devices" in the same way. This is not about saving an inode
> > > in /sys, it's the ability to do runtime device configuration with common
> > > tools.
> >
> > If I understand you correctly, the goal is to have everything under
> > /sys/devices/ be represented by struct device objects and there are good
> > reasons to do that.
> >
> > In that case we either have to move the things exported via sysdevs somewhere
> > else (presumably having to create that "somewhere" before), or we have to
> > introduce struct device objects specifically for exporting them. I don't
> > really think the latter approach will be very popular, so quite likely we'll
> > need to have a plan for moving those things to different locations.
>
> Ok, now that we sorted the misunderstanding out, let's start this topic
> from here again. (I really read your reply as: "let's move the CPUS
> somewhere else")
Well, I didn't mean that. :-)
> What's the list of stuff you discovered using sysdevs, which is not a
> device at all, and has never more than a single instance of stuff of the
> same type? The clocksource?
>
> And let's find some place for them to put their properties instead of
> messing around with devices from the driver core.
On x86 the clocksource will be the only remaining one after my changes
recently posted. In fact, it will be the only remaining one except for the
CPUs. :-)
The other architectures are a mixed bag, though. First, there's some exported
stuff similar to the clocksource. For one example, there's the "leds" sysdev
in arch/arm/kernel/leds.c that appears to export one attribute (in addition
to implementing suspend/resume/shutdown which I'm going to move to syscore_ops).
Second, there's some outright abuse here-and-there that may be removed entirely.
Finally, there are things that probably may be converted to struct devices,
in the powerpc tree in particular IIRC.
IMHO all of these things will have to be considered on a case-by-case basis
and discussed with the appropriate maintainers.
Thanks,
Rafael
On Wednesday, March 23, 2011, Kay Sievers wrote:
> On Wed, 2011-03-23 at 00:32 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > > On Tue, 2011-03-22 at 23:23 +0100, Rafael J. Wysocki wrote:
> > > > On Tuesday, March 22, 2011, Kay Sievers wrote:
> > ...
> > > >
> > > > Now, I can easily understand arguments about representing everything under
> > > > /sys/devices/ by struct device objects, no question about that. However,
> > > > I also think there should be a place for things like those mentioned in the
> > > > comment in sys.c, presumably outside of /sys/devices/.
> > >
> > > No, please. We have all we need. Let's do one example, which you might
> > > apply to any other thing, because you never know what's the next big
> > > thing in hardware. We need to be as future-proof as possible, and that's
> > > not some second-class out-of-scope sysfs directory.
> > >
> > > Let's take CPUs:
> > > - they send events when registered
> > > - they want to export device specific properties
> > > - userspace wants to take actions when such devices are available
> > >
> > > That all fits properly into the driver model in theory. Unless you do
> > > coldplug and bootup a box.
> > >
> > > These devices are already there before userspace even starts, hence we
> > > find all these devices and "trigger" a fake uevent for all of them at
> > > bootup. That will match and execute all the rules specified for that device,
> > > just as if it were hotplugged at that moment, hence we call it
> > > coldplug, which works for all devices with the hotplug code, even when
> > > they are never hot-pluggable.
> > >
> > > What we do for coldplug is that we iterate over all flat lists of
> > > subsystems and find the devices lists and trigger the event by poking in
> > > the "uevent" sysfs file. Now all the sysdevs do not have a subsystem to
> > > find, and do not have a standard "uevent" file.
> > >
> > > Back to the CPUs, we have all the nice device directories which could
> > > have all the CPU features in properties we need to make autoloading of
> > > cpufreq, governor, kvm possible (patch exists from Andi Kleen already)
> > >
> > > But these dumb CPU sysfs device directories are completely invisible for
> > > the *usual* logic, and should just join the model and all will just work
> > > out-of-the-box.
> >
> > That all is cool, but I'm not sure how it is related to things like
> > available_clocksource and current_clocksource (which happen to be located
> > under /sys/devices/system/clocksource/clocksource0/ being simply a path
> > in sysfs).
>
> Sure, it isn't related to clocksource at all. I didn't really get the
> idea that there are users that just fake core devices only to get a
> place to put a couple of attributes. I was still in the context of the
> $SUBJECT of this thread.
>
> This stuff should just stay away from devices, not sysdev, not "struct
> device".
>
> For other things like CPUs, which are fine to be represented as driver
> core devices, all the above is still valid, and they should be real
> devices and have their own subsystem, which exposes them to coldplug and
> usual event handling.
>
> > Well, Greg apparently thinks that available_clocksource and current_clocksource
> > could be located under /sys/bus/clock/. Perhaps other attributes now exported
> > through sysdevs could be moved to places like this?
>
> Sure, we could do that. All such subsystems have a directory to put
> subsystem-global stuff. In this case it would be a subsystem without any
> registered device. But it leaves us open to add real devices to it
> later, which might be the case for some similar subsystems.
>
> The other option would be /sys/kernel/clocksource/ with the few
> attributes to create.
>
> We should decide if "clocksource" is kind of "device-related" or not. Do
> you have any list of subsystems besides "clocksource", which would help
> to get a bigger picture what we should expect?
Not at the moment. I'll prepare one while working on syscore_ops patches
for the non-x86 architectures.
Thanks,
Rafael
On Tue, Mar 22, 2011 at 06:07:21PM -0400, Rafael J. Wysocki wrote:
> On Tuesday, March 22, 2011, Joerg Roedel wrote:
> > On Mon, Mar 21, 2011 at 07:36:17PM -0400, Rafael J. Wysocki wrote:
> > > drivers/pci/intel-iommu.c | 38 +++++++++-----------------------------
> > > 1 file changed, 9 insertions(+), 29 deletions(-)
> >
> > Looks good.
>
> May I take that as an ACK?
Sure :-)
> > I prepare a patch to convert AMD IOMMU to syscore_ops too.
>
> Already done. :-)
>
> It's a part of patch [1/6].
I see, saves me work, thanks :-)
Joerg
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
On Tue, Mar 22, 2011 at 11:44:03PM +0100, Kay Sievers wrote:
> When we started to clean up /sys (again only talking about devices, not
> other stuff) we had:
> /block/*
> /class/<subsys>/*
> /bus/<subsys>/devices/*
> /devices/system/<subsys>/*
> which are 4 different exports of exactly the same thing, a "device".
> "Block" we converted to "class" already, "class" will be converted to
> "bus", and "bus" will be renamed to "subsystem". All the current names
> will be kept as compat symlinks, just as we did for "block".
I bet userspace will end up failing with that and need updating yet
again, in spite of the symlinks.
While Linus complains about the churn in the ARM subtree, I'd like to
officially complain about the pointless churn in sysfs which actively
breaks (and sometimes prevents) userspace booting when you upgrade
kernels.
On Tue, Mar 22, 2011 at 09:19:28PM +0100, Rafael J. Wysocki wrote:
> Convert the SuperH clocks framework and shared interrupt handling
> code to using struct syscore_ops instead of sysdev classes and
> sysdevs for power management.
>
> This reduces the code size significantly and simplifies it. The
> optimizations causing things not to be restored after creating a
> hibernation image are removed, but they might lead to undesirable
> effects during resume from hibernation (e.g. the clocks would be left
> as the boot kernel set them, which might be not the same way as the
> hibernated kernel had seen them before the hibernation).
>
> This also is necessary for removing sysdevs from the kernel entirely
> in the future.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/sh/clk/core.c | 68 ++++++++-----------------------
> drivers/sh/intc/core.c | 95 +++++++++++++++++++++-----------------------
> drivers/sh/intc/internals.h | 1
> 3 files changed, 65 insertions(+), 99 deletions(-)
>
This one looks good, and seems to work fine. Applied, thanks.
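The conversion described in the changelog above boils down to registering one set of callbacks instead of a sysdev class plus a sysdev. Below is a minimal user-space model of that pattern, assuming suspend callbacks run in reverse registration order and resume callbacks in registration order. All mock_* names, the fixed-size array, the trace buffer and the clk/intc example callbacks are made up for illustration; this is not the kernel's struct syscore_ops code.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_OPS 16

/* User-space stand-in for the callback set; not the kernel definition. */
struct mock_syscore_ops {
	int (*suspend)(void);	/* nonzero return aborts the transition */
	void (*resume)(void);
	void (*shutdown)(void);
};

struct mock_syscore_ops *mock_ops_list[MOCK_MAX_OPS];
size_t mock_ops_count;

void mock_register_syscore_ops(struct mock_syscore_ops *ops)
{
	assert(ops != NULL && mock_ops_count < MOCK_MAX_OPS);
	mock_ops_list[mock_ops_count++] = ops;
}

/* Run suspend callbacks, last registered first; undo on failure. */
int mock_syscore_suspend(void)
{
	size_t i = mock_ops_count;

	while (i-- > 0) {
		int ret;

		if (!mock_ops_list[i]->suspend)
			continue;
		ret = mock_ops_list[i]->suspend();
		if (ret) {
			/* Resume the entries that were already suspended. */
			for (size_t j = i + 1; j < mock_ops_count; j++)
				if (mock_ops_list[j]->resume)
					mock_ops_list[j]->resume();
			return ret;
		}
	}
	return 0;
}

/* Run resume callbacks in registration order. */
void mock_syscore_resume(void)
{
	for (size_t i = 0; i < mock_ops_count; i++)
		if (mock_ops_list[i]->resume)
			mock_ops_list[i]->resume();
}

/* Example users: each callback appends one letter to a trace string. */
char mock_trace[64];
size_t mock_trace_len;

static void mock_note(char c)
{
	if (mock_trace_len < sizeof(mock_trace) - 1)
		mock_trace[mock_trace_len++] = c;
}

int mock_clk_suspend(void)   { mock_note('c'); return 0; }
void mock_clk_resume(void)   { mock_note('C'); }
int mock_intc_suspend(void)  { mock_note('i'); return 0; }
void mock_intc_resume(void)  { mock_note('I'); }

struct mock_syscore_ops mock_clk_ops = {
	.suspend = mock_clk_suspend,
	.resume  = mock_clk_resume,
};

struct mock_syscore_ops mock_intc_ops = {
	.suspend = mock_intc_suspend,
	.resume  = mock_intc_resume,
};
```

Registering mock_clk_ops and then mock_intc_ops, followed by one suspend/resume cycle, calls the callbacks in the order intc suspend, clk suspend, clk resume, intc resume. The callbacks take no arguments, which is the simplification over sysdev callbacks that the series is after.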
On Wed, Mar 23, 2011 at 07:23:48AM +0900, Paul Mundt wrote:
> On Tue, Mar 22, 2011 at 11:00:56PM +0100, Kay Sievers wrote:
> > Which is what we need to get rid of. It does not make any sense on the
> > global picture to have anything like that exported to userspace.
> So far I haven't heard any rationale for why it doesn't. Exporting CPU
> state to userspace certainly makes sense, and the sysdev model has worked
> reasonably for CPUs, memory nodes, etc.
FWIW it'd be really helpful to have CPUs (or at least SoCs) be regular
struct devices for integration with the regulator API so we can have all
things that might use a regulator (like DVFS) be struct devices but...
> Once cpufreq, timekeeping, and NUMA node state have migrated to whatever
> the driver model folks find acceptable, I'll happily follow suit.
...we're not precisely there yet :/
On Wed, Mar 23, 2011 at 11:12:20AM +0000, Mark Brown wrote:
> On Wed, Mar 23, 2011 at 07:23:48AM +0900, Paul Mundt wrote:
> > On Tue, Mar 22, 2011 at 11:00:56PM +0100, Kay Sievers wrote:
>
> > > Which is what we need to get rid of. It does not make any sense on the
> > > global picture to have anything like that exported to userspace.
>
> > So far I haven't heard any rationale for why it doesn't. Exporting CPU
> > state to userspace certainly makes sense, and the sysdev model has worked
> > reasonably for CPUs, memory nodes, etc.
>
> FWIW it'd be really helpful to have CPUs (or at least SoCs) be regular
> struct devices for integration with the regulator API so we can have all
> things that might use a regulator (like DVFS) be struct devices but...
>
Sure, that makes sense. The easiest would probably be to just replace the
struct cpu sysdev with a struct device pointer and fix up drivers/base/cpu.c
accordingly. The linux/cpu.h API is unfortunately rather coupled to the
idea of having a sysdev, but this is purely for attributes and attribute
groups and primarily impacts powerpc, so the conversion shouldn't be too
painful. For simple topology registration the bulk of the architectures
ultimately don't care what's backing the struct cpu within the sysfs
context.
On Wednesday, March 23, 2011, Paul Mundt wrote:
> On Tue, Mar 22, 2011 at 09:19:28PM +0100, Rafael J. Wysocki wrote:
> > Convert the SuperH clocks framework and shared interrupt handling
> > code to using struct syscore_ops instead of sysdev classes and
> > sysdevs for power management.
> >
> > This reduces the code size significantly and simplifies it. The
> > optimizations causing things not to be restored after creating a
> > hibernation image are removed, but they might lead to undesirable
> > effects during resume from hibernation (e.g. the clocks would be left
> > as the boot kernel set them, which might be not the same way as the
> > hibernated kernel had seen them before the hibernation).
> >
> > This also is necessary for removing sysdevs from the kernel entirely
> > in the future.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > drivers/sh/clk/core.c | 68 ++++++++-----------------------
> > drivers/sh/intc/core.c | 95 +++++++++++++++++++++-----------------------
> > drivers/sh/intc/internals.h | 1
> > 3 files changed, 65 insertions(+), 99 deletions(-)
> >
> This one looks good, and seems to work fine. Applied, thanks.
Cool, thanks!
Hi Linus,
Please pull additional power management updates for 2.6.39 from:
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git syscore
They make subsystems that x86 depends on use struct syscore_ops objects instead
of sysdevs for "core" power management, which reduces the code size and kernel
memory footprint a bit and simplifies the "core" suspend/resume and shutdown
code paths.
arch/x86/Kconfig | 1 +
arch/x86/kernel/amd_iommu_init.c | 26 ++--------
arch/x86/kernel/apic/apic.c | 33 ++++---------
arch/x86/kernel/apic/io_apic.c | 97 ++++++++++++++++++--------------------
arch/x86/kernel/cpu/mcheck/mce.c | 21 +++++----
arch/x86/kernel/cpu/mtrr/main.c | 10 ++--
arch/x86/kernel/i8237.c | 30 +++---------
arch/x86/kernel/i8259.c | 33 ++++---------
arch/x86/kernel/microcode_core.c | 34 ++++++--------
arch/x86/kernel/pci-gart_64.c | 32 +++----------
arch/x86/oprofile/nmi_int.c | 44 +++++------------
drivers/base/Kconfig | 7 +++
drivers/base/sys.c | 3 +-
drivers/cpufreq/cpufreq.c | 66 ++++++++++----------------
drivers/pci/intel-iommu.c | 38 ++++-----------
include/linux/device.h | 4 ++
include/linux/pm.h | 10 +++-
include/linux/sysdev.h | 7 ++-
kernel/time/timekeeping.c | 27 +++-------
virt/kvm/kvm_main.c | 34 +++----------
20 files changed, 206 insertions(+), 351 deletions(-)
---------------
Rafael J. Wysocki (6):
x86: Use syscore_ops instead of sysdev classes and sysdevs
timekeeping: Use syscore_ops instead of sysdev class and sysdev
PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev
KVM: Use syscore_ops instead of sysdev class and sysdev
cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)
Introduce ARCH_NO_SYSDEV_OPS config option (v2)
On Sat, Mar 12, 2011 at 1:18 PM, Rafael J. Wysocki <[email protected]> wrote:
> -static int __init init_iommu_sysfs(void)
> -{
> -       return 0;
> -}
> +static inline int init_iommu_pm_ops(void) { }
>  #endif /* CONFIG_PM */
drivers/pci/intel-iommu.c:3391: warning: no return statement in
function returning non-void
s/static inline int/static inline void/
Reported-by: Tony Luck <[email protected]>
-Tony
On Thursday, June 02, 2011, Tony Luck wrote:
> On Sat, Mar 12, 2011 at 1:18 PM, Rafael J. Wysocki <[email protected]> wrote:
> > -static int __init init_iommu_sysfs(void)
> > -{
> > - return 0;
> > -}
> > +static inline int init_iommu_pm_ops(void) { }
> > #endif /* CONFIG_PM */
>
> drivers/pci/intel-iommu.c:3391: warning: no return statement in
> function returning non-void
>
> s/static inline int/static inline void/
>
> Reported-by: Tony Luck <[email protected]>
I guess you mean the following?
Rafael
---
From: Rafael J. Wysocki <[email protected]>
If CONFIG_PM is not set, init_iommu_pm_ops() introduced by commit
134fac3f457f3dd753ecdb25e6da3e5f6629f696 (PCI / Intel IOMMU: Use
syscore_ops instead of sysdev class and sysdev) is not defined
appropriately. Fix this issue.
Reported-by: Tony Luck <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/intel-iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6/drivers/pci/intel-iommu.c
===================================================================
--- linux-2.6.orig/drivers/pci/intel-iommu.c
+++ linux-2.6/drivers/pci/intel-iommu.c
@@ -3388,7 +3388,7 @@ static void __init init_iommu_pm_ops(voi
}
#else
-static inline int init_iommu_pm_ops(void) { }
+static inline void init_iommu_pm_ops(void) {}
#endif /* CONFIG_PM */
/*
> I guess you mean the following?
> Rafael
Exactly.
Thanks
-Tony
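For readers who have not hit this warning before, the pattern at issue is a config-dependent function with an empty stub for the "feature disabled" build. A small stand-alone sketch, with made-up names (CONFIG_PM_DEMO, init_demo_pm_ops(), demo_init()) standing in for the real ones:

```c
/*
 * Hypothetical illustration of a config-dependent init hook.
 * Build with -DCONFIG_PM_DEMO to get the "real" variant.
 */
#ifdef CONFIG_PM_DEMO
static int demo_pm_ops_registered;

static void init_demo_pm_ops(void)
{
	demo_pm_ops_registered = 1;
}
#else
/*
 * Empty stub for the disabled case. It is declared void, so the
 * empty body is well-formed. Declaring it "int" with an empty body
 * is what triggers "no return statement in function returning
 * non-void".
 */
static inline void init_demo_pm_ops(void) {}
#endif

/* The caller is identical in both configurations. */
int demo_init(void)
{
	init_demo_pm_ops();
	return 0;
}
```

The stub's return type has to match the real function's, so that callers compile the same way in both configurations.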